Using traps in your scripts

Summary: For scripts to be reasonably robust, they should ideally be able to clean up any temporary logs or processes left behind when they are forcefully terminated. Another element to consider is what action a script should take when it receives an interrupt from a user. The shell built-in trap command and the logger utility can help make your scripts more robust when they are forcefully terminated. In this article, I will demonstrate ways trap and logger can be used.

When writing scripts, it is good practice to have a controlled exit from your script. This lets the script deal with failed conditions during processing instead of carrying on regardless. Consider a script that copies or replaces certain files in a file system. You could check whether each copy completes successfully before moving on to the next task in the script. If an issue occurs, the script exits. This allows the system administrator to inspect where the script failed, so that immediate action can be taken to back out the process or to complete the task another way.

Listing 1 below contains basic conditional code that could achieve this goal. Using a file copy process as an example, a test is carried out to make sure the file run_pj actually exists. If it does not, the script exits, as no more processing should be carried out. If it does, a copy is made as a backup of the destination file. If that copy is unsuccessful, the script exits with a message detailing the error. If the backup succeeds, the script checks that the updated file is present and copies it over the original file. If this is not successful, the script exits.

Listing 1. Example_replace

 
#!/bin/bash
#
proj_dir=/opt/pcake/bin
# check file is present
if [ ! -f "$proj_dir/run_pj" ]
then
  echo "$proj_dir/run_pj not present...exiting"
  exit 1
fi
# make a backup copy
cp -p "$proj_dir/run_pj" "$proj_dir/run_pj.24042011"
if [ $? -ne 0 ]
then
  echo "$proj_dir/run_pj no backup made...exiting"
  exit 1
fi

# copy over updated file
if [ ! -f "/opt/dump/rollout/run_pj" ]
then
  echo "/opt/dump/rollout/run_pj not present...exiting"
  exit 1
fi
cp -p /opt/dump/rollout/run_pj "$proj_dir/run_pj"
if [ $? -ne 0 ]
then
  echo "$proj_dir/run_pj was not copied...exiting"
  exit 1
fi

In this demonstration, I am using bash v3.2. The bash shell can be downloaded from the AIX Toolbox; see the Resources section.

Using the approach in Listing 1, the script exits on any error in the copy process instead of carrying on regardless. The error can then be fixed before the script is run again.
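
The checks in Listing 1 can be condensed with a small helper function. The sketch below is one way to do it in bash; die and backup_and_replace are hypothetical names that are not part of the original script, and the paths are passed in as arguments instead of being hard-coded.

```shell
#!/bin/bash
# Sketch: the Listing 1 checks condensed with helpers.
# die and backup_and_replace are hypothetical names used for illustration.

# Print an error to standard error and stop the script.
die()
{
  echo "$(basename "$0"): $*" >&2
  exit 1
}

# backup_and_replace NEW_FILE TARGET STAMP
# Back up TARGET as TARGET.STAMP, then overwrite TARGET with NEW_FILE.
backup_and_replace()
{
  local new=$1 target=$2 stamp=$3
  [ -f "$target" ] || die "$target not present...exiting"
  cp -p "$target" "$target.$stamp" || die "$target no backup made...exiting"
  [ -f "$new" ] || die "$new not present...exiting"
  cp -p "$new" "$target" || die "$target was not copied...exiting"
}

# Usage, mirroring Listing 1:
# backup_and_replace /opt/dump/rollout/run_pj /opt/pcake/bin/run_pj 24042011
```

Because die calls exit, a failure at any step stops the whole script, just as each exit 1 does in Listing 1.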

Another technique to check for errors and exit is to use the -e option of the set built-in:

set -e

With set -e, if a command fails (that is, it returns a non-zero exit status), the script exits immediately. The exceptions are a failing command that is part of the condition of an if, while or until statement, part of a && or || list (other than the last command in it), or negated with !. The example in Listing 2 below uses set -e and copies a non-existent file. When the copy command fails, the script exits. Notice that when you run the script, the if statement that checks the last exit status is never reached, because the script exits as soon as the cp command returns a non-zero status.

Listing 2. Example_fail

#!/bin/bash
set -e
proj_dir=/opt/rollout/v12
# copy a non-existent file
cp "$proj_dir/go_sup" /usr/local/bin/go_sup
if [ $? -ne 0 ]
then
  echo "could not copy $proj_dir/go_sup to /usr/local/bin/"
  exit 1
fi

Running the script, saved here as cp_test, gives:

$ cp_test
cp: /opt/rollout/v12/go_sup: A file or directory in the path name does not exist.
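
To see the set -e exceptions in action, the following sketch runs small snippets under bash -e in child shells, so that the demonstration itself keeps running. The path /nonexistent/go_sup is an assumption, chosen so that cp fails.

```shell
#!/bin/bash
# Each snippet runs in its own "bash -e" so that this script keeps going.
# /nonexistent/go_sup is a path assumed not to exist.

# Failures tested by if, or on the left of ||, do not end the script.
exempt=$(bash -e -c '
  if ! cp /nonexistent/go_sup /tmp 2>/dev/null; then echo "if: copy failed, handled"; fi
  cp /nonexistent/go_sup /tmp 2>/dev/null || echo "||: copy failed, handled"
  echo "still running"')
echo "$exempt"

# A plain failing command ends the script, so "not reached" never prints.
plain=$(bash -e -c 'cp /nonexistent/go_sup /tmp 2>/dev/null; echo "not reached"')
status=$?
echo "plain run printed '$plain' and exited with status $status"
```

The first child prints all three lines; the second prints nothing and returns the non-zero status of cp.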

Generating syslog messages

The logger command lets the shell and scripts write messages to the system messages file via the syslogd service. Within a script, it can be used to log errors or the completion of your processes, so that they are visible to anyone who inspects the messages file. This keeps you and other system administrators informed of events generated by your scripts.

The most basic format of the command is:

logger -p priority message

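Where -p specifies the priority of the message. It can be given as facility.level (for example, user.notice) or as a level alone, in which case the user facility is used.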

For example, the following logger command includes the calling script name ("rollout" in this example) with the message "something has happened":

logger -p notice "$(basename "$0") - something has happened"

The following output appears in /var/adm/messages (assuming syslogd is configured to write user.notice messages to that file):

Apr  5 13:20:30 uk01wrs6008 user:notice dxtans: rollout - something has happened
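
If a script logs several events, a small wrapper keeps the calls consistent. The following is a sketch only. log_event and the LOGGER variable are hypothetical: LOGGER simply lets you dry-run the call with echo on a machine where syslog is not available. The -t option replaces the default tag (the user name dxtans in the output above) with the script name.

```shell
#!/bin/bash
# Sketch: a small wrapper around logger. log_event is a hypothetical name.
# LOGGER defaults to logger; setting LOGGER=echo dry-runs the call.
log_event()
{
  local level=$1
  shift
  # -t tags the entry with the script name instead of the default tag;
  # -p takes facility.level, and user is the facility logger uses by default.
  # ${LOGGER:-logger} is left unquoted so an alternative command can be used.
  ${LOGGER:-logger} -t "$(basename "$0")" -p "user.$level" "$*"
}

# Example calls from a script:
# log_event notice "rollout of run_pj complete"
# log_event err "could not copy run_pj"
```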

Getting a signal

The examples in Listing 1 and Listing 2 show one way of checking the result of a command after it has run. However, what happens if a script is terminated during its execution? Scripts can be killed or terminated using the signal mechanism (note that not all signals terminate the process that receives them). A signal sent to a running process interrupts that process to notify it of an event, which typically results in some action. Signals can come from sources including, but not restricted to:

The kernel or user space via some system event.
The terminal, when a key such as Ctrl-C is pressed.
An illegal instruction from within the process itself.
Another process, for example another user sending a kill to your process.
A notification of a change in the state of a required device.

To view the current list of signals, use the kill -l command (that is the letter l). The list is presented as pairs of signal number and signal name:

 $ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL
 5) SIGTRAP      6) SIGABRT      7) SIGEMT       8) SIGFPE
 9) SIGKILL     10) SIGBUS      11) SIGSEGV     12) SIGSYS
 ...
 ...
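
Because signal numbers vary between platforms, it can help to let the shell translate between names and numbers. A minimal sketch using bash's built-in kill -l, which accepts either a signal name or a number:

```shell
#!/bin/bash
# Sketch: look up signal numbers by name with bash's built-in kill -l,
# instead of hard-coding numbers that differ between platforms.
for name in HUP INT QUIT TERM USR1; do
  printf '%-5s %s\n' "$name" "$(kill -l "$name")"
done

# kill -l also maps a number back to a name. Given an exit status above
# 128, it subtracts 128 first, which helps decode "killed by a signal".
kill -l 2      # INT
kill -l 130    # INT
```

The numbers printed by the loop depend on the platform the script runs on.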

To view the signals and their default actions (on an AIX machine), view the file:

$ cat /usr/include/sys/signal.h|more
...
...
#define SIGHUP     1    /* hangup, generated when terminal disconnects */
#define SIGINT     2    /* interrupt, generated from terminal special char */
#define SIGQUIT    3    /* (*) quit, generated from terminal special char */
#define SIGILL     4    /* (*) illegal instruction (not reset when caught)*/
#define SIGTRAP    5    /* (*) trace trap (not reset when caught) */
#define SIGABRT    6    /* (*) abort process */
...
...

I have received a signal. Now what?

When a signal has been received by the script, the script can do one of three actions:

Ignore it and do nothing.
Catch the signal using trap and take appropriate action.
Take the default action. This is probably what most scripts do without the script authors realising it.
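
In bash, the three choices can be seen side by side in the sketch below. Each case runs in a child bash that sends SIGTERM to itself, so the demonstration script survives all three; SIGTERM is simply a convenient signal to use here.

```shell
#!/bin/bash
# The three possible responses to a signal, each shown in a child bash that
# sends SIGTERM to itself ($$ is expanded by the child, thanks to the single
# quotes), so this demonstration script keeps running throughout.

# 1. Default action: SIGTERM terminates the child; the parent sees exit
#    status 128 + 15 = 143, and the final echo never runs.
default_out=$(bash -c 'kill -TERM $$; echo "not reached"')
default_status=$?

# 2. Ignore: an empty trap command list discards the signal.
ignore_out=$(bash -c 'trap "" TERM; kill -TERM $$; echo "still running"')

# 3. Catch: the trap runs, then processing continues because the
#    command list has no exit in it.
catch_out=$(bash -c 'trap "echo caught TERM" TERM; kill -TERM $$; echo "still running"')

echo "default: status $default_status, output '$default_out'"
echo "ignore:  $ignore_out"
echo "catch:   $catch_out"
```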

All the above is true except for the following signals (numbers as on AIX):

SIGKILL (signal 9)

SIGSTOP (signal 17)

These cannot be caught or ignored and always take the default action: SIGKILL always kills the process, and SIGSTOP always stops it. SIGCONT (signal 19) is a related special case: it can be caught, but a stopped process is always resumed when it receives SIGCONT.

Looking at the listing from the /usr/include/sys/signal.h file, we see the default action for each signal. For instance, SIGINT (signal 2) is an interrupt generated from the terminal; typically, this is the keyboard. Be sure to view the signal.h file for all the default actions.

It is up to the author of the script to decide what action, if any, to take when a signal is received.

There are also two user-defined signals, SIGUSR1 (signal 30) and SIGUSR2 (signal 31), which script authors can use for their own bespoke purposes.
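
As a sketch of a bespoke use of SIGUSR1, a long-running script could treat it as a request to report progress. The names count, last_report and report_progress are hypothetical, and the loop stands in for real processing.

```shell
#!/bin/bash
# Sketch: SIGUSR1 used as a bespoke "report progress" request.
# count, last_report and report_progress are hypothetical names.
count=0
report_progress()
{
  last_report="progress: $count items processed"
  echo "$last_report"
}
trap report_progress SIGUSR1

for item in run_pj go_sup afile; do
  count=$((count + 1))        # stands in for real processing of $item
done

# Normally another process asks for a report (kill -USR1 <pid>); here the
# script signals itself once to show the handler running.
kill -USR1 $$
```

Because the trap has no exit in its command list, the script carries on processing after each report.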

Common signals are:

SIGHUP - hangup; sent to processes attached to a terminal when that terminal disconnects or its session exits
SIGINT - Ctrl-C from the keyboard
SIGQUIT - Ctrl-\ from the keyboard
SIGTERM - software termination signal; the default signal sent by kill

When receiving a signal, actions that can take place are:

cleaning up files
prompting the user whether the script should actually be terminated
ignoring the signal
carrying on processing
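
Prompting the user can be done from a trap function. The sketch below assumes bash; confirm_quit is a hypothetical name, and it reads the answer from standard input, so a script whose input is redirected might read from /dev/tty instead.

```shell
#!/bin/bash
# Sketch: a SIGINT handler that asks before terminating.
# confirm_quit is a hypothetical name; it reads the answer from stdin.
confirm_quit()
{
  printf 'Really terminate %s? [y/N] ' "$(basename "$0")" >&2
  read -r answer
  case $answer in
    [Yy]*) echo "terminating at user request"
           exit 130 ;;            # 128 + 2, the usual status after SIGINT
    *)     echo "carrying on" ;;
  esac
}
trap confirm_quit SIGINT
```

Bear in mind that Ctrl-C is also delivered to any foreground command the script is running, so that command may already have been interrupted even if the user chooses to carry on; the script itself continues with its next command.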

Catching a signal

To catch a signal that is sent to your process, use the built-in trap command. When a signal is caught while the script is running a command, that command is allowed to complete before the trap's command list runs. If the signal is SIGKILL, termination is immediate. Signals you do not trap still take their default action. For example, if you only trap SIGINT but do nothing about SIGQUIT, then if your process gets a SIGQUIT, the default action takes place (most likely an untidy termination of your script, which you probably do not want).

The format of the trap command is:

trap 'command_list'  signals

Where command_list is a list of commands (which can include a call to a function) to run when the script receives any signal in the signals list, and signals is the list of signals to catch, or trap.

To ignore a signal, use an empty string (two single quotes) in place of the command_list:

trap ''  signals

To reset a trap, so that the signals take their default action again, use:

trap - signals

Where signals is the signal list.

Let's now look at a bare-bones script that catches SIGINT and SIGQUIT. The script in Listing 3 below is a simple counting loop. When the user presses Ctrl-C or Ctrl-\ on the keyboard, the trap command catches the signal and echoes a message that the script has terminated. The termination is accomplished by the exit command at the end of the command list; without it, the script would not terminate and would continue processing. In this example, we want it to terminate, but there may be occasions when processing should continue instead.

Listing 3. Trap1

#!/bin/bash
# trap1
trap 'echo "you hit Ctrl-C/Ctrl-\, now exiting.."; exit' SIGINT SIGQUIT
count=0

while :
 do
   sleep 1
   count=$(expr $count + 1)
   echo $count
 done

$ trap1
1
2
3
^Cyou hit Ctrl-C/Ctrl-\, now exiting..

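It is considered good form to use the signal names rather than the signal numbers in the trap command, for portability: signal numbers differ between systems (for example, SIGUSR1 is 30 on AIX but 10 on most Linux systems). Bash accepts the names with or without the SIG prefix; POSIX sh specifies them without it (INT rather than SIGINT).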

You can also use a function in place of the command as demonstrated in Listing 4 below:

Listing 4. Trap1a

#!/bin/bash
# trap1a
trap 'my_exit; exit' SIGINT SIGQUIT
count=0

my_exit()
{
echo "you hit Ctrl-C/Ctrl-\, now exiting.."
 # cleanup commands here if any
}

while :
 do
   sleep 1
   count=$(expr $count + 1)
   echo $count
 done
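
A variation on Listing 4 passes the signal name to the function, so one handler can report which signal arrived and exit with the conventional status of 128 plus the signal number. This is a sketch; on_signal is a hypothetical name, and it uses kill -l to look the number up instead of hard-coding it.

```shell
#!/bin/bash
# One handler for several signals; each trap passes the signal name in.
# on_signal is a hypothetical name; kill -l looks up the signal number so
# the exit status follows the usual 128 + signal-number convention.
on_signal()
{
  local status=$((128 + $(kill -l "$1")))
  echo "caught SIG$1, exiting with status $status"
  # cleanup commands here if any
  exit "$status"
}

trap 'on_signal INT' SIGINT
trap 'on_signal QUIT' SIGQUIT
trap 'on_signal TERM' SIGTERM
```

With these traps in place of the one in Listing 4, pressing Ctrl-\ during the counting loop would print "caught SIGQUIT, exiting with status 131".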

Signals can also be caught when a script is running in the background. Listing 5 below contains a simple counter, as in the previous examples. Again, I have chosen to exit the script upon catching the signal; if this were a file processing script, any temporary files it created would be deleted first.

The script is submitted into the background using:

$ /home/dxtans/trapbg &
[1] 708790
$ 1
2
3

Now, from another terminal, send the script a SIGHUP to kill it.

$ ps -ef |grep trapbg
 dxtans 708790 2457860 11:49:39 pts/0 0:00 /bin/bash /home/dxtans/trapbg
$ kill -1 708790

Now back on the terminal where the script was submitted, the following is displayed:

$ /home/dxtans/trapbg &
[1] 708790
$ 1
2
3
Going down on a SIGHUP - signal 1, now exiting..
[1]+ Done   /home/dxtans/trapbg

Listing 5. trapbg

#!/bin/bash
# trapbg
trap 'echo Going down on a SIGHUP - signal 1, now exiting..; exit' SIGHUP
count=0
while :
do
 sleep 10
 count=$(expr $count + 1)
 echo $count
done

The most common task when dealing with signals is to clean up temporary files. Typically, these are created in /tmp with the script's process ID (the PID, available in $$) appended to the file name. Assume the temp files are in this form:

hold1.$$
hold2.$$

A common command to remove these files (-f stops rm from complaining if a file has not been created yet) is:

rm -f /tmp/hold*.$$

The following piece of code traps SIGHUP, SIGINT, SIGQUIT and SIGTERM, and then removes the files:

trap 'rm -f /tmp/hold*.$$; exit' SIGHUP SIGINT SIGQUIT SIGTERM
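
The sketch below removes the PID-suffixed files from a single cleanup function. It also traps EXIT, a condition that bash (like POSIX sh) treats as a pseudo-signal that fires whenever the script exits, so the files are removed after a normal finish as well; the signal traps simply turn a signal into an exit. This is one possible arrangement, not the only one. The file contents and the sort step are placeholders, and where mktemp is available it produces unpredictable names that are safer than $$ in a shared /tmp.

```shell
#!/bin/bash
# Sketch: PID-suffixed temporary files removed in one place.
# The EXIT trap covers normal completion; the signal traps exit, which
# in turn runs the EXIT trap.
tmp1=/tmp/hold1.$$
tmp2=/tmp/hold2.$$

cleanup()
{
  rm -f "$tmp1" "$tmp2"     # -f: no error if a file was never created
}
trap cleanup EXIT
trap 'exit 1' SIGHUP SIGINT SIGQUIT SIGTERM

echo "working data" > "$tmp1"     # stands in for real processing
sort "$tmp1" > "$tmp2"
```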

Earlier in this article, I demonstrated that using set -e causes a script to terminate when a command returns a non-zero exit status. Within trap, you have a similar option. ERR is not really a signal as such: it is a condition, given as ERR in the signal list of the trap command, that fires whenever a command returns a non-zero exit status, under the same rules as set -e. Unlike set -e, however, an ERR trap does not exit the script unless its command list does so. In the following example, copying a non-existent file invokes an error:

#!/bin/bash
# trap1b
trap 'echo I have error in my script..' ERR
cp /home/dxtans/afile /tmp

When executed, the output is:

$ trap1b
cp: /home/dxtans/afile: A file or directory in the path name does not exist.
I have error in my script..
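
The sketch below shows two points about ERR in bash: the trap fires for a failing command but does not stop the script, and a failure that is tested by if does not fire it. errors is a hypothetical counter, and /nonexistent/afile is a path assumed not to exist.

```shell
#!/bin/bash
# Sketch: an ERR trap reports failures but, unlike set -e, does not stop
# the script by itself. Commands tested by if, ||, && or ! do not fire it.
errors=0
trap 'errors=$((errors + 1)); echo "ERR trap: line $LINENO" >&2' ERR

cp /nonexistent/afile /tmp 2>/dev/null        # fires the ERR trap
if ! cp /nonexistent/afile /tmp 2>/dev/null; then
  echo "copy failure handled by if"           # tested, so no ERR trap
fi
echo "still running; ERR trap fired $errors time(s)"
```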

Two variables come in handy with traps to give you more information about the script's termination: LINENO and BASH_COMMAND. BASH_COMMAND is exclusive to bash. These report, or attempt to report, the line number the script is currently executing and the command that is currently running. Listing 6 below demonstrates this. The script executes a list of echo and sleep commands. When the script is sent a SIGHUP, SIGINT or SIGQUIT, it displays a message containing the line number and command at the time the signal was caught, and then exits (from the exit command in the trap's command list). Notice that the trap calls the function my_exit to display this information. Using the parameters passed to it, $1 (LINENO) and $2 (BASH_COMMAND), my_exit also logs a message about the event to /var/adm/messages. Other cleanup commands would be put in this function, if required.

Listing 6. trap4

#!/bin/bash
# trap4

trap 'my_exit $LINENO $BASH_COMMAND; exit' SIGHUP SIGINT SIGQUIT
my_exit()
{
echo "$(basename $0)  caught error on line : $1 command was: $2"

logger -p notice "script: $(basename $0) was terminated: line: $1, command was $2"
 # cleanup commands here if any
}

echo 1
sleep 1
echo 2
sleep 1
echo 3

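Running this script a couple of times, and interrupting it at different points, produces the following output. Because $BASH_COMMAND is not quoted in the trap, it is split into words, so only the first word of the command (sleep) reaches my_exit as $2.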

$ trap4
1
2
^Ctrap4  caught error on line : 15 command was: sleep

$ trap4
1
^Ctrap4  caught error on line : 13 command was: sleep

In /var/adm/messages, we have an entry for the script termination:

Apr  6 12:12:46 rs6000 user:notice dxtans: script: trap4 was terminated: line: 13, command was sleep
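
The reason Listing 6 reports only sleep can be shown without any signals at all. In this sketch, show_args is a hypothetical helper that reports how many arguments it received, and cmd holds the text that BASH_COMMAND would contain while sleep 1 runs.

```shell
#!/bin/bash
# Why Listing 6 reports only "sleep": the unquoted $BASH_COMMAND is split
# into words, so my_exit sees "sleep" as $2 and "1" as $3.
# show_args is a hypothetical helper used only for this demonstration.
show_args()
{
  echo "$# args, \$2 is: $2"
}

cmd="sleep 1"            # stands in for BASH_COMMAND while sleep 1 runs
show_args 15 $cmd        # 3 args, $2 is: sleep
show_args 15 "$cmd"      # 2 args, $2 is: sleep 1

# Quoting both variables in the trap passes the full command text:
# trap 'my_exit "$LINENO" "$BASH_COMMAND"; exit' SIGHUP SIGINT SIGQUIT
```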

There are occasions when you will want to ignore certain signals. Perhaps you wish to prevent someone from hitting Ctrl-C or Ctrl-\ on the keyboard by mistake while your script is processing large files, and you want it to complete without user interruption. The following code achieves this:

trap '' SIGINT SIGQUIT

You can also ignore certain signals during a portion of your script, then reinstate them later when you do wish to catch the signals and take some form of action. The script in Listing 7 below ignores SIGINT and SIGQUIT until the first sleep command has finished. During the second sleep command, trap catches either signal if it is sent, and the script terminates. As in the previous examples, you can assume the sleep commands represent some form of processing.

Listing 7. trapoff_on

#!/bin/bash
# trapoff_on

trap '' SIGINT SIGQUIT
echo "you cannot terminate using ctrl-c or ctrl-\, "
# heavy processing goes on here, cannot interrupt!
sleep 10

trap 'echo terminated; exit' SIGINT SIGQUIT
# user can now interrupt
echo "ok you can now terminate me using those keystrokes"
sleep 10

Sending a signal to a child

Scripts that start child processes also need to be addressed. If you wish to terminate any child processes along with the script, you need to kill them as well. This is accomplished using the trap command, as demonstrated in Listing 8 below. In this example, two sleep commands are used as the child processes. These are put into the background, and as each process is started, its PID (from $!) is added to the variable pid, which ends up holding the PIDs of both child (sleep) processes.

To kill the main script, a SIGHUP, SIGINT, SIGQUIT or SIGTERM is sent. Upon catching this signal, a kill command is issued to the PIDs of the child processes held in $pid. Once this is done, the script exits. The wait at the end of the script waits for the child processes to terminate or complete; when a trapped signal arrives, bash's wait returns immediately, so the trap runs straight away instead of after the children finish. Further signal traps may be required within the child scripts to do further cleaning up before exit. Clearly, this depends on your type of processing.

The following example kills the children when the parent is sent one of the signals.

Listing 8. trapchild

#!/bin/bash
# trapchild

sleep 120 &

pid="$!"

sleep 120 &
pid="$pid $!"

echo "my process pid is: $$"
echo "my child pid list is: $pid"

trap 'echo I am going down, so killing off my processes..; kill $pid; exit' SIGHUP SIGINT SIGQUIT SIGTERM

wait

Upon execution of the script, the following displays:

$ /home/dxtans/trap/trapchild
my process pid is: 6553626
my child pid list is: 5767380 6488072

Check from another terminal that the script is running, along with its child processes (the two sleep commands).

$ ps -ef |grep trapchild
    root 6553626 5439516   0 20:51:32  pts/1  0:00 /bin/bash /home/dxtans/trap/trapchild
$ ps -ef |grep sleep
root 5767380 6553626   0 20:51:32  pts/1  0:00 sleep 120
root 6488072 6553626   0 20:51:32  pts/1  0:00 sleep 120

Let's now send a SIGTERM to the parent process. The script kills its child processes and then terminates.

$ kill -15 6553626

The script then terminates with the following output:

$ /home/dxtans/trap/trapchild
my process pid is: 6553626
my child pid list is: 5767380 6488072
I am going down, so killing off my processes..

Check that nothing is returned after the termination:

# ps -ef |grep sleep
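
A variation on Listing 8, sketched below, makes two changes. The trap is set before the children are started, so a signal arriving between the two launches is not missed, and the trap waits for the killed children before exiting, so none are left behind as zombies. run_children and its PID-file argument are hypothetical, added so that a caller can see which PIDs were started.

```shell
#!/bin/bash
# Sketch: Listing 8 with the trap set before the children start and a
# wait in the trap to reap the killed children.
# run_children and its PID-file argument are hypothetical.
run_children()
{
  local pid=""
  # $pid is deliberately unquoted in kill so each PID is a separate argument.
  trap 'echo "going down, killing:$pid"; kill $pid 2>/dev/null; wait; exit 1' SIGHUP SIGINT SIGQUIT SIGTERM
  sleep 30 &
  pid="$pid $!"
  sleep 30 &
  pid="$pid $!"
  echo "$pid" > "$1"          # record the child PIDs for the caller
  wait                        # returns at once if a trapped signal arrives
}
```

Calling run_children from a script and then sending that script a SIGTERM should end both sleep processes, as in the ps check above.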

Conclusion

Using traps within your scripts requires a little extra effort. The payoff is that when your script receives a signal that can be trapped, you are in a good position to take action, such as removing temporary files or stopping child processes, instead of leaving them behind.