Sunday, February 23, 2014

Using "lockfile" to Prevent Multiple Instances of a Script from Running

This post describes how you can ensure that only one instance of a script is running at a time, which is useful if your script:

  • uses significant CPU or IO and running multiple instances at the same time would risk overloading the system, or
  • writes to a file or other shared resource and running multiple instances at the same time would risk corrupting the resource

In order to prevent multiple instances of a script from running, your script must first acquire a "lock" and hold on to that lock until the script completes. If the script cannot acquire the lock, it must wait until the lock becomes available. So, how do you acquire a lock? There are different ways, but the simplest is to use the lockfile command to create a "semaphore file". This is shown in the snippet below:

set -e

# example path; use any location shared by all instances of the script
declare -r lock_file="/tmp/myscript.lock"

# waits until a lock is acquired and
# deletes the lock on exit.
# prevents multiple instances of the script from running
acquire_lock() {
    echo "Acquiring lock ${lock_file}..."
    lockfile "${lock_file}"
    trap "rm -f ${lock_file} && echo Released lock ${lock_file}" INT TERM EXIT
    echo "Acquired lock"
}

acquire_lock

# do stuff

The acquire_lock function first invokes the lockfile command in order to create a file. If lockfile cannot create the file, it will keep trying forever until it does. You can use the -r option if you only want to retry a certain number of times. Once the file has been created, we need to ensure that it is deleted once the script completes or is terminated. This is done using the trap command, which deletes the file when the script completes or when the shell receives an interrupt or terminate signal. I also like to use set -e in all my scripts, which makes the script exit if any command fails. In this case, if lockfile fails, the script will exit and the trap will not be set.
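As a rough sketch of the -r option mentioned above, the function below gives up after a limited number of retries instead of waiting forever. The function name try_acquire_lock and its two parameters are made up for illustration; -r (the number of retries) and -1 (sleep one second between retries rather than the default of eight) are options of lockfile itself.

```shell
# Tries to acquire a lock, giving up after a limited number of retries.
# $1 - the lock file
# $2 - the number of retries passed to lockfile -r
try_acquire_lock() {
    local -r lock_file="$1"
    local -r retries="$2"

    # -1 sleeps one second between retries, -r limits the number of retries
    if lockfile -1 -r "${retries}" "${lock_file}"; then
        echo "Acquired lock ${lock_file}"
    else
        echo "Could not acquire lock ${lock_file}, giving up" >&2
        return 1
    fi
}

# example usage (the path is only an example):
# try_acquire_lock /tmp/myscript.lock 5 || exit 1
```

Because the call to lockfile sits inside an if statement, set -e does not abort the script when the lock cannot be acquired, so the caller decides what to do with the non-zero return code.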

lockfile can be used in other ways as well. For example, instead of preventing multiple instances of the entire script from running, you may want to use a more granular approach and use locks only around those parts of your script which are not safe to run concurrently.
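One possible way to lock only part of a script is to wrap the critical section in a helper. This is a minimal sketch, not the only way to do it: the name with_lock, the example path and the append_report placeholder are all assumptions made for illustration.

```shell
# Runs a command while holding a lock and releases the lock afterwards.
# $1 - the lock file
# $2... - the command to run inside the critical section
with_lock() {
    local -r lock_file="$1"; shift
    local -i status=0

    lockfile "${lock_file}"
    # capture the exit code instead of letting set -e abort before the cleanup
    "$@" || status=$?
    rm -f "${lock_file}"
    return "${status}"
}

# example usage (append_report stands in for your own function):
# with_lock /tmp/report.lock append_report
```

Note that this sketch does not set a trap, so if the script is killed while the command is running, the lock file is left behind and has to be removed by hand.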

Note that if you cannot use lockfile, there are alternatives such as mkdir or flock, as described in BashFAQ/045.
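To give an idea of the mkdir approach, here is a minimal sketch. It relies on mkdir either creating the directory or failing because it already exists, so only one caller can succeed. The function names and the example path are assumptions for illustration and are not taken from BashFAQ/045.

```shell
# Tries to take the lock by creating a directory.
# Returns non-zero if the directory already exists (lock held by someone else).
# $1 - the lock directory
try_lock_dir() {
    local -r lock_dir="$1"
    mkdir "${lock_dir}" 2>/dev/null
}

# Releases a lock taken by try_lock_dir.
# $1 - the lock directory
release_lock_dir() {
    rmdir "$1"
}

# example usage (the path is only an example):
# if try_lock_dir /tmp/myscript.lock.d; then
#     trap 'release_lock_dir /tmp/myscript.lock.d' EXIT
#     # do stuff
# else
#     echo "Another instance is already running" >&2
#     exit 1
# fi
```

Unlike the lockfile snippet, this version does not wait for the lock. To wait, you could call try_lock_dir in an until loop with a sleep between attempts.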


Saturday, February 08, 2014

Retrying Commands in Shell Scripts

There are many cases in which you may wish to retry a failed command a certain number of times. Examples are database failures, network communication failures or file IO problems.

The snippet below shows a simple method of retrying commands in bash:


declare -r MAX_ATTEMPTS=5
declare -i attempt_num=1
until command || (( attempt_num == MAX_ATTEMPTS )); do
    echo "Attempt $attempt_num failed! Trying again in $attempt_num seconds..."
    sleep $(( attempt_num++ ))
done

In this example, the command is attempted a maximum of five times and the interval between attempts is increased incrementally whenever the command fails. The time between the first and second attempt is 1 second, that between the second and third is 2 seconds and so on. If you want, you can change this to a constant interval or random exponential backoff instead.
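For a constant interval, replacing the sleep argument with a fixed number such as 5 is enough. For randomized exponential backoff, one possible sketch is shown below. The function name backoff_delay and the cap on the delay are assumptions for illustration: the ceiling doubles with every failed attempt up to the cap, and the actual delay is picked at random between 1 second and that ceiling.

```shell
# Prints the number of seconds to wait after a failed attempt.
# $1 - the number of the attempt that just failed (1, 2, 3, ...)
# $2 - the maximum delay in seconds
backoff_delay() {
    local -r -i attempt="$1"
    local -r -i max_delay="$2"
    local -i ceiling=$(( 2 ** (attempt - 1) ))    # 1, 2, 4, 8, ...

    if (( ceiling > max_delay )); then
        ceiling=$max_delay
    fi
    # pick a random whole number of seconds between 1 and the ceiling
    echo $(( RANDOM % ceiling + 1 ))
}

# example usage inside the retry loop:
# sleep "$(backoff_delay "$attempt_num" 60)"
```

The random element spreads out retries from several scripts that fail at the same moment, so they do not all hit the database or network again at once.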

I have created a useful retry function (shown below) which allows me to retry commands from different places in my script without duplicating the retry logic. This function returns a non-zero exit code when all attempts have been exhausted.


# Retries a command on failure.
# $1 - the max number of attempts
# $2... - the command to run
retry() {
    local -r -i max_attempts="$1"; shift
    local -i attempt_num=1

    # run the command as "$@" so that arguments containing spaces stay intact
    until "$@"; do
        if (( attempt_num == max_attempts )); then
            echo "Attempt $attempt_num failed and there are no more attempts left!"
            return 1
        else
            echo "Attempt $attempt_num failed! Trying again in $attempt_num seconds..."
            sleep $(( attempt_num++ ))
        fi
    done
}

# example usage:
retry 5 ls -ltr foo
