smoser's thoughts

A dumping ground for thoughts and writings by smoser.

Today I learned: set -e sucks even more.

Summary: Just don’t use set -e.

I’ve never been a fan of “errexit” in shell. You’ve probably seen this as set -e, or set -o errexit or sh -e.

People write lists of shell commands in a file and want the script to exit on the first one that fails rather than barreling on and causing damage. That seems sane.

I’ve always strived to write “shell programs” rather than “shell scripts”. The difference being that the program will clean up after itself and give sane error messages. It won’t just exit when mkdir fails and leave the user to understand some message like:

 mkdir: cannot create directory ‘/tmp/work/bcd’: No such file or directory

I’ve always felt that errexit makes “good error handling” in shell more difficult.
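As a rough sketch of what a “sane error message” could look like for that mkdir case, the snippet below wraps the call in a function that says what it was trying to do. The name make_workdir and the message wording are hypothetical, chosen only for illustration, and the demo assumes mktemp -d is available.

```shell
#!/bin/sh
# Sketch only: make_workdir and its messages are hypothetical, not from the post.

# Create a work directory and say what was being attempted if that fails.
make_workdir() {
    dir="$1"
    parent=$(dirname "$dir")
    if [ ! -d "$parent" ]; then
        echo "cannot create work dir '$dir': parent '$parent' does not exist" 1>&2
        return 1
    fi
    mkdir "$dir" || {
        echo "failed to create work dir '$dir'" 1>&2
        return 1
    }
}

# Demo inside a throwaway directory that is removed afterwards.
base=$(mktemp -d) || exit 1
# Capture each exit status explicitly instead of relying on errexit.
make_workdir "$base/work" && ok_rc=0 || ok_rc=$?
make_workdir "$base/missing/bcd" && bad_rc=0 || bad_rc=$?
rm -rf "$base"
echo "ok_rc=$ok_rc bad_rc=$bad_rc"
```

The second call fails because its parent directory is missing, so the caller gets a non-zero return and a message that names the directory the program wanted, not just mkdir’s complaint.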

The bash(1) man page has the following text:

If a compound command or shell function executes in a context where -e is being ignored, none of the commands executed within the compound command or function body will be affected by the -e setting, even if -e is set and a command returns a failure status. If a compound command or shell function sets -e while executing in a context where -e is ignored, that setting will not have any effect until the compound command or the command containing the function call completes.
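The second sentence of that text is easy to miss, so here is a minimal sketch of it. The function name step is hypothetical and false stands in for any failing command. The demo runs in a subshell so that it cannot terminate the shell that invokes it.

```shell
#!/bin/sh
# Sketch only: 'step' is a hypothetical function; 'false' stands in for a failing command.
step() {
    set -e    # no effect yet: step was called in a context where -e is ignored
    false     # fails, but the shell keeps going
    echo "still running after false"
}

# 'step || ...' is a context where -e is ignored for step and everything inside it.
out=$( (step || echo "step failed") 2>&1 )
echo "$out"
```

Because the echo is the last command in step and it succeeds, step returns 0 and the “step failed” branch never runs.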

Here’s an example of how painful that can be. I spent way too long today tracking down what was wrong.

  1. A programmer starts off with a simple script make-lvs and uses set -e.

    #!/bin/bash -ex
    lvm lvcreate --size=1G myvg -n mylv0
    lvm lvcreate --size=1G myvg -n mylv1
    

    This looks fine for a “shell script”. The -x argument even makes shell write to standard error the commands it is running. At this point everyone is happy.

  2. Later the programmer looks at the script and realizes that they need more flags to lvm lvcreate. So now the script looks like:

    #!/bin/bash -ex
    lvm lvcreate --ignoremonitoring --yes --activate=y --setactivationskip=n --size=1G --name=mylv0 myvg
    lvm lvcreate --ignoremonitoring --yes --activate=y --setactivationskip=n --size=1G --name=mylv1 myvg
    

    I’m happy that the programmer here used long format flags, as they are much more self-documenting, so readers don’t have to (as quickly) open up the lvm man page. It is easier to make sense of that than it is to read ‘lvm lvcreate --ignoremonitoring -y -ay -kn -L1G -n mylv’.

  3. After doing so, they realize that they can make this look a lot nicer, and reduce the copy/paste code with a simple function wrapper.

    #!/bin/bash -e
    lvcreate() {
        echo "Creating lv $2 on vg $1 of size $3"
        lvm lvcreate "--size=$3" --ignoremonitoring --yes --activate=y \
            --setactivationskip=n --name="$2" "$1"
        echo "Created $1/$2"
    }
    
    lvcreate myvg mylv0 1G
    lvcreate myvg mylv1 1G
    

    The improvements are great. The complexity of the lvm lvcreate call is abstracted away nicely. They’ve even dropped the vile set -x in favor of more human-friendly messages.

    Output of a failing lvm command looks like this:

    $ make-lvs; echo "exited with $?"
    Creating lv mylv0 on vg myvg of size 1G
    out of space
    exited with 1
    
  4. The next improvement is where sanity goes completely out the window. [I realize you were questioning my sanity long ago due to my pursuit of shell scripting perfection].

    The programmer tries to add reasonable ‘FATAL’ messages that you might find in log messages of other programming languages.

     #!/bin/bash -e
     lvcreate() {
         echo "Creating lv $2 on vg $1 of size $3"
         lvm lvcreate "--size=$3" --ignoremonitoring --yes --activate=y \
             --setactivationskip=n --name="$2" "$1"
         echo "Created $1/$2"
     }
     fail() { echo "FATAL:" "$@" 1>&2; exit 1; }
        
     lvcreate myvg mylv0 1G || fail "Failed to create mylv0"
        
     if ! lvcreate myvg mylv1 1G; then
         fail "Failed to create mylv1"
     fi
     echo "Success"
    

    Can you guess what is going to happen here?

    If the lvm command fails (perhaps the vg is out of space) then the output of this script will look like:

     $ make-lvs; echo exited with $?
     Creating lv mylv0 on vg myvg of size 1G
     error: out of space
     Created myvg/mylv0
     Creating lv mylv1 on vg myvg of size 1G
     error: out of space
     Created myvg/mylv1
     Success
     exited with 0
    

The attempt to handle the failure of the lvcreate function with || on line 10 and with if ! on line 12 put each call to the function in a context where -e is ignored. As the man page text above says, that disables the error handling and shell exit that would otherwise have come from -e when the lvm command on line 4 failed. The function carries on, prints “Created”, and returns success.

Above I’ve demonstrated with bash, but this is actually POSIX behavior, and you can just as well test the function with sh.
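For anyone who wants to try that with sh, here is a small reproducer, offered as a sketch rather than a definitive test. The function name work is hypothetical and false stands in for the failing lvm command. It runs the same function twice under set -e, once as a bare call and once guarded with ||.

```shell
#!/bin/sh
# Sketch only: 'work' is a hypothetical function; 'false' stands in for the failing lvm command.
script='
set -e
work() { false; echo after-false; }
'

# Bare call: errexit applies, so the child shell exits at "false".
bare=$(sh -c "$script work; echo done" 2>&1) && bare_rc=0 || bare_rc=$?

# Guarded call: errexit is ignored inside work, so it runs to completion
# and returns success. The "handled" branch never runs.
guarded=$(sh -c "$script work || echo handled; echo done" 2>&1) && guarded_rc=0 || guarded_rc=$?

echo "bare: rc=$bare_rc output='$bare'"
echo "guarded: rc=$guarded_rc output='$guarded'"
```

The bare call exits non-zero with no output, while the guarded call prints both after-false and done and exits 0, which is the same surprise as in the make-lvs script.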

Madness.

If you’re interested in further reading, you can see this topic on the BashFAQ. I agree with GreyCat and geirha: “don’t use set -e. Add your own error checking instead.”

If you’re still here, the following is the version of the script that I’d like to see. Of course there are other improvements that can be made, but I’m happy with it.

   #!/bin/bash
   info() { echo "$@" 1>&2; }
   stderr() { echo "$@" 1>&2; }
   fail() { stderr "FATAL:" "$@"; exit 1; }
   lvcreate() {
       local vg="$1" lv="$2" size="$3"
       info "Creating $vg/$lv size $size"
       lvm lvcreate \
           --ignoremonitoring --yes --activate=y --setactivationskip=n \
           --size="$size" --name="$lv" "$vg" || {
               stderr "failed ($?) to create $vg/$lv size $size"
               return 1
           }
       info "Created $vg/$lv"
   }
   
   # demonstrate both 'command ||' and 'if ! command; then' styles.
   lvcreate myvg mylv0 1G || fail "Failed to create mylv0"
   
   if ! lvcreate myvg mylv1 1G; then
       fail "Failed to create mylv1"
   fi
   
   info "Success"