Frontier Software

Bash

You put your left text in, you take your right text out
You put your left text in and you bash it all about
And then you do the hokey pokey
And you turn yourself around
And that’s what it’s all about

Bash Manual

I’ve recently dived into Linux’s Bash scripting after becoming disenchanted with Python, which broke my previous favourite content management system, MoinMoin (still used by BashGuide).

Learning Bash is frustrating because wiring separate programs together is fiddly, and a typo made as root can easily force you to reinstall your entire Linux system from scratch.

Learn by testing

I’ve tried learning Bash from the classics (The Unix Programming Environment by Brian Kernighan & Rob Pike, O’Reilly’s Learning the Bash Shell by Cameron Newham & Bill Rosenblatt…), but I only really began to reach enlightenment by embarking on a fairly substantial project aided by these two great tools: ShellCheck and ShellSpec.

Since whitespace conventions are much in vogue, a shoutout to shfmt, though it messes up my ShellSpec scripts, so I don’t use it much.

I’m storing my ShellSpec scripts here as I write them, as notes for myself and for anyone interested in following along as I learn.

An “AI website” I’ve found very handy is phind.com. Unfortunately it tends to get its information from Stack Overflow and related forums, so treat its answers as a starting point when you’re stuck.

Bash has evolved far from its origins. Critics include its “father” Stephen R. Bourne, who in this YouTube video called it Byzantine as it grows in complexity to compete against Python, Perl,… But its basic idea of building programs out of other programs is something I’m increasingly a fan of, though I now use other programs less, since Bash can often do the work “in-house” if you know how.

Useless echo?

Like most newbies, when I started learning Bash I constantly got ShellCheck’s SC2116 warning: “Useless echo? Instead of cmd $(echo foo), just use cmd foo.”

The reason is that nearly all Bash scripts and functions echo something, since that is how they return values.

Coming from JavaScript, and wanting to “mutate” a global associative array read in from JSON so as to ultimately output it again as JSON, I made the mistake of writing functions which didn’t echo anything but instead created or updated values in the (supposedly) global array.

Here I learnt the hard way what ShellCheck’s SC2031 warning means. My topmost function didn’t see the changes, because of a Bash rule: “Variables set in subshells are not available outside the subshell.”

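There are many ways of accidentally creating subshells (pipelines, ( ... ) groups and command substitutions, for instance), which is why ShellCheck words SC2031 as “var was modified in a subshell. That change might be lost.”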
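To see the problem in action, here is a minimal sketch (both function names are my own, hypothetical ones) that counts lines twice: once with the loop on the right of a pipe, where it runs in a subshell, and once fed by process substitution, where it runs in the current shell.

```shell
# The while loop on the right of a pipe runs in a subshell,
# so its changes to count are lost when the pipeline ends.
count_in_pipeline() {
  local count=0
  printf 'a\nb\nc\n' | while IFS= read -r _; do
    count=$((count + 1))
  done
  echo "$count"
}

# Same loop fed by process substitution: it runs in the current shell,
# so count keeps its value afterwards.
count_with_process_substitution() {
  local count=0
  while IFS= read -r _; do
    count=$((count + 1))
  done < <(printf 'a\nb\nc\n')
  echo "$count"
}

count_in_pipeline                 # prints 0
count_with_process_substitution   # prints 3
```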

Long story short, writing Bash scripts and functions that read and write text rather than trying to modify global variables prevents a lot of weird bugs down the line.

So what does return do?

Something that confused me, and I’d guess most people coming to Bash from other programming languages, is that while Bash has a return statement, it sets the function’s exit status, not its output. Exit statuses make Bash something of a logic programming language, so knowing some Prolog helped me compose statements with || and &&, which avoids descending into the if pyramid of doom my early code often did.

To make things even more confusing, whereas 0 means false in classical logic, in Unix it means true. The exit status of a script or function is stored in the variable $?, which the next statement can read (any later command overwrites it).
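A small sketch of return, $? and the && / || operators together; is_number is a hypothetical helper made up for illustration.

```shell
# Hypothetical helper: status 0 (true) if $1 is a whole number, 1 (false) if not.
is_number() {
  [[ $1 =~ ^[0-9]+$ ]] && return 0
  return 1
}

# && runs its right-hand side only after a 0 status; || only after a non-zero one.
is_number 42 && echo "42: status $?"                 # prints 42: status 0
is_number forty-two || echo "forty-two: status $?"   # prints forty-two: status 1
```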

Once I got the hang of this, it helped me compose programs that need to know what the error was and dispatch it to the relevant handler.
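One way to do that dispatching is to give each failure its own exit status and switch on $? with case. This is a sketch under my own assumptions: check_age, report and the status numbers 2 and 3 are all hypothetical.

```shell
# Hypothetical validator with one exit status per failure reason:
#   0 = valid, 2 = missing argument, 3 = not a number
check_age() {
  [[ -n $1 ]] || return 2
  [[ $1 =~ ^[0-9]+$ ]] || return 3
  return 0
}

# Dispatch each status to its own handler.
report() {
  check_age "$1"
  case $? in
    0) echo "ok" ;;
    2) echo "error: no age given" >&2; return 2 ;;
    3) echo "error: '$1' is not a number" >&2; return 3 ;;
  esac
}

report 42                                   # prints ok
report abc || echo "report returned $?"     # error on stderr, then: report returned 3
```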

Text-oriented programming

The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information. — Perlisism #34

I googled the phrase text-oriented programming, and sure enough Wikipedia had an entry, unfortunately just a stub, linking to text-oriented operations, a page which has yet to be created. At the time of writing, Wikipedia’s page consists of links to programming languages coded via text editors, i.e. just about everything, including Bash.

A taxonomy I’m making up as I go along is that Bash isn’t just a form of text-oriented programming (TOP to its friends?) but, more specifically, line-oriented programming, a term Wikipedia redirects to the statements section of its Comparison of programming languages (syntax) page.

By line-oriented programming I mean that the default behaviour of Bash (along with anything Unix-related) is to treat each line of text as a record which, to use database parlance, can in turn be split into fields by spaces or tabs.
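Here is a minimal sketch of that record/field idea: read consumes one line (record) per iteration and splits it on the default whitespace into as many fields as you give it variable names. The function name and sample data are made up for illustration.

```shell
# Reads records (lines) from stdin and splits each into three
# whitespace-separated fields.
list_records() {
  local name qty price
  while read -r name qty price; do
    echo "record: name=$name qty=$qty price=$price"
  done
}

printf '%s\n' 'apple 3 0.50' 'pear 5 0.75' | list_records
# prints:
# record: name=apple qty=3 price=0.50
# record: name=pear qty=5 price=0.75
```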

Something bound to frustrate every Bash novice is that, by default, these records are delimited by a literal NEWLINE, which needs to be converted to the two characters \n if a blob of text with paragraphs is to be treated as one record (as JSON string values require, for instance).
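A sketch of that conversion using Bash’s own pattern substitution rather than an external program; the sample text is made up.

```shell
# Replace every literal NEWLINE with the two characters \n so a
# multi-paragraph blob becomes a single record, e.g. for a JSON string value.
text=$'First paragraph.\nSecond paragraph.'
escaped=${text//$'\n'/\\n}
printf '%s\n' "$escaped"    # prints First paragraph.\nSecond paragraph.
```

For real JSON output, other characters (double quotes, backslashes, tabs) need escaping too, which is where jq earns its keep.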

To make things even more frustrating, if you use Microsoft, Bash scripts written in Microsoft text editors will make Bash barf, because Microsoft’s NEWLINE is \r\n unless the editor’s defaults are overridden. Bash’s pattern matching offers the [:cntrl:] character class (written [[:cntrl:]] in a pattern), which matches both \r and \n along with other control characters.
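A minimal sketch of cleaning Microsoft line endings in Bash itself: strip_cr is a hypothetical name, and ${line%$'\r'} removes only a trailing carriage return, whereas ${line%[[:cntrl:]]} would remove any single trailing control character.

```shell
# Copy stdin to stdout, dropping one trailing \r from each line.
# The || [[ -n $line ]] keeps a final line that lacks a NEWLINE.
strip_cr() {
  local line
  while IFS= read -r line || [[ -n $line ]]; do
    printf '%s\n' "${line%$'\r'}"
  done
}

printf 'echo hello\r\necho world\r\n' | strip_cr   # prints the two lines without \r
```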

This default behaviour is courtesy of the shell variable IFS (internal field separator), which comes set to space, TAB, NEWLINE. If you wanted to split a row (aka line of text) from a CSV file, you could use this combination of IFS and read:

IFS=',' read -ra col <<< "item1,item2,item3,item4"

The IFS=',' prefix changes IFS only for that one command, after which echo "${col[0]}" prints item1, echo "${col[1]}" prints item2, and so on.
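A quick way to convince yourself the change really is temporary is to check IFS straight afterwards:

```shell
IFS=',' read -ra col <<< "item1,item2,item3,item4"
echo "${col[1]}"          # prints item2
# The prefix assignment only lasted for the read command, so IFS is back
# to its default of space, TAB, NEWLINE.
[[ $IFS == $' \t\n' ]] && echo "IFS is unchanged"
```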

A snag I hit with the above pattern is that ShellCheck complains if I leave out the -r flag (“do not allow backslashes to escape any characters”), giving warning SC2162: “read without -r will mangle backslashes.”

But including it in one of my attempts to translate JSON paths into a Bash array broke the code. I eventually turned to readarray, which is a synonym for mapfile, whose documentation is more detailed.

# printf '%s' avoids the NEWLINE a <<< here string would tack onto the last element
readarray -d ',' -t col < <(printf '%s' "item1,item2,item3,item4")

I found the above easier to understand, and it kept ShellCheck happy without the occasional weird results. Since readarray and mapfile are identical, I think readarray is the better mnemonic for what the command does.
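The weird results come from the here string: <<< always appends a NEWLINE, and with -d ',' that NEWLINE is not a delimiter, so it stays glued to the last element. printf '%q' makes the hidden character visible:

```shell
# <<< appends a NEWLINE, and readarray -d ',' keeps it on the last element.
readarray -d ',' -t col <<< "item1,item2,item3,item4"
printf '%q\n' "${col[3]}"     # prints $'item4\n'

# printf '%s' adds no trailing NEWLINE, so the last element comes out clean.
readarray -d ',' -t col < <(printf '%s' "item1,item2,item3,item4")
printf '%q\n' "${col[3]}"     # prints item4
```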

But JSON and HTML aren’t line oriented

For JSON there’s jq, and for HTML there’s hq. These are written as traditional Unix filters, with the q in jq and hq hinting that they are a specific type of filter: a query filter.

Unix filters follow the basic convention that the final argument is a file containing text in the expected format or, if no file is given, the filter reads from /dev/stdin whatever a previous filter in the pipeline passes to it. The classic filters are grep, sed and awk.
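Your own Bash functions can follow the same convention cheaply by handing their arguments to cat, which reads the named files or, given none, stdin. A sketch with a hypothetical upcase filter:

```shell
# Hypothetical filter: upper-cases the files named as arguments or,
# with no arguments, whatever arrives on stdin.
upcase() {
  cat -- "$@" | tr '[:lower:]' '[:upper:]'
}

printf 'hello\n' | upcase     # prints HELLO
```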

What makes the learning curve for Unix filters fairly steep is that each has its own DSL, and these tend to be extremely telegraphic, making them incomprehensible to novices. I’ve copied this example from Brian Kernighan & Rob Pike’s The Unix Programming Environment.

grep '^[^:]*::' /etc/passwd

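The pattern looks for lines whose second field, the password, is empty. When I enter it on my ’puter, I get nothing, because in all my entries the double colon :: comes after at least one single colon :, whereas ^[^:]*:: only matches a :: that is the first colon on the line.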
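Feeding the pattern some made-up passwd-style lines shows what it does match. Both lines below contain ::, but only guest has it straight after the first field; no_password is a hypothetical wrapper name.

```shell
# Prints lines whose second field (the password) is empty; reads files or stdin.
no_password() {
  grep '^[^:]*::' "$@"
}

printf '%s\n' 'alice:x:1000:1000::/home/alice:/bin/bash' \
              'guest::1001:1001::/home/guest:/bin/sh' | no_password
# prints guest::1001:1001::/home/guest:/bin/sh
```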

Worse yet, even the same filter tends to come in several dialects of DSL. For instance, grep was historically a family of filters, and modern GNU grep still provides the historical names egrep and fgrep, though using grep -E (“extended regexp”) or grep -F (“fixed strings”) is considered better.
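The difference between the dialects shows up with just a couple of characters:

```shell
# -F treats the pattern as a fixed string, so "." is a literal dot.
printf '%s\n' 'a.c' 'abc' | grep -F 'a.c'       # prints a.c
# Basic regex: "." matches any character, so both lines match.
printf '%s\n' 'a.c' 'abc' | grep 'a.c'          # prints a.c and abc
# -E enables extended regex, where + and | need no backslash.
printf '%s\n' 'ac' 'abc' 'abbc' | grep -E 'ab+c|^ac$'   # prints all three
```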

Though I’m a fan of PostgreSQL, specifically server-side scripting with PL/pgSQL, I’ve found storing big blobs of JSON and HTML in a database a real pain. Keeping text in files makes it a lot easier to modify, plus you get syntax highlighting and all the other cool things your favourite text editor provides but psql doesn’t. Another thing that has made me a bit disenchanted with Postgres is that automatic upgrades (using Arch’s pacman, in my case) kept breaking everything.

Preferring to keep my data in text files (stored as *.json, *.html etc., not embedded in some other language) in turn led me to Hugo.

Something that tripped me up was writing a function to make a small modification to a string: a JSON path, which is comma separated, so I decided to split it into an array with readarray and reassemble it with Bash’s ${parameter:offset} syntax, without knowing that the <<< here string sneaks a NEWLINE onto the end of the last element.

# object2array myarr '"offers"'
# getkeys is another of my helper functions (not shown here)
function object2array {
  local -n arr=$1  # nameref to the caller's associative array
  local rows cols key newkey
  readarray -t rows <<< "$(getkeys "$1" "$2")"
  local IFS=','    # from here on, "${cols[*]:1}" rejoins the path with commas
  for key in "${rows[@]}"; do
    readarray -d ',' -t cols <<< "$key"
    if [[ ${cols[1]} =~ ^[0-9]+$ ]]; then
      echo "Error: $2 is already an array" >&2
      return 1
    fi
    newkey="${cols[0]},0,${cols[*]:1}"
    newkey="${newkey%[[:cntrl:]]}" # Took me hours to find this bug
    arr["$newkey"]=${arr["$key"]}
    unset 'arr[$key]'  # quoted so the brackets aren't treated as a glob
  done
}

The newkey="${newkey%[[:cntrl:]]}" line fixed the problem. Complex as that seems, I couldn’t get a simple substitution to work because Bash kept removing required quotes, so I just went with the above.
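A related subtlety with ${parameter:offset} on arrays is how the slice gets joined back into a string. A sketch with a made-up three-part path: inside double quotes, [*] joins elements with the first character of IFS, while [@] produces separate words that end up space-separated.

```shell
cols=('"offers"' '"price"' '"currency"')
# ${cols[*]:1} expands to the elements from index 1 onwards, joined with the
# first character of IFS. Setting IFS inside $( ) keeps the change local.
rest=$(IFS=','; printf '%s' "${cols[*]:1}")
newkey="${cols[0]},0,${rest}"
echo "$newkey"                      # prints "offers",0,"price","currency"
# By contrast, ${cols[@]:1} yields separate words, which echo joins with spaces:
echo "${cols[0]},0,${cols[@]:1}"    # prints "offers",0,"price" "currency"
```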

Functions

My function above is a bit unusual because, instead of responding textually by echoing a string, it mutates a global object, which is how I’ve gravitated to programming with JSON.

Something I found weird is the way Bash functions return answers. They typically use echo, the equivalent of print in other programming languages, and Bash offers printf as a more powerful alternative.
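A minimal sketch of “returning” text: the function prints, and the caller captures the output with command substitution. greet is a hypothetical name; the second half shows one reason to prefer printf for arbitrary data.

```shell
# A function "returns" text by printing it; the caller captures it with $( ).
greet() {
  printf 'Hello, %s!\n' "$1"
}
message=$(greet World)      # command substitution strips the trailing NEWLINE
echo "$message"             # prints Hello, World!

# echo treats a leading -n or -e as an option; printf '%s' prints it literally.
value='-n'
printf '%s\n' "$value"      # prints -n
```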

https://www.shellcheck.net/wiki/SC1091

Functions can have a return statement, but it is used to pass back the exit status ($? to its friends). Without one, a function’s exit status is that of the last command it ran; by convention 0 means success, and higher numbers, defined per function, fine-tune why the function may have failed.

Bash functions are somewhat confusing since they don’t conform to the style that is normal in, say, JavaScript:

function myfunc(arg1, arg2, arg3) {
  return arg1+arg2+arg3
}

In Bash, arguments are passed as space-delimited words and referred to by their positions ($1, $2, ...), and output is returned using echo.

Bash functions return an integer exit status from 0 to 255: 0 for success and 1, 2… for errors. Their actual output is text printed to /dev/stdout.

function myfunc {
  # $(( )) does integer arithmetic; a plain "$1+$2+$3" would print the text 1+2+3
  echo "$(( $1 + $2 + $3 ))"
}

My style is to use Bash functions to keep things DRY by passing variable names which the function alters to trim whitespace, rewrite date formats or whatever. Note that ${!1} (indirect expansion) extracts the value of the variable whose name is passed as the first parameter.

#!/bin/bash

ExampleGroup 'basic function examples'

  Example 'Note $0 isn`t the name of the function, but of the shell running it'
    function myfunc1 {
      echo "$0"
      echo "$1"
      echo "$2"
    }
    When call myfunc1 "arg1" "arg2"
    The output should equal "/bin/sh
arg1
arg2"
    The status should be success
    The error should be blank
  End

  Example '"$@" expands to all args from $1, which echo joins with spaces'
    function myfunc2 {
      echo "$@"
    }
    When call myfunc2 "arg1" "arg2"
    The output should equal "arg1 arg2"
    The status should be success
    The error should be blank
  End

  Example 'send a function a variable name to be changed'
    function myfunc3 {
      eval "$1=\"Hello, ${!1}!\""
    }
    var="World"
    myfunc3 var
    When call echo "$var"
    The output should equal "Hello, World!"
    The status should be success
    The error should be blank
  End

End
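Since Bash 4.3, a nameref (local -n) can do the same job as myfunc3 without eval, which avoids eval’s habit of executing anything odd in its input. A sketch with a hypothetical greet_var; one caveat is that the caller must not pass a variable literally named target.

```shell
# local -n makes target an alias for whatever variable name the caller passes.
greet_var() {
  local -n target=$1
  target="Hello, ${target}!"
}

var="World"
greet_var var
echo "$var"        # prints Hello, World!
```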

Logic programming

As visitors to this website may have guessed from all the Prolog content, I’m a fan of logic programming, and Bash actually has very good logic programming support, provided you get over its convention that 0 is true while everything else is false. This spits on all traditional logic, where there is only one false, often defined as 0, while the number of trues depends on the data.

Like most novices, I overused if [[....]]; then ...; else ...; fi before discovering Bash’s

try_this || do_this_instead

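try_this could be a group of commands such as { cmd1 && cmd2 && ...; }. Curly braces run the group in the current shell, whereas ( cmd1 && cmd2 && ... ) would start a subshell, so any variables it set would be lost.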
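A sketch of this style replacing an if/else, using a hypothetical check_file function. The second part shows a pitfall ShellCheck flags as SC2015: A && B || C is not a true if-then-else, because C also runs when B fails.

```shell
# Set status without if/else; the { ...; } group runs in the current shell,
# so the assignment inside it survives.
check_file() {
  local status
  { [[ -f $1 ]] && status="found"; } || status="missing"
  echo "$status"
}
check_file /etc/passwd      # prints found on most Unix systems
check_file /no/such/file    # prints missing

# Pitfall (SC2015): C runs here even though A succeeded, because B failed.
true && false || echo "C ran even though A succeeded"
```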

break vs continue

I initially thought break would cause the current loop item to be skipped, but it actually halts the entire loop, so the rest of whatever is being iterated over gets ignored. What I wanted was continue.
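The difference in a single loop (demo_loop is just a name for this sketch):

```shell
demo_loop() {
  local n
  for n in 1 2 3 4 5; do
    (( n == 2 )) && continue   # skip 2, carry on with 3
    (( n == 4 )) && break      # stop the whole loop: 4 and 5 never print
    echo "$n"
  done
}
demo_loop    # prints 1 then 3
```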

How do I check which shell I’m using?

echo "$0"

How do I find available shells on my system?

cat /etc/shells

or

chsh -l

$var (unquoted): if you do want word splitting or filename generation, leave variable expansions unquoted, but set $IFS accordingly and/or disable or enable filename generation as needed (set -f, set +f).

Shell Expansions

  1. brace expansion (no leading $)
  2. tilde expansion
  3. parameter and variable expansion
  4. process substitution
  5. command substitution
  6. arithmetic expansion
  7. word splitting
  8. filename expansion
  9. quote removal
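One line per expansion, as a quick reference sketch; the variable values are made up, and the filename expansion result depends on what is in your /etc.

```shell
name=world
words='a   b'
echo file{1,2,3}.txt                  # brace expansion: file1.txt file2.txt file3.txt
echo ~                                # tilde expansion: your home directory
echo "${name^}"                       # parameter expansion (^ capitalises): World
cat <(echo "process substitution")    # prints process substitution
echo "$(echo command substitution)"   # prints command substitution
echo "$(( 6 * 7 ))"                   # arithmetic expansion: 42
printf '[%s]\n' $words                # word splitting: [a] then [b]
echo /etc/host*                       # filename expansion: e.g. /etc/hosts
echo "quote removal"                  # the quotes themselves are not printed
```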

Common options

Redirections

numfmt

Brian Kernighan’s home page

awk

gawk

Builtin variables

$RANDOM returns a random integer between 0 and 32767 (the largest value a signed 16-bit integer can hold).
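To get a number in a smaller range, the usual trick is modulo arithmetic. A sketch with a hypothetical roll_die; the modulo introduces a slight bias since 32768 isn’t a multiple of 6, which is fine for games but not for anything security-related.

```shell
# Map $RANDOM (0..32767) onto 1..6.
roll_die() {
  echo $(( RANDOM % 6 + 1 ))
}
roll_die    # prints a number from 1 to 6
```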

How do I change my default shell?

phind.com

When is double-quoting necessary?

Double quotes are necessary wherever a list of words or a pattern is expected. They are optional in contexts where the parser expects a single raw string, such as the right-hand side of a variable assignment.

"$var" (quoted): if you don’t want word splitting or filename generation, always quote variable expansions and command substitutions.
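A small sketch of the quoted and unquoted cases side by side, with made-up values; printf '[%s]' brackets each word it receives, so you can see where splitting happened.

```shell
files='my report.txt'
printf '[%s]\n' $files      # unquoted: split into [my] and [report.txt]
printf '[%s]\n' "$files"    # quoted: one word, [my report.txt]

pattern='*'
set -f                      # turn off filename generation
printf '[%s]\n' $pattern    # prints [*] instead of the names of files here
set +f                      # turn it back on

copy=$files                 # an assignment expects one string: no quotes needed
printf '[%s]\n' "$copy"     # prints [my report.txt]
```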

indexed-arrays

Something I’ve found a lot harder than it should be is converting JSON arrays to Bash “indexed” (as opposed to associative) arrays. JSON arrays are comma delimited, similar to CSV but with surrounding square brackets. There are many ways to do this. The old way involves temporarily resetting the internal field separator variable IFS from its default of space, TAB, NEWLINE to a comma, which ShellCheck and others frown upon.
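Here is one IFS-free way, as a naive sketch under strong assumptions: a flat array of strings or numbers with no commas, escaped quotes or nesting inside the elements. json_array_lines is a hypothetical name; it prints one element per line, and readarray collects them. For real-world JSON, jq is the robust choice.

```shell
# Naive: only handles flat arrays like ["a", "b", 3].
json_array_lines() {
  local body=$1 item
  local -a items
  body=${body#\[}                  # drop the opening bracket
  body=${body%\]}                  # drop the closing bracket
  readarray -d ',' -t items < <(printf '%s' "$body")
  for item in "${items[@]}"; do
    item=${item#"${item%%[![:space:]]*}"}   # trim leading whitespace
    item=${item%"${item##*[![:space:]]}"}   # trim trailing whitespace
    item=${item#\"}                         # drop surrounding double quotes
    item=${item%\"}
    printf '%s\n' "$item"
  done
}

readarray -t fruits < <(json_array_lines '["apple", "banana", "cherry"]')
echo "${fruits[1]}"    # prints banana
```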

associative-arrays

By Robert Laing

Bash does not support nested arrays, but that’s not a problem for importing and exporting JSON, since jq’s path syntax lets arbitrarily nested JSON objects map to individual strings to be used as keys, remembering to escape the double quotes inside the string. For converting from JSON to Bash and back again, jq has getpath(PATHS) and setpath(PATHS; VALUE), which I’ve used in my json2array and array2json functions to make this fairly easy.
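A sketch of what such a flattened associative array could look like, assuming comma-separated path keys with the double quotes kept literally (the format object2array above works with). The data is made up and stands in for {"name": "Widget", "offers": [{"price": "9.99", "currency": "USD"}]}.

```shell
declare -A doc
doc['"name"']='Widget'
doc['"offers",0,"price"']='9.99'
doc['"offers",0,"currency"']='USD'

key='"offers",0,"price"'
echo "${doc[$key]}"            # prints 9.99

for key in "${!doc[@]}"; do    # iteration order is unspecified
  printf '%s = %s\n' "$key" "${doc[$key]}"
done
```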

coreutils

Manual List of Commands

extra

These are utilities that often don’t come included in Linux distributions by default. Arch Linux tends to put them in its extra repository.

parameter-expansion

String Operators

String interpolation, i.e. inserting text stored in variables into a template which could be JSON or HTML, is something I do a lot, and I only recently discovered that Bash has an embedded domain-specific language, which the manual calls parameter expansion and my 1992 O’Reilly book calls “string operators”. I stumbled on these courtesy of ShellCheck when I was making the common newbie mistake of using sed to change strings stored in Bash variables, sending them as here strings for tasks like, say, trimming whitespace.
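A handful of these string operators in action, including whitespace trimming without sed; trim is a hypothetical helper name and the values are made up.

```shell
# Trim leading and trailing whitespace with parameter expansion alone.
trim() {
  local s=$1
  s=${s#"${s%%[![:space:]]*}"}   # remove leading whitespace
  s=${s%"${s##*[![:space:]]}"}   # remove trailing whitespace
  printf '%s' "$s"
}

title='  Bash notes  '
echo "[$(trim "$title")]"        # prints [Bash notes]

file='notes/bash.html'
echo "${file##*/}"               # prints bash.html (longest match of */ removed)
echo "${file%.html}.json"        # prints notes/bash.json
echo "${#file}"                  # prints 15 (length)
echo "${missing:-default}"       # prints default (used if unset or empty)
```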

util-linux

List of Commands