Core Utilities
grep
grep me no patterns and I’ll tell you no lines. — fortune cooky
As explained in The Unix Programming Environment by Brian Kernighan and Rob Pike, sed can do just about everything grep can.
sed -n '/pattern/p' files
is the same as
grep -h pattern files
Why do we have both sed and grep? After all, grep is just a simple special case of sed. Part of the reason is history — grep came well before sed. But grep survives, indeed thrives, because for the job that they both do, it is significantly easier to use than sed is.
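The equivalence above can be checked on a throwaway file. This is only a minimal sketch: the sample lines and the variable names are made up for illustration, and it assumes mktemp is available.

```shell
# Build a small sample file (hypothetical contents, for illustration only).
tmp=$(mktemp)
printf '%s\n' 'alpha pattern one' 'beta' 'gamma pattern two' > "$tmp"

# sed -n suppresses automatic printing; /pattern/p prints only matching lines.
sed_out=$(sed -n '/pattern/p' "$tmp")

# grep -h prints matching lines without file name prefixes.
grep_out=$(grep -h pattern "$tmp")

# Report whether the two commands produced identical output.
if [ "$sed_out" = "$grep_out" ]; then
    echo "same output"
else
    echo "different output"
fi

rm -f "$tmp"
```

With a single input file the -h flag makes no visible difference; it matters once several files are passed, because grep would otherwise prefix each line with its file name.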
sed
Four types of sed scripts
1. Multiple Edits to the Same File
2. Making Changes Across a Set of Files
3. Extracting Contents of a File
4. Edits To Go
Substitute [address]s/regexp/replacement/[flags]
The ‘s’ command (as in substitute) is probably the most important in ‘sed’ and has a lot of different options. The syntax of the ‘s’ command is ‘s/REGEXP/REPLACEMENT/FLAGS’.
The ‘/’ characters may be uniformly replaced by any other single character within any given ‘s’ command.
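A short sketch of why swapping the delimiter is handy, plus the g flag and an address. The path and the sample words are hypothetical inputs chosen for illustration.

```shell
# Hypothetical input path, used only for illustration.
path='/usr/local/bin/tool'

# Default delimiter: every slash in the regexp and replacement must be escaped.
with_slash=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# Same substitution with | as the delimiter: no escaping needed.
with_pipe=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt|')

# The g flag replaces every match on the line rather than only the first.
all_a=$(printf '%s\n' 'banana' | sed 's/a/A/g')

# An address restricts the command: here only line 2 is edited.
second_only=$(printf '%s\n' 'cat' 'cat' | sed '2s/cat/dog/')

printf '%s\n' "$with_slash" "$with_pipe" "$all_a" "$second_only"
```

Both path substitutions give /opt/bin/tool; the delimiter only changes how much escaping the script needs.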
awk
#!/usr/bin/bash
awk '{ print }' "$@"
BEGIN
#!/usr/bin/bash
awk 'BEGIN { print "NAME\tRATE\tHOURS"; print "" }
{ print }' "$@"
END
#!/usr/bin/bash
awk '
{ OFS="\t" }
BEGIN { print "NAME", "RATE", "HOURS"; print "" }
{ print $1, $2, $3}
{ rate = rate + $2 }
{ hours = hours + $3 }
END { print "AVERAGE", rate / NR, hours / NR }' "$@"
#!/usr/bin/bash
ExampleGroup 'basic awk oneliner'
  Example 'print emp.data'
    When call bin/eg1.sh bin/emp.data
    The output should eq 'Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'
  End
  Example 'print a header using BEGIN'
    When call bin/eg2.sh bin/emp.data
    The output should eq 'NAME	RATE	HOURS

Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'
  End
  Example 'print average rates and hours using END'
    When call bin/eg3.sh bin/emp.data
    The output should eq 'NAME RATE HOURS

Beth	4.00	0
Dan	3.75	0
Kathy	4.00	10
Mark	5.00	20
Mary	5.50	22
Susie	4.25	18
AVERAGE	4.41667	11.6667'
  End
End
Built-in variables
- ARGC: number of command-line arguments
- ARGV[n]: array of command-line arguments
- FILENAME: name of current input file
- FNR: input record number in current file
- FS: input field separator (default blank)
- NF: number of fields in current input record
- NR: input record number since beginning
- OFMT: output format for numbers (default "%.6g")
- OFS: output field separator (default blank)
- ORS: output record separator (default newline)
- RLENGTH: length of string matched by regular expression in match
- RS: input record separator (default newline)
- RSTART: beginning position of string matched by match
- SUBSEP: separator for array subscripts of form [i,j...] (default "\034")
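A few of these variables are easier to follow when printed side by side. The sketch below uses two made-up files (one.txt and two.txt are hypothetical names) and assumes mktemp is available.

```shell
# Two small sample files (hypothetical data, for illustration only).
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/one.txt"
printf 'f g h i\n' > "$dir/two.txt"

# NR counts records across all files, FNR restarts for each file,
# and NF is the number of fields in the current record.
report=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, NF }' one.txt two.txt)
printf '%s\n' "$report"

# FS and OFS can be set in BEGIN; assigning $1 = $1 rebuilds the record with OFS.
joined=$(printf 'x:y:z\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { $1 = $1; print }')
printf '%s\n' "$joined"

rm -rf "$dir"
```

The first command prints one line per record: "one.txt 1 1 3", "one.txt 2 2 2", then "two.txt 3 1 4", which shows NR carrying on while FNR starts again. The second prints x-y-z.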
Chapter 2 Awk book examples
countries.tsv
USSR	8649	275	Asia
Canada	3852	25	North America
China	3705	1032	Asia
USA	3615	237	North America
Brazil	3286	134	South America
India	1267	746	Asia
Mexico	762	78	North America
France	211	55	Europe
Japan	144	120	Asia
Germany	96	61	Europe
England	94	56	Europe
BEGIN and END example
#!/usr/bin/bash
awk 'BEGIN {
  FS = "\t"
  printf("%10s %6s %5s %s\n\n", "COUNTRY", "AREA", "POP", "CONTINENT")
}
{
  printf("%10s %6d %5d %s\n", $1, $2, $3, $4)
  area = area + $2
  pop = pop + $3
}
END {
  printf("\n%10s %6d %5d\n", "TOTAL", area, pop)
}' "$@"
#!/usr/bin/bash
ExampleGroup 'Chapter 2 Examples BEGIN and END'
  Example 'print countries with column headers and totals'
    When call bin/ch2_eg1.sh bin/countries.tsv
    The output should eq '   COUNTRY   AREA   POP CONTINENT

      USSR   8649   275 Asia
    Canada   3852    25 North America
     China   3705  1032 Asia
       USA   3615   237 North America
    Brazil   3286   134 South America
     India   1267   746 Asia
    Mexico    762    78 North America
    France    211    55 Europe
     Japan    144   120 Asia
   Germany     96    61 Europe
   England     94    56 Europe

     TOTAL  25681  2819'
  End
End
String-matching Patterns
- /regexpr/: Matches when the current input line contains a substring matched by regexpr. Shorthand for $0 ~ /regexpr/.
- expression ~ /regexpr/: Matches if the string value of expression contains a substring matched by regexpr.
- expression !~ /regexpr/: Matches if the string value of expression does not contain a substring matched by regexpr.
#!/usr/bin/bash
ExampleGroup '/regexpr/'
  Example '$4 ~ /Asia/'
    When call awk '$4 ~ /Asia/' bin/countries.tsv
    The output should eq 'USSR	8649	275	Asia
China	3705	1032	Asia
India	1267	746	Asia
Japan	144	120	Asia'
  End
End
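The other two forms, the bare /regexpr/ and the negated !~, can be sketched the same way. To stay self-contained this uses a three-row excerpt of countries.tsv inlined as a string, and -F'\t' is an added assumption so that the fourth field holds the whole continent name.

```shell
# A three-row excerpt of countries.tsv, inlined so the sketch is self-contained.
data=$(printf 'USSR\t8649\t275\tAsia\nCanada\t3852\t25\tNorth America\nFrance\t211\t55\tEurope\n')

# Bare /regexpr/ tests the whole line ($0): only the Canada row contains "North".
whole_line=$(printf '%s\n' "$data" | awk -F'\t' '/North/ { print $1 }')

# !~ keeps records whose fourth field does not contain "Asia".
not_asia=$(printf '%s\n' "$data" | awk -F'\t' '$4 !~ /Asia/ { print $1 }')

printf '%s\n' "$whole_line" "$not_asia"
```

The first pattern selects Canada; the second selects Canada and France.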
Range Patterns start, end
#!/usr/bin/bash
ExampleGroup 'Range Patterns'
  Example '/Canada/, /USA/'
    When call awk '/Canada/, /USA/' bin/countries.tsv
    The output should eq 'Canada	3852	25	North America
China	3705	1032	Asia
USA	3615	237	North America'
  End
End
date
The primary reason I’m using bash rather than some programming language is to get easy access to GNU’s powerful date program. It automagically reads most text descriptions of dates and can then translate them to what I want, which is ISO 8601 as specified by schema.org.
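As a rough sketch of that translation step, assuming GNU date (the -d option and the %:z format are GNU extensions), two differently worded descriptions of the same moment come out as the same ISO 8601 string. The sample phrases and variable names are hypothetical.

```shell
# Assumes GNU date; the sample phrases below are hypothetical inputs.
# TZ=UTC pins the zone so the result does not depend on the machine.
iso_format='+%Y-%m-%dT%H:%M:%S%:z'

long_form=$(TZ=UTC date -d '4 March 2021 3:00pm' "$iso_format")
us_form=$(TZ=UTC date -d 'Mar 4, 2021 15:00' "$iso_format")

printf '%s\n' "$long_form" "$us_form"
# Both lines print 2021-03-04T15:00:00+00:00
```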
Getting dates right has been a constant source of bugs. It took me a while to figure out that, because my development machine is set to my local time zone, “Africa/Johannesburg”, and my server to “UTC”, joeblog.co.za was showing all events 2 hours early.
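The effect can be reproduced by feeding the same offset-free text to date under each zone. This is only a sketch of the failure mode, not the site's actual code: to_iso is a hypothetical helper, and it assumes GNU date with the tz database installed.

```shell
# to_iso is a hypothetical helper: $1 is a time zone, $2 is date text for GNU date.
to_iso() {
    TZ="$1" date -d "$2" +%Y-%m-%dT%H:%M:%S%:z
}

# Text without an offset is read in whatever zone the machine is set to,
# so the same words name two instants that are two hours apart.
to_iso Africa/Johannesburg '2021-03-04 15:00'   # 2021-03-04T15:00:00+02:00
to_iso UTC '2021-03-04 15:00'                   # 2021-03-04T15:00:00+00:00

# With an explicit offset in the input, the machine's zone no longer matters.
to_iso UTC '2021-03-04T15:00:00+02:00'          # 2021-03-04T13:00:00+00:00
```

Setting TZ explicitly for each call, or always carrying the offset in the input, keeps development and server output in agreement.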