AWK programming lesson 2

AWK's main goal in life is to manipulate its input on a line by line basis. An awk program usually goes through some version of:
Process a line. Move on.
Process a line. Move on.
Process a line. ...

If what you want to do does not fit into that model, then awk may not be the right tool for the job.

The general syntax used by all awk programming can be described as:

PATTERN {COMMAND(S)}

What this means is,

"For each line of input, go look and see if the PATTERN is present. If it is present, run the stuff between {}"

[If there is no pattern specified, the command gets called for EVERY line]

A specific example:

awk '/#/ {print "Got a comment in the line"}' /etc/hosts
will print out "Got a comment in the line" for every line that contains at least one '#', **anywhere in the line**, in /etc/hosts
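
To try this without depending on what your own /etc/hosts contains, here is a small sketch that pipes two invented lines into awk. The second rule has no pattern, so it should run for every line; the "checked:" label is only for illustration.

```shell
# Two made-up hosts-style lines; only the first contains a "#".
out=$(printf '127.0.0.1 localhost # loopback\n10.0.0.5 gateway\n' | awk '
    /#/ {print "Got a comment in the line"}
        {print "checked:", $1}
')
printf '%s\n' "$out"
```

The first input line triggers both rules; the second triggers only the pattern-less one.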

The '//' bit in the pattern is one way to specify matching. There are also other ways to specify whether a line matches. For example,

 $1 == "#" {print "got a lone, leading hash"}
will match lines whose first column is a single '#'. The '==' means an EXACT MATCH of the ENTIRE column 1.

On the other hand, if you want a partial match of a particular column, use the '~' operator

  $1 ~ /#/ {print "got a hash, SOMEWHERE in column 1"}

NOTE THAT THE FIRST COLUMN CAN BE AFTER WHITESPACE.

Input of "# comment" will get matched
Input of " # comment" will ALSO get matched
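
A quick way to see the difference between '==' and '~', and the effect of leading whitespace, is to run both tests over a few invented lines. This is only a sketch; NR is awk's built-in line counter, used here to label the output.

```shell
# Sample input (made up): a plain comment, an indented comment,
# and a comment with no space after the hash.
out=$(printf '# comment\n  # comment\n#comment\n' | awk '
    $1 == "#" {print NR ": exact"}
    $1 ~ /#/  {print NR ": partial"}
')
printf '%s\n' "$out"
```

Lines 1 and 2 should report both "exact" and "partial", since leading whitespace is skipped when awk splits fields. Line 3 should report only "partial", because its first column is "#comment" rather than a lone "#".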

If you specifically wanted to match "a line that begins with exactly # and a space" you should use

  /^# /  {do something}
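
As a sketch of the anchored form, with an invented print action standing in for "do something":

```shell
# Only the first sample line starts with "#" followed by a space.
out=$(printf '# top\n  # indented\n#nospace\n' | awk '
    /^# / {print "begins with hash-space:", $0}
')
printf '%s\n' "$out"
```

The indented line is skipped because '^' anchors the match to the very start of the line, before any whitespace.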

Multiple matching

Awk will process ALL PATTERNS that match the current line. So if the following example is used,

  awk '
     /#/ {print "Got a comment"}
     $1 == "#" {print "got comment in first column"}
     /^# /  {print "Found comment at beginning"}
   ' /etc/hosts

you will get THREE printouts, for a line like
# This is a comment
TWO printouts for
  # This is an indented comment
and only one for
1.2.3.4 hostname # a final comment
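
One way to check those counts yourself is to replace each print with a counter and report the total per line. This is a sketch, not part of the original example: the variable n is introduced here only for counting.

```shell
# The three sample lines from above, fed in via printf.
out=$(printf '# This is a comment\n  # This is an indented comment\n1.2.3.4 hostname # a final comment\n' | awk '
    /#/       {n++}
    $1 == "#" {n++}
    /^# /     {n++}
              {print n + 0; n = 0}   # report the count, then reset it
')
printf '%s\n' "$out"
```

The output should be 3, 2 and 1, one number per input line.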

Keeping track of context

Not all lines are created equal, even if they look the same. Sometimes you want to do something with a line, based on lines that came before it.

Here is a quick example that prints "ADDR" lines, if you are not in a "secret" section


   awk '

   /secretstart/  	{ secret=1}
   /ADDR/		{ if(secret==0) print $0 }   # $0 is the entire line
   /secretend/		{ secret=0} '

The program above will print out lines that have "ADDR" in them, except while inside a section opened by a "secretstart" string and not yet closed by "secretend". ORDER MATTERS. For example, if the above was instead written as

   awk '

   /ADDR/		{ if(secret==0) print $0 }   # $0 is the entire line
   /secretstart/  	{ secret=1}

   /secretend/		{ secret=0} '

and given the following input

ADDR a normal addr
secretstart ADDR a secret addr
ADDR another secret addr
a third secret ADDR
secretend
ADDR normal too

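it would PRINT OUT the first "secret" addr, because the ADDR rule runs before the secretstart rule has set the flag for that line. The original, by contrast, would keep all three secret lines quiet.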
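
To compare the two orderings side by side, the sketch below feeds the same sample input to both versions. The shell variable names (input, good, leaky) are made up for this illustration.

```shell
input='ADDR a normal addr
secretstart ADDR a secret addr
ADDR another secret addr
a third secret ADDR
secretend
ADDR normal too'

# Original order: the secretstart rule runs before the ADDR rule.
good=$(printf '%s\n' "$input" | awk '
    /secretstart/ { secret = 1 }
    /ADDR/        { if (secret == 0) print $0 }   # $0 is the entire line
    /secretend/   { secret = 0 }
')

# Swapped order: the ADDR rule runs while secret is still 0.
leaky=$(printf '%s\n' "$input" | awk '
    /ADDR/        { if (secret == 0) print $0 }
    /secretstart/ { secret = 1 }
    /secretend/   { secret = 0 }
')

printf 'original order:\n%s\n\nswapped order:\n%s\n' "$good" "$leaky"
```

The original order should print only the two normal lines, while the swapped order also lets "secretstart ADDR a secret addr" through.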


Author: phil@bolthole.com