AWK programming

Awk is essentially a stream editor, like sed. You can pipe text to it (or have it read from a file), and it manipulates that text on a line-by-line basis. It is also a programming language. That basically means it can do anything sed can do, and a lot more. (But you might have to type more :-)
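Here is a quick sketch of both ways of feeding awk the same text. It assumes a POSIX shell with mktemp available; the temporary file exists only for this demo. Either way, awk sees the same line and prints "world".

```shell
#!/bin/sh
# Feed the same text to awk two ways: from a file, and through a pipe.
tmp=$(mktemp)                  # scratch file, only for this demo
printf 'hello world\n' > "$tmp"

from_file=$(awk '{print $2}' "$tmp")                     # awk reads the named file
from_pipe=$(printf 'hello world\n' | awk '{print $2}')   # awk reads standard input

rm -f "$tmp"
echo "$from_file"
echo "$from_pipe"
```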

Unlike sed, it can easily remember context, do comparisons, and do most things any other full programming language can do. For example, it isn't limited to single lines: it can JOIN multiple lines, if you do things right.
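Here is a rough sketch of what "doing things right" can look like when joining lines. It peeks ahead at two features not covered yet: NR (the built-in line counter) and an END block (code that runs after the last line). The name join_pairs is made up for this example; it glues every two lines together by holding on to each odd-numbered line until the next one arrives.

```shell
#!/bin/sh
# join_pairs is a made-up name for this sketch: it joins every two
# input lines into one, which means awk must remember the previous line.
# NR is the current line number; the END block runs after the last line.
join_pairs() {
	awk '{
		if (NR % 2 == 1) {
			held = $0          # odd-numbered line: remember it
		} else {
			print held, $0     # even-numbered line: print the pair
		}
	}
	END {
		if (NR % 2 == 1) print held   # odd line count: flush the leftover
	}'
}

printf 'alpha\nbeta\ngamma\ndelta\nepsilon\n' | join_pairs
```

With five input lines, the last one has no partner, so the END block prints it on its own.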

The simplest form of awk is a one-liner:


  awk '{ do-something-here }'

The "do-something-here" can be a single print statement, or something much more complicated. It gets somewhat 'C'-like. Simple example:


  awk '{print $1,$3}'

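will print out the first and third columns, where columns are defined as "things between whitespace" (whitespace == tabs or spaces). By default, a run of several spaces or tabs counts as a single separator, and leading whitespace is ignored.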
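To see the whitespace rule in action, here is a small sketch, assuming any POSIX awk. The input has leading blanks, a run of spaces, and a tab, yet awk still finds "one" and "three" as columns 1 and 3. The comma in the print statement puts a single space between the items in the output.

```shell
#!/bin/sh
# Default field splitting: runs of spaces and tabs count as one separator,
# and leading whitespace is ignored, so "one" is still field 1 here.
out=$(printf '  one   two\tthree four\n' | awk '{print $1,$3}')
echo "$out"
```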

A more complicated example:


awk '{	if ( $1 == "start" ) {
		start=1;
		print "started";
		# echo back any extra words on the start line
		if ( $2 != "" )	{
			print "Additional args:",$2,$3,$4,$5
		}
		next;	# done with this line, go read the next one
	}
	if ( $1 == "end" ) {
		print "End of section";
		printf ("Summary: %d,%d,%d (first, second, equal)\n",
			firstcol, secondcol, tied);
		# reset the counters for the next section
		firstcol=0;
		secondcol=0;
		tied=0;
		start=0;
	}
	if ( start > 0 ) {
		# inside a section: tally which column is bigger
		if ( $1 > $2 ) {
			firstcol = firstcol+1
		} else if ( $2 > $1 ) {
			secondcol = secondcol+1
		} else {
			tied = tied+1
		}
	}
}'

[This is just a quick showcase. How it all works will be covered in more detail in the next lesson.]

Key points to remember about variables, compared to other languages:

Variables that represent input field positions, aka columns in your input, are referenced with '$'.

You can actually change a field's value, too. Whether you are reading a field or setting it, you still use the '$'.
Other variables do NOT use '$'.
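A small sketch of both rules, with a made-up function name (field_demo). The '$' really means "field number ...", so $n with an ordinary variable n means "the field whose number is stored in n". That is also why writing $count when you mean count is a bug: you get a field, not your variable.

```shell
#!/bin/sh
# field_demo is a made-up name for this sketch.
field_demo() {
	awk '{
		n = 2            # ordinary variable: no $
		print $n         # field number n; for the input below, prints b
		$2 = "X"         # changing a field still uses $
		print            # the whole line ($0) is rebuilt: a X c
	}'
}

printf 'a b c\n' | field_demo
```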

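As in shell scripting, you don't have to declare variables before using them. An unset variable acts as 0 when used as a number (so counters start at zero), and as an empty string when used as a string.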
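A quick check of that default behavior, with a made-up function name (uninit_demo). Neither count nor name is ever set before it is used.

```shell
#!/bin/sh
# uninit_demo is a made-up name for this sketch.
uninit_demo() {
	awk '{
		count = count + 1     # unset count acts as 0, so this gives 1
		print count
		print "[" name "]"    # unset name acts as "", so this prints []
	}'
}

echo x | uninit_demo
```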

What does that long example do? It reads its input line by line, watching for sections that begin with a "start" line and finish with an "end" line.

When it sees a 'start' line, it prints "started" (plus any extra words on that line). Between the markers, it expects two columns of numbers. It keeps track of how many lines have the larger number in the first column, how many have it in the second, and how many are tied. When it hits an 'end' marker, it prints the tally and zeroes the counters.

So, an input file of

  start
  1 3
  1 5
  2 1
  end

will generate

  started
  End of section
  Summary: 1,2,0 (first, second, equal)
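One detail the example relies on: when both fields look like numbers, awk compares them as numbers, not as text. Here is a short sketch (compare_cols is a made-up name) using the same three-way test as the tally code. Compared as strings, "2" would sort after "11", but here the second column correctly wins.

```shell
#!/bin/sh
# compare_cols is a made-up name for this sketch. Fields that look like
# numbers are compared numerically, so "2 11" reports second.
compare_cols() {
	awk '{
		if ($1 > $2)      print "first"
		else if ($2 > $1) print "second"
		else              print "equal"
	}'
}

printf '10 9\n3 3\n2 11\n' | compare_cols
```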

I think that's enough for folks to digest. End of awk lesson 1.

Author: phil@bolthole.com