AWK programming lesson 5

Array concepts

I previously didn't have a reason to use arrays, but since someone recently emailed me an example of when you could use them, I feel inclined to share it with folks.

We have previously covered variables, as names that hold a value for you. Arrays are an extension of variables: an array is a variable that holds more than one value. How can it hold more than one value? Because it says "take a number": each value is filed under its own index.

If you want to hold three numbers, you could say


value1="one"; value2="two"; value3="three";

OR, you could use


values[1]="one"; values[2]="two"; values[3]="three";

You must always have a value in the brackets [] when using a variable as an array. You can pick any name for an array variable, but from then on, that name can ONLY be used as an array. You CANNOT meaningfully do


values[1]="one";
values="newvalue";

You CAN reassign values, just like normal variables, however. So the following IS valid:


values[1]="1";
print values[1];
values[1]="one";
print values[1];

The really interesting thing is that, unlike some other languages, you don't have to use just numbers. The [1], [2], [3] above are actually treated as ["1"], ["2"], ["3"]. That means you can also use other strings as identifiers, and treat the array almost as a single-column database. The formal name for this is an "associative array".

numbers["one"]=1;
numbers["two"]=2;
print numbers["one"];
value="two";
print numbers[value];
value=$1;
if(numbers[value] == ""){ print "no such number"; }
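
One thing to watch with a test like the last line above: merely mentioning numbers[value] creates that element, with an empty value, if it was not already there. Here is a minimal sketch of an alternative, using the standard "in" operator to ask whether a key exists without creating it. The sample input words and the shell variable name "out" are made up for illustration.

```shell
out=$(printf 'one\nfour\n' | awk '
BEGIN { numbers["one"]=1; numbers["two"]=2; }
{
	value=$1;
	# "in" tests for the key without creating an empty element
	if(value in numbers){ print value, "is", numbers[value]; }
	else { print value ": no such number"; }
}
END {
	# count the keys, to show that the failed lookup added nothing
	n=0;
	for (k in numbers) { n+=1; }
	print "keys stored:", n;
}')
printf '%s\n' "$out"
```

This should print "one is 1", then "four: no such number", then "keys stored: 2". The array still holds only the two original keys after the failed lookup.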

When and how to use arrays

There are different times you might choose to use arrays. As I mentioned, I personally have never needed them :-) But here are some instances that might be relevant to you.

Storing info for later

When using awk as part of a larger shell script, I tend to just print out information to a temporary file. But if you have a reason to do so, you could save particular words in memory, and print them all out at the end, which would usually be faster than using a temporary file.


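BEGIN	{ lnum=0; }	# start at 0; an unset variable used as an index is "", not 0
/special/{ savedwords[lnum]=$2; lnum+=1; }
END	{
		# loop on the counter, so an empty saved word cannot end the loop early
		for(count=0; count<lnum; count+=1)
		{
			print count,savedwords[count];
		}
	}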

Instead of just printing the words out, you could use the END section to do some additional processing that you might need, before displaying them.
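
As one small sketch of such additional processing, the saved words could be printed in reverse order of appearance. The three sample input lines and the shell variable name "out" are invented for illustration. Note that lnum++ is used as the index here: the post-increment yields the number 0 the first time, even though lnum was never set.

```shell
out=$(printf 'special alpha\nplain beta\nspecial gamma\n' | awk '
/special/ { savedwords[lnum++]=$2; }
END {
	# walk the saved words from last to first
	for (i=lnum-1; i>=0; i--) { print i, savedwords[i]; }
}')
printf '%s\n' "$out"
```

With the sample input above, this should print "1 gamma" followed by "0 alpha".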

If you wanted to index the values uniquely (avoiding duplicates), you can actually reference them by their own string. Or, for example, save an array of column 3, indexed by the matching values of column 2:

{ threecol[$2]=$3 }
END	{
		for (v in threecol)
		{
			print v, threecol[v]
		}
	}
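
To illustrate the "avoiding duplicates" idea, here is a rough sketch that uses each word in column 2 as its own index and counts how often it appears. The sample input and the names "count" and "out" are made up for illustration. The order in which "for (w in count)" visits the indexes is not defined by awk, so the output is piped through sort to make it predictable.

```shell
out=$(printf 'a red\nb blue\nc red\n' | awk '
{ count[$2]+=1; }	# the word itself is the index, so duplicates collapse
END {
	for (w in count) { print w, count[w]; }
}' | sort)
printf '%s\n' "$out"
```

This should print "blue 1" and "red 2". The same collapsing happens in the threecol example above: if two lines share the same column 2, the later line's column 3 overwrites the earlier one.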

Arrays, and the split() function

The other primary reason to use arrays is if you want to work with subfields. Let's say you have a line that has some coarse divisions and some fine divisions. In other words, top-level fields are separated by spaces, but within them you get smaller words separated by colons.


  This is a variable:field:type line
  There can be multiple:type:values here

In the above, the fourth space-separated field has subfields separated by colons.
Now let's say you wanted to know the value of the second subfield in the fourth major field. One way to deal with this would be to call two awks, piped together:


awk '{print $4}' | awk -F: '{print $2}'

Another way would be to change the field separator variable 'FS' on the fly (apparently this does not work with all awk implementations):


awk '{ newline=$4; fs=FS; FS=":";  $0=newline; print $2 ; FS=fs; }'

But you could also do it with arrays, using the split() function, as follows:


awk '{ newline=$4; split(newline,subfields,":"); print subfields[2]} '

Using arrays in this case is the usual, and probably nicest, way to do it.
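
split() also returns the number of pieces it found, and it numbers them starting at 1, so you can loop over every subfield rather than picking out just one. A minimal sketch, using the first sample line from above; the variable names "n", "i" and "out" are only for illustration.

```shell
out=$(printf 'This is a variable:field:type line\n' | awk '
{
	# split() returns how many pieces it stored in subfields[1..n]
	n=split($4,subfields,":");
	for (i=1; i<=n; i++) { print i, subfields[i]; }
}')
printf '%s\n' "$out"
```

This should print "1 variable", "2 field" and "3 type", one per line.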

Author: phil@bolthole.com