Ksh and POSIX utilities

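POSIX-compliant systems (e.g. most current versions of UNIX) come with certain incredibly useful utilities. These were originally specified in POSIX.2, the "shell and utilities" part of the standard, which has since been merged into the single POSIX.1 document. The short list is: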


cut, join, comm, fmt, grep, egrep, sed, awk

Any of these commands (and many others) can be used within your shellscripts to manipulate data.

Some of these are programming languages themselves. Sed is fairly complex, and AWK is actually its own mini-programming language. So I'll just skim over some basic hints and tricks.

cut

"cut" is a small, lean version of what most people use awk for. It will "cut" a file up into columns, and particularly, only the columns you specify. Its drawbacks are:

It can be picky about argument order. On some older systems you MUST use the -d argument before the -f argument (current POSIX-conforming versions accept either order)
It defaults to a tab, SPECIFICALLY, as its delimiter of columns.

The first one is just irritating. The second one is a major drawback, if you want to be flexible about files. This is the reason why AWK is used more, even for this trivial type of operation: Awk defaults to letting ANY whitespace define columns.
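Here is a small sketch of the delimiter difference. The sample lines and variable names are made up for illustration:

```shell
# Sample data, invented for illustration: one colon-separated line,
# and one line whose columns are separated by runs of spaces.
passwd_line='joeuser:x:1001:100:Joe User:/home/joeuser:/bin/ksh'
spaced_line='alpha   beta gamma'

# cut with an explicit delimiter: fields 1 and 5 of the colon-separated line
cut_fields=$(printf '%s\n' "$passwd_line" | cut -d: -f1,5)
echo "$cut_fields"

# cut treats EVERY single space as a delimiter, so with three spaces
# between "alpha" and "beta", field 2 is empty
cut_second=$(printf '%s\n' "$spaced_line" | cut -d' ' -f2)
echo "[$cut_second]"

# awk collapses runs of whitespace, so its field 2 is "beta"
awk_second=$(printf '%s\n' "$spaced_line" | awk '{print $2}')
echo "[$awk_second]"
```

Note that cut keeps the delimiter in its output, so the first command prints "joeuser:Joe User".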

join

join is similar to a "database join" command, except it works with files. If you have two files that share a common field, such as a username, you can "join" them into one file, IF and ONLY IF both files are sorted on that field. For example

john_s John Smith

in one file, and

john_s 1234 marlebone rd

in the other, will be joined to make a single line,

john_s John Smith 1234 marlebone rd
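A runnable sketch of the same idea follows. The file names and the extra "anne_b" entries are invented for illustration, and it assumes mktemp is available (it is not required by POSIX, but nearly every current system has it):

```shell
# Scratch directory and sample files, invented for illustration
workdir=$(mktemp -d)
printf '%s\n' 'anne_b Anne Brown' 'john_s John Smith' > "$workdir/names"
printf '%s\n' 'anne_b 42 high st' 'john_s 1234 marlebone rd' > "$workdir/addresses"

# Both files are already sorted on the first field (the username),
# which is the field join matches on by default
joined=$(join "$workdir/names" "$workdir/addresses")
echo "$joined"

rm -rf "$workdir"
```

If your real files are not sorted yet, run each one through sort first and join the sorted copies.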

If the files do not already have a common field, you could either use the paste utility to join the two files, or give each file line numbers before joining them, with

cat -n file1 >file1.numbered
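For the paste route, a minimal sketch might look like this. The file contents are made up, and mktemp is assumed to be available:

```shell
# Scratch directory and sample files, invented for illustration
workdir=$(mktemp -d)
printf '%s\n' 'John Smith' 'Anne Brown' > "$workdir/file1"
printf '%s\n' '1234 marlebone rd' '42 high st' > "$workdir/file2"

# paste glues line N of file1 to line N of file2, separated by a TAB.
# It pairs lines purely by position; it does not look at their content.
pasted=$(paste "$workdir/file1" "$workdir/file2")
echo "$pasted"

rm -rf "$workdir"
```

Since paste only matches lines by position, this is only sensible when the two files list their records in the same order.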

comm

I think of "comm" as being short for "compare", in a way. But technically, it stands for "common lines". First, run any two files through "sort". Then you can run 'comm file1 file2' to tell you which lines are ONLY in file1, or ONLY in file2, or both. Or any combination.

For example

comm -1 file1 file2

means "Do not show me lines ONLY in file1." Which is the same thing as saying "Show me lines that are ONLY in file2", and also "Show me lines that are in BOTH file1 and file2".
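To see the column options in action, here is a small sketch with two made-up, already sorted files (mktemp is assumed to be available):

```shell
# Scratch directory and two sorted sample files, invented for illustration
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/file1"
printf '%s\n' banana cherry date > "$workdir/file2"

# -1 suppresses column 1 (lines only in file1). Lines only in file2 start
# at the left margin; lines in both files are indented by one TAB.
comm -1 "$workdir/file1" "$workdir/file2"

# -12 suppresses columns 1 and 2, leaving only the lines common to both
common=$(comm -12 "$workdir/file1" "$workdir/file2")
echo "$common"

# -23 suppresses columns 2 and 3, leaving only the lines unique to file1
only_first=$(comm -23 "$workdir/file1" "$workdir/file2")
echo "$only_first"

rm -rf "$workdir"
```

In scripts, the two-digit forms are usually the handy ones, since they leave a single column with no TAB indentation to strip.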

fmt

fmt is a simple command that takes some informational text file, and word-wraps it nicely to fit within the confines of a fixed-width terminal. Okay, it isn't so useful in shellscripts, but it's cool enough I just wanted to mention it :-)

pr is similarly useful. But where fmt is more oriented towards paragraphs, pr is more specifically oriented toward page-by-page formatting.

grep and egrep

These are two commands that have more depth to them than is presented here. But generally speaking, you can use them to find specific lines in a file (or files) that have information you care about.
One of the more obvious uses of them is to find out user information. For example,

grep joeuser /etc/passwd

will give you the line in the passwd file that is about account 'joeuser'. If you are suitably paranoid, you would actually use

grep '^joeuser:' /etc/passwd

to make sure it did not accidentally pick up information about 'joeuser2' as well.
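The difference is easy to demonstrate on a stand-in file. The accounts below are invented, and mktemp is assumed to be available:

```shell
# Stand-in for /etc/passwd, with invented accounts
workdir=$(mktemp -d)
cat > "$workdir/passwd" <<'EOF'
joeuser:x:1001:100:Joe User:/home/joeuser:/bin/ksh
joeuser2:x:1002:100:Joe User II:/home/joeuser2:/bin/ksh
bob:x:1003:100:Friend of joeuser:/home/bob:/bin/ksh
EOF

# Unanchored: counts every line with "joeuser" anywhere on it
loose_count=$(grep -c joeuser "$workdir/passwd")

# Anchored: counts only the line whose first field is exactly "joeuser"
strict_count=$(grep -c '^joeuser:' "$workdir/passwd")

echo "loose=$loose_count strict=$strict_count"

rm -rf "$workdir"
```

The unanchored pattern also catches bob's line, because "joeuser" appears in his comment field and home directory does not even need to match.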

(Note: this is just an example: often, awk is more suitable than grep, for /etc/passwd fiddling)
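As a rough sketch of the awk approach, you can compare the first field exactly and print just the field you want. The accounts below are invented, and mktemp is assumed to be available:

```shell
# Stand-in for /etc/passwd, with invented accounts
workdir=$(mktemp -d)
cat > "$workdir/passwd" <<'EOF'
joeuser:x:1001:100:Joe User:/home/joeuser:/bin/ksh
joeuser2:x:1002:100:Joe User II:/home/joeuser2:/bin/ksh
EOF

# Match only where field 1 is exactly "joeuser", then print field 6
# (the home directory)
home_dir=$(awk -F: '$1 == "joeuser" {print $6}' "$workdir/passwd")
echo "$home_dir"

rm -rf "$workdir"
```

The exact string comparison on $1 means there is no pattern to anchor, and you get the one field you care about instead of the whole line.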

sed

Sed actually has multiple uses, but its simplest use is "substitute this string, where you see that string". The syntax for this is


sed 's/oldstring/newstring/'

This will look at every line of input, and change the FIRST instance of "oldstring" to "newstring".

If you want it to change EVERY instance on a line, you must use the 'global' modifier at the end:


sed 's/oldstring/newstring/g'

If you want to substitute either an oldstring or a newstring that has slashes in it, you can use a different separator character:


sed 's:/old/path:/new/path:'
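The three forms above can be tried side by side. The input strings and variable names here are made up for illustration:

```shell
# Without the g modifier, only the first match on each line changes
first_only=$(echo 'old old old' | sed 's/old/new/')
echo "$first_only"

# With the g modifier, every match on the line changes
every_one=$(echo 'old old old' | sed 's/old/new/g')
echo "$every_one"

# With ':' as the separator, the slashes in the paths need no escaping
path_swap=$(echo 'cd /old/path/bin' | sed 's:/old/path:/new/path:')
echo "$path_swap"
```

Whatever character follows the 's' becomes the separator, so pick one that does not appear in either string.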

awk

Awk really deserves its own tutorial, since it is its own mini-language. And, it has one!

But if you don't have time to look through it, the most common use for AWK is to print out specific columns of a file. You can specify what character separates columns. The default is 'whitespace' (space, or TAB). But the canonical example is "How do I print out the first and fifth columns/fields of the password file?"


awk -F: '{print $1,$5}' /etc/passwd

"-F:" defines the "field separator" to be ':'

The bit between single-quotes is a mini-program that awk interprets. You can tell awk filename(s), after you tell it what program to run. OR you can use it in a pipe.

You must use single-quotes for the mini-program, to avoid $1 being expanded by the shell itself. In this case, you want awk to literally see '$1'

"$x" means the 'x'th column
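The comma is a quick way to say "put a space here". (Strictly speaking, awk puts its output field separator there, which is a single space by default.)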
If you instead did


awk -F: '{print $1 $5}' /etc/passwd

awk would not put any space between the columns!
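Here is a short sketch comparing the two forms on a made-up passwd-style line. The last command sets OFS, awk's output field separator variable, to show what the comma really inserts:

```shell
# Sample line, invented for illustration
line='joeuser:x:1001:100:Joe User:/home/joeuser:/bin/ksh'

# Comma: the two fields are separated by the output field separator,
# which is a single space by default
with_comma=$(echo "$line" | awk -F: '{print $1,$5}')
echo "$with_comma"

# No comma: the two fields are concatenated with nothing between them
no_comma=$(echo "$line" | awk -F: '{print $1 $5}')
echo "$no_comma"

# Setting OFS changes what the comma inserts
with_ofs=$(echo "$line" | awk -F: -v OFS=' | ' '{print $1,$5}')
echo "$with_ofs"
```

The -v option is part of POSIX awk. On a very old system where plain "awk" rejects it, nawk is usually the version to try.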

If you are interested in learning more about AWK, read my AWK tutorial

This material is copyrighted by Philip Brown