Case Study: awk
The awk minilanguage is an old-school
Unix tool, formerly much used in shell scripts. Like
m4, it's intended for writing small but
expressive programs to transform textual input into textual output.
Versions ship with all Unixes, several in open source; the command
info gawk at your Unix shell prompt is quite likely
to take you to on-line documentation.
Programs in awk consist of
pattern/action pairs. Each pattern is a regular
expression, a concept we'll describe in detail in Chapter 9. When an awk
program is run, it steps through each line of the input file. Each
line is checked against every pattern/action pair in order. If the
pattern matches the line, the associated action is performed.
Each action is coded in a language resembling a subset of C, with
variables and conditionals and loops and an ontology of types
including integers, strings, and (unlike C)
dictionaries.[89]
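As a minimal sketch of the pattern/action model just described, the shell fragment below feeds three lines to a tiny awk program. The sample input and the function name `demo_pairs` are made up for illustration. `END` is a special pattern, not a regular expression; its action runs once after the last input line.

```shell
# Hypothetical demo: the input lines and the function name are made up.
demo_pairs() {
    printf '%s\n' 'error: disk full' 'ok: backup done' 'error: no route' |
    awk '
        /^error/ { errors++ }                 # runs only on lines matching the pattern
        /disk/   { print "disk line: " $0 }   # one line may match several pairs, tested in order
        END      { print "errors: " errors }  # special pattern: runs after the last line
    '
}

demo_pairs
```

The first input line matches both regular-expression patterns, so both actions fire for it; the second line matches neither and is silently skipped.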
The awk action language is
Turing-complete, and can read and write files. In some versions it can
open and use network sockets. But awk has
primarily seen use as a report generator, especially for interpreting
and reducing tabular data. It is seldom used standalone, but rather
embedded in scripts. There is an example
awk program in the case study on HTML generation included in
Chapter 9.
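To make the report-generator role concrete, here is a small sketch of awk embedded in a shell function, reducing a two-column table to per-name totals with a dictionary (an associative array). The table contents and the name `sales_report` are assumptions invented for this example.

```shell
# Hypothetical report: the sales table and the function name are made up.
sales_report() {
    printf '%s\n' 'alice 10' 'bob 5' 'alice 7' 'bob 1' |
    awk '
        { total[$1] += $2 }   # no pattern: runs on every line; total is a dictionary
        END {
            # for-in traversal order is unspecified, so the output is sorted afterwards
            for (name in total) print name, total[name]
        }
    ' | sort
}

sales_report
```

An action with no pattern applies to every input line, which is the usual idiom for accumulating column sums before printing them in an `END` block.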
A case study of awk is included to point out that it is not a model
for emulation; in fact, since 1990 it has largely fallen out of use.
It has been superseded by new-school scripting languages, notably
Perl, which was explicitly designed to be an awk killer. The reasons
are worthy of examination, because they constitute a bit of a
cautionary tale for minilanguage designers.
The awk language was originally
designed to be a small, expressive special-purpose language for report
generation. Unfortunately, it turns out to have been designed at a
bad spot on the complexity-vs.-power curve. The action language is
noncompact, but
the pattern-driven framework it sits inside keeps it from being
generally applicable — that's the worst of both worlds. And the
new-school scripting languages can do anything
awk can; their equivalent programs are
usually just as readable, if not more so.
"Awk has also fallen out of use because more modern shells have
floating point arithmetic, associative arrays, RE pattern matching,
and substring capabilities, so that equivalents of small awk scripts
can be done without the overhead of process creation."
-- David Korn
For a few years after the release of Perl in 1987,
awk remained competitive simply because it
had a smaller, faster implementation. But as the cost of compute
cycles and memory dropped, the economic reasons for favoring a
special-purpose language that was relatively thrifty with both lost
their force. Programmers increasingly chose to do awk-like things with
Perl or (later) Python, rather than keep two different scripting
languages in their heads.[90] By the year 2000 awk had become little
more than a memory for most old-school Unix hackers, and not a
particularly nostalgic one.
Falling costs have changed the tradeoffs in minilanguage design.
Restricting your design's capabilities to buy
compactness may
still be a good idea, but doing so to economize on machine resources
is a bad one. Machine resources get cheaper over time, but space in
programmers' heads only gets more expensive. Modern minilanguages can
either be general but noncompact, or specialized but very
compact; specialized
but noncompact simply won't compete.