In the Unix tradition, the tradeoffs we described above are met
by well-established interface design patterns. Here is a bestiary
of these patterns, with analyses and examples. We'll follow it with
a discussion of how to apply them.
Note that this bestiary does not include GUI design patterns
(though it includes a design pattern that can use a GUI as a
component). There are no design patterns in graphical user interfaces
themselves that are specifically native to Unix. A promising beginning
of a discussion of GUI design patterns in general can be found at
Experiences — A Pattern Language for User Interface
Design [Coram-Lee].
Also note that programs may have modes that fit more than one
interface pattern. A program that has a compiler-like interface,
for example, may behave as a filter when no file arguments are
specified on the command line (many format converters behave like
this).
The interface-design pattern most classically associated with
Unix is the filter. A filter program takes data on
standard input, transforms it in some fashion, and sends the result
to standard output. Filters are not interactive; they may query
their startup environment, and are typically controlled by
command-line options, but they do not require feedback or commands
from the user in their input stream.
Two classic examples of filters are tr(1) and grep(1). The tr(1)
program is a utility that translates data on standard input to
results on standard output using a translation specification given on
the command line. The grep(1) program selects lines from standard
input according to a match expression specified on the command line;
the resulting selected lines go to standard output. A third is the
sort(1) utility, which sorts lines in input according to criteria
specified on the command line and issues the sorted result to standard
output.
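As a rough illustration of the pattern, here is a minimal sketch of a grep-like filter in Python. It is not how grep(1) itself is implemented; the names `select_lines` and `main` are hypothetical, and the streams are passed in as parameters only so the logic can be exercised without a terminal.

```python
import re
import sys


def select_lines(lines, pattern):
    """Yield only the lines that match the regular expression `pattern`."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line  # selected lines pass through unchanged


def main(argv, stdin=None, stdout=None, stderr=None):
    """Hypothetical entry point: argv[1] is the match expression."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    stderr = sys.stderr if stderr is None else stderr
    if len(argv) != 2:
        # Diagnostics go to standard error, never into the data stream.
        print(f"usage: {argv[0]} PATTERN", file=stderr)
        return 2
    # No prompts and no commands read from input: control comes only
    # from the command line, as the filter pattern requires.
    stdout.writelines(select_lines(stdin, argv[1]))
    return 0
```

To run this as a command, the file would end with the usual `if __name__ == "__main__": sys.exit(main(sys.argv))` guard. Everything the program needs to know arrives on the command line, and everything it produces goes to standard output.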
Both grep(1) and sort(1) (but not tr(1))
can alternatively take data input from a file (or files) named on the
command line, in which case they do not read standard input but act
instead as though that input were the catenation of the named files
read in the order they appear. (In this case it is also expected that
specifying “-” as a filename on the command line will
direct the program explicitly to read from standard input.) The
archetype of such ‘catlike’ filters is cat(1),
and filters are expected to behave this way unless there are
application-specific reasons to treat files named on the command line
differently.
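The catlike convention is small enough to sketch directly. The following is an illustrative Python version under stated assumptions: the function name `catlike_lines` is hypothetical, and the `stdin` parameter exists only so the behavior can be checked without a real terminal.

```python
import sys


def catlike_lines(names, stdin=None):
    """Yield lines from the named files in order.

    With no names, read standard input. A name of "-" also means
    standard input, at that position in the sequence.
    """
    if stdin is None:
        stdin = sys.stdin
    if not names:
        names = ["-"]
    for name in names:
        if name == "-":
            yield from stdin
        else:
            with open(name) as handle:
                yield from handle
```

A filter built on this would pass `sys.argv[1:]` (after option parsing) as `names`. Python's standard `fileinput` module offers much the same behavior ready-made, including the treatment of "-" and of an empty argument list.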
When designing filters, it is well to bear in mind some
additional rules, partly developed in Chapter 1:
- Remember Postel's Prescription: Be generous in what
you accept, rigorous in what you emit.
That is, try to
accept as loose and sloppy an input format as you can and emit as
well-structured and tight an output format as you can. Doing the
former reduces the odds that the filter will be brittle in the face of
unexpected inputs, and break in someone's hand (or in the middle of
someone's toolchain). Doing the latter increases the odds that your
filter will someday be useful as an input to other programs.
- When filtering, never throw away information you don't
need to.
This, too, increases the odds that your filter
will someday be useful as an input to other programs. Information you
discard is information that no later stage in a pipeline can
use.
- When filtering, never add noise.
Avoid
adding nonessential information, and avoid reformatting in ways that
might make the output more difficult for downstream programs to parse.
The most common offenders are cosmetic touches like headers, footers,
blank or ruler lines, and summaries, and conversions such as aligning
columns or writing a factor of "1.5" as "150%". Times and dates are
a particular bother because they're hard for downstream programs to
parse. Any such additions should be optional and controlled by
switches. If your program emits dates, it's good practice to have a
switch that can force them into ISO 8601 YYYY-MM-DD and hh:mm:ss
formats; better yet, use those by default.
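The "never add noise" rule can be made concrete with a small sketch. The example below assumes a hypothetical report filter whose records are (name, timestamp, factor) tuples; the `render` and `parse_args` names and the `--header` switch are inventions for illustration, not part of any existing tool.

```python
import argparse
from datetime import datetime


def render(records, header=False):
    """Yield tab-separated lines; the header is opt-in, never the default."""
    if header:
        yield "name\tdate\ttime\tfactor\n"
    for name, when, factor in records:
        yield "\t".join([
            name,
            when.strftime("%Y-%m-%d"),  # ISO 8601 calendar date
            when.strftime("%H:%M:%S"),  # ISO 8601 time of day
            repr(factor),               # 1.5 stays "1.5", not "150%"
        ]) + "\n"


def parse_args(argv):
    """Parse the (hypothetical) command-line switches."""
    parser = argparse.ArgumentParser(description="hypothetical report filter")
    parser.add_argument("--header", action="store_true",
                        help="emit a header line (off by default)")
    return parser.parse_args(argv)
```

The design choice is that the default output is the form a downstream program wants: one record per line, a single unambiguous separator, and dates that sort and parse without locale knowledge. Anything cosmetic has to be asked for.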
The term “filter” for this pattern is
long-established Unix jargon.
“Filter” is indeed long-established. It came into
use on day one of pipes. The term was a natural transferral from
electrical-engineering usage: data flowed from source through filters
to sink. Source or sink could be either process or file. The
collective EE term, “circuit”, was never considered,
since the plumbing metaphor for data flow was already well
established.
-- Doug McIlroy
Some programs have interface design patterns like the filter,
but even simpler (and, importantly, even easier to script). They
are cantrips, sources, and sinks.