When doing data-driven programming, one
clearly distinguishes code from the data structures on which it acts,
and designs both so that one can make changes to the logic of the program
by editing not the code but the data structure.
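A minimal Python sketch of the idea. The commands and handlers are hypothetical and are not taken from any real program. A table maps command names to handlers, so changing what the program does for a given input means editing the table, while the dispatch function stays fixed.

```python
from typing import Callable, Dict


def greet(arg: str) -> str:
    return f"hello, {arg}"


def shout(arg: str) -> str:
    return arg.upper() + "!"


# The control flow lives here: which handler runs for which command is data.
# Adding, removing, or rerouting a command means editing this table only.
COMMANDS: Dict[str, Callable[[str], str]] = {
    "greet": greet,
    "hi": greet,      # an alias costs one table entry, no new code
    "shout": shout,
}


def run(command: str, arg: str) -> str:
    """Fixed dispatch code: look the command up and call whatever the table names."""
    handler = COMMANDS.get(command)
    if handler is None:
        return f"unknown command: {command}"
    return handler(arg)


print(run("hi", "world"))     # hello, world
print(run("shout", "data"))   # DATA!
```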
Data-driven programming is sometimes confused with object
orientation, another style in which data
organization is supposed to be central. There are at least two
differences. One is that in data-driven programming, the data is not
merely the state of some object, but actually defines the control flow
of the program. The other is that, where the primary concern in OO is
encapsulation, the primary concern in data-driven programming is
writing as little fixed code as possible. Unix has a stronger
tradition of data-driven programming than of OO.
Programming data-driven style is also sometimes confused with
writing state machines. It is in fact possible to express the logic
of a state machine as a table or data structure, but hand-coded state
machines are usually rigid blocks of code that are far harder to
modify than a table.
An important rule when doing any kind of code generation or
data-driven programming is this: always push problems upstream. Don't
hack the generated code or any intermediate representations by hand
— instead, think of a way to improve or replace your translation
tool. Otherwise you're likely to find that hand-patching bits which
should have been generated correctly by machine will have turned into
an infinite time sink.
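As an illustration of what "upstream" means, here is a hypothetical Python generator that emits a C array initializer from a small spec. The spec format, array name, and helper names are assumptions, and this is not the generator used for any real program. Suppose a generated string came out malformed, say because a name containing a double quote broke the C literal. The fix belongs in the spec or in a function like `c_string`, not in the generated file, where the next regeneration would silently undo it.

```python
from typing import Iterable, Tuple

# Hypothetical spec: (character code, official name). This is the thing to edit.
SPEC = [
    (0x08, "Backspace"),
    (0x0A, "Line Feed"),
]


def c_string(s: str) -> str:
    """Quote s as a C string literal, escaping backslashes and double quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def generate(spec: Iterable[Tuple[int, str]]) -> str:
    """Emit a C99 designated-initializer table from the spec."""
    lines = [
        "/* Generated file: fix the spec or the generator, never this output. */",
        "static const char *official_name[128] = {",
    ]
    for code, name in spec:
        lines.append(f"    [{code}] = {c_string(name)},")
    lines.append("};")
    return "\n".join(lines)


print(generate(SPEC))
```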
At the upper end of its complexity scale, data-driven
programming merges into writing interpreters for p-code or simple
minilanguages of the kind we surveyed in Chapter 8. At other edges, it merges into code
generation and state-machine programming. The distinctions are not
actually that important; the important part is moving program logic
away from hardwired control structures and into data.
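A toy example of that upper end, in Python: a stack-machine interpreter whose "program" is just a list of tuples. The opcodes are invented for this sketch. Even the interpreter is partly table-driven, since arithmetic opcodes are looked up in a dict rather than spelled out as separate branches.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Binary arithmetic opcodes: adding one is a table edit, not a new branch.
ARITH: Dict[str, Callable[[int, int], int]] = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}


def execute(program: List[Tuple]) -> List[int]:
    """Run a list of (opcode, *operands) tuples and return the final stack."""
    stack: List[int] = []
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op in ARITH:
            b, a = stack.pop(), stack.pop()   # operands were pushed a, then b
            stack.append(ARITH[op](a, b))
        elif op == "push":
            stack.append(args[0])
        elif op == "jz":                      # pop; jump to args[0] if the value is zero
            if stack.pop() == 0:
                pc = args[0]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack


# (2 + 3) * 4, written as data rather than as a Python expression.
print(execute([("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]))  # [20]
```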
I maintain a program called ascii, a
very simple little utility that tries to interpret its command-line
arguments as names of ASCII (American Standard Code for Information
Interchange) characters and report all the equivalent names. Code and
documentation for the tool are available from the project page. Here is an illustrative
screenshot:
esr@snark:~/WWW/writings/taoup$ ascii 10
ASCII 1/0 is decimal 016, hex 10, octal 020, bits 00010000: called ^P, DLE
Official name: Data Link Escape

ASCII 0/10 is decimal 010, hex 0a, octal 012, bits 00001010: called ^J, LF, NL
Official name: Line Feed
C escape: '\n'
Other names: Newline

ASCII 0/8 is decimal 008, hex 08, octal 010, bits 00001000: called ^H, BS
Official name: Backspace
C escape: '\b'
Other names:

ASCII 0/2 is decimal 002, hex 02, octal 002, bits 00000010: called ^B, STX
Official name: Start of Text
One indication that this program was a good idea is that it has an
unexpected use: as a quick CLI aid for converting between decimal, hex,
octal, and binary representations of bytes.
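The conversion side is small. A hypothetical Python reimplementation might read an argument in each radix, then format each resulting byte the way the first line of each screenshot record does. The radix order (hex, decimal, octal, binary) is inferred from the screenshot, and the slang names after "called" are left out. This is a sketch, not the program's actual code.

```python
from typing import List


def interpretations(arg: str) -> List[int]:
    """Read arg as hex, decimal, octal, and binary; keep distinct values in ASCII range."""
    values: List[int] = []
    for base in (16, 10, 8, 2):
        try:
            value = int(arg, base)
        except ValueError:
            continue  # e.g. "9" is not a valid octal or binary numeral
        if 0 <= value < 128 and value not in values:
            values.append(value)
    return values


def describe(n: int) -> str:
    """Format one byte: high/low four bits (the 1/0 notation), then four radices."""
    return (f"ASCII {n >> 4}/{n & 0x0F} is decimal {n:03d}, hex {n:02x}, "
            f"octal {n:03o}, bits {n:08b}")


for n in interpretations("10"):
    print(describe(n))
```

For the argument `10`, this yields 16, 10, 8, and 2, the same four characters the screenshot reports.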
The main logic of this program could have been coded as a
128-branch case statement. This would, however, have made the code
bulky and difficult to maintain. It would also have tangled parts
that change relatively rapidly (like the list of slang names for
characters) with parts that change slowly or not at all (like
the official names), putting them both in the same legend string and
making errors during editing much more likely to touch data that
ought to be stable.
Instead, we apply data-driven programming. All the character
name strings live in a table structure that is quite a bit larger than
any of the functions in the code (indeed, counted in lines, it is
larger than any three of the functions in the program). The code
merely navigates the table and does low-level
tasks like radix conversions. The initializer actually lives in the
file nametable.h, which is generated in a way
we'll describe later in this chapter.
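In miniature, the organization might look like the following Python sketch. The table format is an assumption for illustration, since the real initializer is a generated C header. The entries are copied from the screenshot. Official names, C escapes, and slang names sit in separate fields, so editing the fast-changing slang list cannot disturb the stable official name. The code only walks the record. The exact rules the real program uses to decide which lines to print are not reproduced.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class CharNames(NamedTuple):
    official: str                 # stable: changes rarely if ever
    c_escape: Optional[str]       # e.g. "\\n" for LF, None if C has no escape
    others: Tuple[str, ...]       # slang names: the part that changes most often


# Hypothetical excerpt of a name table, keyed by character code.
NAMES: Dict[int, CharNames] = {
    0x02: CharNames("Start of Text", None, ()),
    0x08: CharNames("Backspace", "\\b", ()),
    0x0A: CharNames("Line Feed", "\\n", ("Newline",)),
    0x10: CharNames("Data Link Escape", None, ()),
}


def legend(code: int) -> List[str]:
    """Navigate the table; no character-specific logic lives here."""
    entry = NAMES.get(code)
    if entry is None:
        return []
    lines = [f"Official name: {entry.official}"]
    if entry.c_escape is not None:
        lines.append(f"C escape: '{entry.c_escape}'")
    if entry.others:
        lines.append("Other names: " + ", ".join(entry.others))
    return lines


print("\n".join(legend(0x0A)))
```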
This organization makes it easy to add new character names,
change existing ones, or delete old names by simply editing the table,
without disturbing the code.
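A hypothetical Python sketch of that property. Lookup by name works off an index derived from the table, so teaching the program a new name is purely a data edit. The alias lists are illustrative assumptions, not the real program's table.

```python
from typing import Dict, List, Optional

# Hypothetical table: character code -> every name it answers to.
ALIASES: Dict[int, List[str]] = {
    0x08: ["BS", "Backspace"],
    0x0A: ["LF", "NL", "Line Feed", "Newline"],
}


def lookup(name: str, table: Dict[int, List[str]]) -> Optional[int]:
    """Return the code whose name list contains name (case-insensitive), else None."""
    index = {alias.casefold(): code for code, names in table.items() for alias in names}
    return index.get(name.casefold())


before = lookup("rubout", ALIASES)            # None: no such name yet
ALIASES[0x7F] = ["DEL", "Delete", "Rubout"]   # the only change: a new table entry
after = lookup("rubout", ALIASES)             # 127
print(before, after)                          # None 127
```

The index is rebuilt on every call for simplicity. A real tool would more likely build it once at startup.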
(The way the program is built is good Unix style, but the
output format is questionable. It's hard to see how the output could
usefully become the input of any other program, so it does not play
well with others.)
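For contrast, here is one hypothetical way such a report could be made friendlier to pipelines: one record per line, fixed tab-separated fields, and no prose. This sketch illustrates the design principle only. It is not an option the program is known to have.

```python
from typing import Iterable


def record(n: int, names: Iterable[str]) -> str:
    """One line per character: decimal, hex, octal, binary, comma-separated names."""
    fields = [str(n), f"{n:02x}", f"{n:03o}", f"{n:08b}", ",".join(names)]
    return "\t".join(fields)


print(record(10, ["LF", "NL", "Newline"]))
print(record(8, ["BS"]))
```

Output in this shape can be taken apart with standard tools. For example, `cut -f2` extracts the hex column, since tab is `cut`'s default field delimiter.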