Unix comes equipped with some powerful special-purpose code
generators for purposes like building lexical analyzers (tokenizers)
and parsers; we'll survey these in Chapter 15. But there are much simpler, lighter-weight
sorts of code generation we can use to make life easier without having
to know any compiler theory or write (error-prone) procedural
logic.
Case Study: Generating Code for the ascii Displays
Called without arguments, ascii
generates a usage screen that looks like Example 9.5.
Example 9.5. ascii usage screen.
Usage: ascii [-dxohv] [-t] [char-alias...]
   -t = one-line output  -d = Decimal table  -o = octal table  -x = hex table
   -h = This help screen -v = version information
Prints all aliases of an ASCII character. Args may be chars, C \-escapes,
English names, ^-escapes, ASCII mnemonics, or numerics in decimal/octal/hex.
Dec Hex    Dec Hex    Dec Hex  Dec Hex  Dec Hex  Dec Hex   Dec Hex   Dec Hex
  0 00 NUL  16 10 DLE  32 20    48 30 0  64 40 @  80 50 P   96 60 `  112 70 p
  1 01 SOH  17 11 DC1  33 21 !  49 31 1  65 41 A  81 51 Q   97 61 a  113 71 q
  2 02 STX  18 12 DC2  34 22 "  50 32 2  66 42 B  82 52 R   98 62 b  114 72 r
  3 03 ETX  19 13 DC3  35 23 #  51 33 3  67 43 C  83 53 S   99 63 c  115 73 s
  4 04 EOT  20 14 DC4  36 24 $  52 34 4  68 44 D  84 54 T  100 64 d  116 74 t
  5 05 ENQ  21 15 NAK  37 25 %  53 35 5  69 45 E  85 55 U  101 65 e  117 75 u
  6 06 ACK  22 16 SYN  38 26 &  54 36 6  70 46 F  86 56 V  102 66 f  118 76 v
  7 07 BEL  23 17 ETB  39 27 '  55 37 7  71 47 G  87 57 W  103 67 g  119 77 w
  8 08 BS   24 18 CAN  40 28 (  56 38 8  72 48 H  88 58 X  104 68 h  120 78 x
  9 09 HT   25 19 EM   41 29 )  57 39 9  73 49 I  89 59 Y  105 69 i  121 79 y
 10 0A LF   26 1A SUB  42 2A *  58 3A :  74 4A J  90 5A Z  106 6A j  122 7A z
 11 0B VT   27 1B ESC  43 2B +  59 3B ;  75 4B K  91 5B [  107 6B k  123 7B {
 12 0C FF   28 1C FS   44 2C ,  60 3C <  76 4C L  92 5C \  108 6C l  124 7C |
 13 0D CR   29 1D GS   45 2D -  61 3D =  77 4D M  93 5D ]  109 6D m  125 7D }
 14 0E SO   30 1E RS   46 2E .  62 3E >  78 4E N  94 5E ^  110 6E n  126 7E ~
 15 0F SI   31 1F US   47 2F /  63 3F ?  79 4F O  95 5F _  111 6F o  127 7F DEL
This screen is carefully designed to fit in 23 rows and 79
columns, so that it will fit in a 24×80 terminal window.
This table could be generated at runtime, on the fly. Grinding
out the decimal and hex columns would be easy enough. But between
wrapping the table at the right places and knowing when to print
mnemonics like NUL rather than characters, there would have been enough
odd corner cases to make the code distinctly unpleasant. Furthermore,
the columns had to be unevenly spaced to make the table fit in 79
columns. But any Unix programmer would reflexively express the table as a block
of data even before discovering these complications.
The most naïve way to generate the usage screen would have been
to put each line into a C
initializer in the ascii.c source code, and then
have all lines be written out by code that steps through the
initializer. The problem with this method is that the extra data in
the C initializer format (trailing newline, string quotes, comma) would
make the lines longer than 79 characters, causing them to wrap and
making it rather difficult to map the appearance of the code to the
appearance of the output. This, in turn, would make the display
difficult to edit, which was annoying when I was tinkering with it
to fit in 24×80 screen cells.
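The width arithmetic is easy to check. The following shell sketch uses a made-up 79-character line (a row of x's, not a line from the real screen) and wraps it the way one element of a C string-array initializer would be written: an opening quote, a \n escape, a closing quote, and a comma.

```shell
#!/bin/sh
# Hypothetical stand-in for the widest line of the usage screen: 79 x's.
line=$(printf '%79s' '' | tr ' ' 'x')

# The same text as one element of a C string-array initializer: "...\n",
wrapped="\"${line}\\n\","

echo "display width:     ${#line}"      # 79
echo "initializer width: ${#wrapped}"   # 84
```

The five characters of punctuation push the line to 84 columns before any indentation is added, so it no longer fits in an 80-column editor window.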
A more sophisticated method using the string-pasting behavior of
the ANSI C
preprocessor collided with a variant of the same problem.
Essentially, any way of inlining the usage screen explicitly would
involve punctuation at start and end of line that there's no room
for.[98] And copying the table to the screen from a
file at runtime seemed like a fragile expedient; after all, the file
could get lost.
Here's the solution. The source distribution contains a file
that just contains the usage screen, exactly as listed above and named
splashscreen. The C source contains the following
function:
void
showHelp(FILE *out, char *progname)
{
    fprintf(out,"Usage: %s [-dxohv] [-t] [char-alias...]\n", progname);
    /* splashscreen.h is generated at build time: one puts() call per line */
#include "splashscreen.h"
    exit(0);
}
And splashscreen.h is generated by a makefile production:
splashscreen.h: splashscreen
	sed <splashscreen >splashscreen.h \
	    -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/.*/puts("&");/'
So when the program is built, the
splashscreen file is automatically massaged into
a series of output function calls, which are then included by the C
preprocessor in the right function.
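To see what the generated header contains, the sed expressions can be run by hand. The sketch below feeds three sample lines through the same substitutions as the makefile rule. The lines are stand-ins chosen to exercise the backslash and double-quote cases, not the full splashscreen file. The quote substitution is written with a g flag so that every double quote on a line is escaped.

```shell
#!/bin/sh
# Run the header-generating substitutions on three sample lines.
# printf '%s\n' passes each argument through untouched, backslashes included.
generated=$(printf '%s\n' 'Dec Hex    Dec Hex' ' 34 22 "' ' 92 5C \' |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/.*/puts("&");/')

printf '%s\n' "$generated"
# Output:
# puts("Dec Hex    Dec Hex");
# puts(" 34 22 \"");
# puts(" 92 5C \\");
```

The order of the expressions matters. Backslashes are doubled first. If quotes were escaped first, the backslash introduced by that step would itself be doubled, and the resulting string literal would be malformed.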
By generating the code from data, we get to keep the editable
version of the usage screen identical to its display appearance. This
promotes
transparency. Furthermore, we could modify the usage
screen at will without touching the C code at all, and the right thing
would automatically happen on the next build.
For similar reasons, the initializer that holds the name synonym
strings is also generated via a sed script
in the makefile, from a file called nametable in
the ascii source distribution. Most of
nametable is simply copied into the C
initializer. But the generation process would make it easy to adapt
this tool for other 8-bit character sets such as the ISO-8859 series
(Latin-1 and friends).
This is an almost trivial example, but it nevertheless
illustrates the advantages of even simple and ad-hoc code
generation. Similar techniques could be applied to larger
programs with correspondingly greater benefits.