Unix Programming - Language Evaluations

C

Despite the memory-management problem, there are some application niches for which C is still king . Programs that require maximum speed, have real-time requirements, or are tightly coupled to the OS kernel are good candidates for C.

Programs that must be portable across multiple operating systems may also be good candidates for C. Some of the alternatives to C that we shall discuss below are, however, increasingly penetrating major non-Unix operating systems; in the near future, portability may be less a distinguishing advantage of C.

Sometimes the leverage to be gained from existing programs like parser generators or GUI builders that generate C code is so great that it justifies C coding of the rest of a small application.

And, of course, C proved indispensable to the developers of all its alternatives. Dig down through enough implementation layers under any of the other languages surveyed here and you will find a core implemented in pure, portable C. These languages inherit many of the advantages of C.

Under modern conditions, it's perhaps best to think of C as a high-level assembler for the Unix virtual machine (recall the discussion of the success of C as a case study in Chapter4). C standards have exported many of the facilities of this virtual machine, such as the standard I/O library, to other operating systems. C is where you go when you want to get as close as possible to the bare metal but stay portable.

One good reason to learn C, even if your programming needs are satisfied by a higher-level language, is that it can help you learn to think at hardware-architecture level. The best reference and tutorial on C for people who are already programmers is still The C Programming Language [Kernighan-Ritchie].

Porting C code between Unix variants is almost always possible and usually easy, but specific areas of variation (like signals and process control) can be tricky to get right. We highlight some of these issues in Chapter17. Differing C bindings on other operating systems can of course cause C portability problems, although Windows NT at least theoretically supports an ANSI/POSIX-compliant C API.

High-quality C compilers are available as open-source software over the Internet; the best-known and most widely used is the Free Software Foundation's GNU C compiler (part of GCC, the GNU Compiler Collection), which has become the native C of all open-source Unix systems and many even in the closed-source world. GCC ports are even available for Microsoft's family of operating systems. GCC sources are available at the FSF's FTP site.

Summing up: C's best side is resource efficiency and closeness to the machine. Its worst side is that programming in it is a resource-management hell.

C Case Study: fetchmail

The best case study for C is the Unix kernel itself, for which a language that naturally supports hardware-level operations is actually a strong advantage. But fetchmail is a good example of the kind of user-land utility that is still best coded in C.

fetchmail does only the simplest kind of dynamic-memory management; its only complex data structure is a singly-linked list of per-mailserver control blocks built just once, at startup time, and changed only in fairly trivial ways afterwards. This substantially erodes the case against using C by sidestepping C's greatest weakness.

On the other hand, these control blocks are fairly complex (including all of string, flag, and numeric data) and would be difficult to handle as coherent fast-access objects in an implementation language without an equivalent of the C struct feature. Most of the alternatives to C are weaker than C in this respect (Python and Java being notable exceptions).

Finally, fetchmail requires the ability to parse a fairly complex specification syntax for per-mail-server control information. In the Unix world this sort of thing is classically handled by using C code generators that grind out source code for a tokenizer and grammar parser from declarative specifications. The existence of yacc and lex was a point in favor of C.

fetchmail might reasonably have been coded in Python , albeit with possibly significant loss of performance. Its size and data-structure complexity would have excluded shell and Tcl right off and strongly counterindicated Perl , and the application domain is outside the natural scope of Emacs Lisp . A Java implementation wouldn't have been an unreasonable path, but Java's object-oriented style and garbage collection would have offered little purchase on fetchmail's specific problems over what C already yields. Nor could C++ have done much to simplify the relatively simple internal logic of fetchmail.

However, the real reason fetchmail is a C program is that it evolved by gradual mutation from an ancestor already written in C. The existing implementation has been extensively tested on many different platforms and against many odd and quirky servers. Carrying all that implicit knowledge through to a re-implementation in a different language would be messy and difficult. Furthermore, fetchmail depends on imported code for functions (like NTLM authentication) that don't seem to be available above C level.

fetchmail's interactive configurator, which did not have a C legacy problem, is written in Python; we'll discuss that case along with that language.