Unix Programming - Compactness and Orthogonality

Compactness and Orthogonality

Code is not the only sort of thing with an optimal chunk size. Languages and APIs (such as sets of library or system calls) run up against the same sorts of human cognitive constraints that produce Hatton's U-curve.

Accordingly, Unix programmers have learned to think very hard about two other properties when designing APIs, command sets, protocols, and other ways to make computers do tricks: compactness and orthogonality.

Compactness

Compactness is the property that a design can fit inside a human being's head. A good practical test for compactness is this: Does an experienced user normally need a manual? If not, then the design (or at least the subset of it that covers normal use) is compact.

Compact software tools have all the virtues of physical tools that fit well in the hand. They feel pleasant to use, they don't obtrude themselves between your mind and your work, they make you more productive — and they are much less likely than unwieldy tools to turn in your hand and injure you.

Compact is not equivalent to ‘weak’. A design can have a great deal of power and flexibility and still be compact if it is built on abstractions that are easy to think about and fit together well. Nor is compact equivalent to ‘easily learned’; some compact designs are quite difficult to understand until you have mastered an underlying conceptual model that is tricky, at which point your view of the world changes and compact becomes simple. For a lot of people, the Lisp language is a classic example of this .

	Nor does compact mean ‘small’. If a well-designed system is predictable and ‘obvious’ to the experienced user, it might have quite a few pieces.
-- Ken Arnold

Very few software designs are compact in an absolute sense, but many are compact in a slightly looser sense of the term. They have a compact working set, a subset of capabilities that suffices for 80% or more of what expert users normally do with them. Practically speaking, such designs normally need a reference card or cheat sheet but not a manual. We'll call such designs semi-compact, as opposed to strictly compact.

The concept is perhaps best illustrated by examples. The Unix system call API is semi-compact, but the standard C library is not compact in any sense. While Unix programmers easily keep a subset of the system calls sufficient for most applications programming (file system operations, signals, and process control) in their heads, the C library on modern Unixes includes many hundreds of entry points, e.g., mathematical functions, that won't all fit inside a single programmer's cranium.

The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information [Miller] is one of the foundation papers in cognitive psychology (and, incidentally, the specific reason that U.S. local telephone numbers have seven digits). It showed that the number of discrete items of information human beings can hold in short-term memory is seven, plus or minus two. This gives us a good rule of thumb for evaluating the compactness of APIs: Does a programmer have to remember more than seven entry points? Anything larger than this is unlikely to be strictly compact.

Among Unix tools, make(1) is compact; autoconf(1) and automake(1) are not. Among markup languages, HTML is semi-compact, but DocBook (a documentation markup language we shall discuss in Chapter18) is not. The man(7) macros are compact, but troff(1) markup is not.

Among general-purpose programming languages, C and Python are semi-compact; Perl , Java , Emacs Lisp , and shell are not (especially since serious shell programming requires you to know half-a-dozen other tools like sed(1) and awk(1)). C++ is anti-compact — the language's designer has admitted that he doesn't expect any one programmer to ever understand it all.

Some designs that are not compact have enough internal redundancy of features that individual programmers end up carving out compact dialects sufficient for that 80% of common tasks by choosing a working subset of the language. Perl has this kind of pseudo-compactness, for example. Such designs have a built-in trap; when two programmers try to communicate about a project, they may find that differences in their working subsets are a significant barrier to understanding and modifying the code.

Noncompact designs are not automatically doomed or bad, however. Some problem domains are simply too complex for a compact design to span them. Sometimes it's necessary to trade away compactness for some other virtue, like raw power and range. Troff markup is a good example of this. So is the BSD sockets API. The purpose of emphasizing compactness as a virtue is not to condition you to treat compactness as an absolute requirement, but to teach you to do what Unix programmers do: value compactness properly, design for it whenever possible, and not throw it away casually.

[an error occurred while processing this directive]

The Art of Unix Programming
Prev	Home	Next