Plan 9 cleans up Unix, but only really adds one new concept
(private namespaces) to its basic set of design ideas. But are there
serious problems with those basic design ideas? In Chapter 1 we touched on
several issues that Unix arguably got wrong. Now that the open-source
movement has put the design future of Unix back in the hands of
programmers and technical people, these are no longer decisions we
have to live with forever. We'll reexamine them in order to get a
better handle on how Unix might evolve in the future.
A Unix File Is Just a Big Bag of Bytes
A Unix file is just a big bag of bytes, with no other
attributes. In particular, there is no capability to store
information about the file type or a pointer to an associated
application program outside the file's actual data.
More generally, everything is a byte stream; even hardware devices are byte streams.
This metaphor was a tremendous success of early Unix, and a real
advance over a world in which (for example) compiled programs could
not produce output that could be fed back to the compiler.
Pipes and shell programming sprang from this metaphor.
But Unix's byte-stream metaphor is so central that Unix has trouble
integrating software objects with operations that don't fit neatly
into the byte stream or file repertoire of operations (create, open,
read, write, delete). This is
especially a problem for GUI objects such as icons, windows, and
‘live’ documents. Within a classical Unix model of the
world, the only way to extend the everything-is-a-byte-stream metaphor
is through ioctl calls, a notoriously ugly
collection of back doors into kernel space.
Fans of the Macintosh family of operating systems tend to be
vociferous about this. They advocate a model in which a single
filename may have both data and resource ‘forks’, the data
fork corresponding to the Unix byte stream and the resource fork being
a collection of name/value pairs. Unix partisans prefer approaches
that make file data self-describing so that effectively the same sort
of metadata is stored within the file.
The problem with the Unix approach is that every program
that writes the file has to know about it. Thus, for example, if we
want the file to carry type information inside it, every tool that
touches it has to take care to either preserve the type field
unaltered or interpret and then rewrite it. While this would be
theoretically possible to arrange, in practice it would be far too
fragile.
On the other hand, supporting file attributes raises awkward
questions about which file operations should preserve them. It's
clear that a copy of a named file to another name should copy
the source file's attributes as well as its data — but suppose
we cat(1) the file, redirecting the output of cat(1) to a new name?
The answer to this question depends on whether the attributes
are actually properties of filenames or are in some magical way
bundled with the file's data as a sort of invisible preamble or
postamble. Then the question becomes: Which operations make the
properties visible?
Xerox PARC
file-system designs grappled with this problem as far back as the
1970s. They had an ‘open serialized’ call that returned a
byte stream containing both attributes and content. If applied to a
directory, it returned a serialization of the directory's attributes
plus the serialization of all the files in it. It is not clear that
this approach has ever been bettered.
Linux 2.5 already supports attaching arbitrary
name/value pairs as properties of a filename, but at the time of writing
this capability is not yet much used by
applications. Recent versions of Solaris have a roughly equivalent
feature.