Chapter 7. Multiprogramming
Separating Processes to Separate Function
If we believe in data structures, we must believe in independent
(hence simultaneous) processing. For why else would we collect items
within a structure? Why do we tolerate languages that give us the one
without the other?
-- Alan Perlis, Epigrams on Programming, in ACM SIGPLAN Notices (Vol 17 #9, 1982)
The most characteristic program-modularization technique of Unix
is splitting large programs into multiple cooperating processes. This
has usually been called ‘multiprocessing’ in the Unix world,
but in this book we revive the older term
‘multiprogramming’ to avoid confusion with
multiprocessor hardware implementations.
Multiprogramming is a particularly murky area of design, one in
which there are few guidelines to good practice. Many programmers
with excellent judgment about how to break up code into subroutines
nevertheless wind up writing whole applications as monster
single-process monoliths that founder on their own internal
complexity.
The Unix style of design applies the do-one-thing-well approach
at the level of cooperating programs as well as cooperating routines
within a program, emphasizing small programs connected by well-defined
interprocess communication or by shared files. Accordingly, the Unix operating
system encourages us to break our programs into simpler subprocesses,
and to concentrate on the interfaces between these subprocesses. It
does this in at least three fundamental ways:
- by making process-spawning cheap;
- by providing methods (shellouts, I/O redirection, pipes, message-passing, and sockets) that make it relatively easy for processes to communicate;
- by encouraging the use of simple, transparent, textual data formats that can be passed through pipes and sockets.
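These three enablers combine naturally. As an illustration (the classic word-frequency pipeline, a stock shell example rather than anything specific to this chapter), here are four small processes cooperating through pipes, with nothing but transparent text flowing between them:

```shell
# Count word frequencies in a sentence; each stage is a separate process.
printf 'to be or not to be\n' |
    tr ' ' '\n' |    # split into one word per line (plain text throughout)
    sort |           # sort so duplicate words become adjacent
    uniq -c |        # collapse duplicates, prefixing each with its count
    sort -rn         # order by count, most frequent words first
```

Each stage knows nothing about the others; the textual format is the entire interface.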
Inexpensive process-spawning and easy process control are
critical enablers for the Unix style of programming. On an operating
system such as VAX VMS, where starting processes is expensive
and slow and requires special privileges, one must build monster
monoliths because one has no choice. Fortunately the trend in the Unix
family has been toward lower fork(2) overhead rather than higher. Linux,
in particular, is famously efficient this way, with process spawning
faster than thread spawning on many other operating systems.[65]
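Cheap fork(2) is visible even at the shell level, where every `( ... ) &` construct forks a subprocess. A loop like the following sketch (the messages are illustrative) spawns a short-lived child per iteration without any special privilege or noticeable cost:

```shell
# Each ( ... ) & forks a subshell; spawning is cheap enough to do casually.
for i in 1 2 3; do
    ( echo "child $i done" ) &   # fork a short-lived child process
done
wait                             # reap all background children
echo "all children reaped"
```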
Historically, many Unix programmers have been encouraged to
think in terms of multiple cooperating processes by experience with
shell programming. Shell makes it relatively easy to set up groups of
multiple processes connected by
pipes, running either
in background or foreground or a mix of the two.
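A minimal sketch of such a mix (the scratch file is an assumption of the example): one pipeline runs in the background while another runs in the foreground, and `wait` joins them afterward.

```shell
# Run one pipeline in the background and one in the foreground at once.
tmp=$(mktemp)                              # scratch file for the background result
printf 'beta\nalpha\n' | sort > "$tmp" &   # background pipeline: two processes
bg=$!
printf 'two\none\n' | sort                  # foreground pipeline runs meanwhile
wait "$bg"                                  # join the background pipeline
cat "$tmp"                                  # show its result
rm -f "$tmp"
```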
In the remainder of this chapter, we'll look at the
implications of cheap process-spawning and discuss how and when to
apply pipes, sockets, and other interprocess communication (IPC)
methods to partition your design into cooperating processes. (In the
next chapter, we'll apply the same separation-of-functions philosophy
to interface design.)
While the benefit of breaking programs up into cooperating
processes is a reduction in global complexity, the cost is that we
have to pay more attention to the design of the protocols which are
used to pass information and commands between processes. (In software
systems of all kinds, bugs collect at interfaces.)
In Chapter 5 we
looked at the lower level of this design problem — how to lay
out application protocols that are transparent, flexible and
extensible. But there is a second, higher level to the problem which
we blithely ignored. That is the problem of designing state machines
for each side of the communication.
It is not hard to apply good style to the syntax of application
protocols, given models like SMTP or BEEP or XML-RPC. The real
challenge is not protocol syntax but protocol logic — designing a
protocol that is both sufficiently expressive and deadlock-free.
Almost as importantly, the protocol has to be seen to be expressive
and deadlock-free; human beings attempting to model the behavior of
the communicating programs in their heads and verify its correctness
must be able to do so.
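To make deadlock-free protocol logic concrete, consider a toy request/reply exchange over a pair of named pipes (the FIFO names and messages here are invented for the sketch). It stays deadlock-free only because each side follows a fixed state machine: the client always writes before it reads, and the server always reads before it writes. If both sides tried to open their write end first, each would block forever waiting for a reader that never comes.

```shell
# Toy request/reply protocol over two FIFOs; correct because the two
# state machines are complementary (client: write, read; server: read, write).
req=$(mktemp -u); rep=$(mktemp -u)
mkfifo "$req" "$rep"

# server: read one request, then send one reply
( read -r line < "$req"; echo "ACK $line" > "$rep" ) &

# client: send the request, then wait for the reply
echo "PING" > "$req"
read -r reply < "$rep"
echo "client got: $reply"

wait
rm -f "$req" "$rep"
```

Note that the opens themselves synchronize the two processes: opening a FIFO blocks until the other end is opened, so the ordering above is the whole correctness argument.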
In our discussion, therefore, we will focus on the kinds of
protocol logic one naturally uses with each kind of interprocess
communication.