|
HTTP as a Universal Application Protocol
Ever since the World Wide Web reached critical mass around 1993,
application protocol designers have shown an increasing tendency to
layer their special-purpose protocols on top of HTTP, using web servers
as generic service platforms.
This is a viable option because, at the transaction layer, HTTP
is very simple and general. An HTTP request is a message in an
RFC-822/MIME-like format; the first line is a method call on some
resource specified by a Uniform Resource Identifier (URI), and the
headers that follow typically contain identification and
authentication information. The most important methods are GET (fetch
the resource), PUT (modify the resource) and POST (ship data to a form
or back-end process). The most important form of URI is a URL or
Uniform Resource Locator, which identifies the resource by service
type, host name, and a location on the host. An HTTP response is
simply an RFC-822/MIME message and can contain arbitrary content to be
interpreted by the client.
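To make the request layout concrete, here is a minimal sketch in Python that splits a hand-written request into its method line and its RFC-822-style headers. The host name, resource path, and credentials are invented for illustration; the point is only that the standard mail-header parser can read the header block unchanged.

```python
from email.parser import Parser

# A hand-written request; host, path, and credentials are made up.
RAW_REQUEST = (
    "GET /docs/index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Authorization: Basic dXNlcjpwYXNz\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)

def parse_request(raw):
    """Split a raw request into (method, uri, version, headers)."""
    # The first line is the method call on a resource.
    request_line, _, rest = raw.partition("\r\n")
    method, uri, version = request_line.split(" ", 2)
    # Everything after it is an RFC-822-style header block, so the
    # mail-header parser from the standard library can read it.
    headers = Parser().parsestr(rest, headersonly=True)
    return method, uri, version, headers

method, uri, version, headers = parse_request(RAW_REQUEST)
print(method, uri, version)
for name, value in headers.items():
    print(name, "=", value)
```

A real server would also handle a message body and malformed input; this sketch deliberately ignores both.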
Web servers handle the transport and request-multiplexing layers
of HTTP, as well as standard service types like http and ftp. It is
relatively easy to write web server plugins that will handle custom
service types, and to dispatch on other elements of the URI format.
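The dispatching idea can be sketched without reference to any particular web server's plugin API. In the Python fragment below, the handler names, the paths, and the (status, text) return convention are all hypothetical; a real plugin would follow whatever interface its host server defines.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical handlers; each returns a (status, text) pair.
def handle_status(path, query):
    return 200, "status: ok"

def handle_echo(path, query):
    return 200, "echo: " + ",".join(query.get("msg", []))

# Table mapping the first path segment of the URI to a handler.
HANDLERS = {
    "/status": handle_status,
    "/echo": handle_echo,
}

def dispatch(uri):
    """Pick a handler based on the first path segment of the URI."""
    parts = urlsplit(uri)
    query = parse_qs(parts.query)
    segment = "/" + parts.path.lstrip("/").split("/", 1)[0]
    handler = HANDLERS.get(segment)
    if handler is None:
        return 404, "no handler for " + parts.path
    return handler(parts.path, query)

print(dispatch("/echo/anything?msg=hello"))
print(dispatch("/unknown"))
```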
Besides avoiding a lot of lower-level details, this method means
the application protocol will tunnel through the standard HTTP service
port and not need a TCP/IP service port of its own. This can be a
distinct advantage; most firewalls leave port 80 open, but trying to
punch another hole through can be fraught with both technical and
political difficulties.
With this advantage comes a risk. Layering services on HTTP means
that your web server and its plugins grow more complex, and cracks in
any of that code can have large security implications. It may become
more difficult to isolate and shut down problem services. The usual
tradeoffs between security and convenience apply.
RFC 3205, On the Use of HTTP As a Substrate,[56] has good design
advice for anyone considering using HTTP as the underlayer of an
application protocol, including a summary of the tradeoffs and
problems involved.
Case Study: The CDDB/freedb.org Database
Audio CDs consist of a sequence of music tracks in a digital
format called CDDA-WAV. They were designed to be played by very simple
consumer-electronics devices a few years before general-purpose
computers developed enough raw speed and sound capability to decode
them on the fly. Because of this, there is no provision in the
format for even simple metainformation such as the album and track
titles. But modern computer-hosted CD players want this information so
the user can assemble and edit play lists.
Enter the Internet. There are at least two repositories that
provide a mapping between a hash code computed from the track-length
table on a CD and artist/album-title/track-title records. The
original was cddb.org, but another site called freedb.org is probably
now more complete and widely used. Both sites rely on their users for
the enormous task of keeping the database current as new CDs come out;
freedb.org arose from a developer revolt after CDDB elected to take
all that user-contributed information proprietary.
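The text does not spell out how the hash code is derived. As an illustration, the Python sketch below follows the disc-ID computation commonly documented for CDDB clients, reproduced here as an assumption rather than quoted from either service: track start offsets are given in frames (75 per second), the digit sums of the start times in seconds form a checksum, and that checksum is packed together with the disc length and track count into a 32-bit value. The sample offsets are invented.

```python
FRAMES_PER_SECOND = 75  # CD audio addressing unit (assumed input unit)

def digit_sum(n):
    """Sum of the decimal digits of a non-negative integer."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total

def cddb_disc_id(track_offsets, leadout_offset):
    """Compute a CDDB-style disc ID from frame offsets.

    track_offsets  -- start of each track, in frames
    leadout_offset -- start of the lead-out (end of disc), in frames
    """
    start_secs = [off // FRAMES_PER_SECOND for off in track_offsets]
    checksum = sum(digit_sum(s) for s in start_secs)
    length = leadout_offset // FRAMES_PER_SECOND - start_secs[0]
    return ((checksum % 0xFF) << 24) | (length << 8) | len(track_offsets)

# Invented three-track disc, purely for illustration.
disc_id = cddb_disc_id([150, 15000, 30000], 45000)
print("%08x" % disc_id)
```

Because the ID depends only on track timings, two different pressings with identical layouts can collide, which is presumably why a lookup also sends the full offset table.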
Queries to these services could have been implemented as a
custom application protocol on top of TCP/IP, but that would have
required steps such as getting a new TCP/IP port number assigned and
fighting to get a hole for it punched through thousands of firewalls.
Instead, the service is implemented over HTTP as a simple CGI query
(as if the CD's hash code had been supplied by a user filling in a Web
form).
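A query of this kind might be assembled as in the following Python sketch. The parameter names (cmd, hello, proto) and the "cddb query" command layout follow the CDDB-over-HTTP convention as generally described, and should be treated as assumptions; the host name, client name, and disc data are placeholders, not real endpoints.

```python
from urllib.parse import urlencode

def build_cddb_query_url(base_url, disc_id, offsets, total_seconds,
                         user, host, client, version):
    """Build a CGI-style query URL, as a web form submission would."""
    # Command: disc ID, track count, frame offsets, disc length in seconds.
    cmd = "cddb query {} {} {} {}".format(
        disc_id, len(offsets), " ".join(str(o) for o in offsets),
        total_seconds)
    # Handshake string identifying the user and client program.
    hello = " ".join([user, host, client, version])
    # urlencode turns spaces into '+', exactly as a browser form would.
    params = urlencode({"cmd": cmd, "hello": hello, "proto": "6"})
    return base_url + "?" + params

url = build_cddb_query_url(
    "http://cddb.example.org/~cddb/cddb.cgi",   # placeholder server
    "08025603", [150, 15000, 30000], 600,
    "anon", "localhost", "demo-player", "0.1")
print(url)
```

Fetching the resulting URL with any HTTP library would complete the lookup; no protocol-specific client code is needed beyond string formatting.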
This choice makes all the existing infrastructure of HTTP and
Web-access libraries in various programming languages available to
support programs for querying and updating this database. As a
result, adding such support to a software CD player is nearly trivial,
and effectively every software CD player knows how to query these
services.
Case Study: Internet Printing Protocol
Internet Printing Protocol (IPP) is a successful, widely implemented
standard for the control of network-accessible printers. Pointers to
RFCs, implementations, and much other related material are available
at the IETF's Printer Working Group site.
IPP uses HTTP 1.1 as a transport layer. All IPP requests are
passed via an HTTP POST method call; responses are ordinary HTTP
responses. (Section 4.2 of RFC 2568, Rationale for the Structure
of the Model and Protocol for the Internet Printing Protocol, does an
excellent job of explaining this choice; it repays study by anyone
considering writing a new application protocol.)
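As a rough illustration of what travels inside such a POST, the Python sketch below builds the body of a Get-Printer-Attributes request and wraps it in an HTTP request object without sending it. The binary layout (version, operation code, request ID, tagged attributes, end tag) and the application/ipp content type reflect the IPP encoding as I understand it from the IPP RFCs; the printer host and path are hypothetical.

```python
import struct
import urllib.request

def ipp_attribute(tag, name, value):
    """Encode one attribute: tag, name length, name, value length, value."""
    name_b = name.encode("ascii")
    value_b = value.encode("utf-8")
    return (struct.pack(">BH", tag, len(name_b)) + name_b +
            struct.pack(">H", len(value_b)) + value_b)

def build_get_printer_attributes(printer_uri, request_id=1):
    """Build the binary body of a Get-Printer-Attributes request."""
    # Version 1.1, operation 0x000B (Get-Printer-Attributes), request ID.
    body = struct.pack(">BBHI", 1, 1, 0x000B, request_id)
    body += b"\x01"  # start of the operation-attributes group
    body += ipp_attribute(0x47, "attributes-charset", "utf-8")
    body += ipp_attribute(0x48, "attributes-natural-language", "en")
    body += ipp_attribute(0x45, "printer-uri", printer_uri)
    body += b"\x03"  # end-of-attributes tag
    return body

body = build_get_printer_attributes("ipp://printer.example.com/ipp/print")

# The IPP message is simply the payload of an ordinary HTTP POST.
# This only constructs the request object; nothing is sent.
request = urllib.request.Request(
    "http://printer.example.com:631/ipp/print",  # hypothetical printer
    data=body,
    headers={"Content-Type": "application/ipp"},
    method="POST")
print(request.get_method(), len(body), "bytes")
```

Passing the request object to urllib.request.urlopen would send it; the reply would be an ordinary HTTP response whose body is an IPP message in the same tagged encoding.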
From the software side, HTTP 1.1 is widely deployed. It already
solves many of the transport-level problems that would otherwise
distract protocol developers and implementers from concentrating on
the domain semantics of printing. It is cleanly extensible, so there
is room for IPP to grow. The CGI programming model for handling the
POST requests is well understood and development tools are widely
available.
Most network-aware printers already embed a web server, because
that's the natural way to make the status of the printer remotely
queryable by human beings. Thus, the incremental cost of adding IPP
service to the printer firmware is not large. (This is an argument
that could be applied to a remarkably wide range of other
network-aware hardware, including vending machines and coffee makers[57]
and hot tubs!)
About the only serious drawback of layering IPP over HTTP is
that the protocol is completely driven by client requests. Thus there
is no space in the model for printers to ship asynchronous alert
messages back to clients. (However, smarter clients could run a
trivial HTTP server to receive such alerts formatted as HTTP requests
from the printer.)
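The workaround in the parenthesis can be sketched in a few lines of Python. Here a client runs a tiny HTTP server on the loopback interface and a simulated printer POSTs an alert to it. The /alerts path, the alert text, and the choice of a 204 reply are all invented for illustration; nothing here is taken from the IPP specifications.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received_alerts = []

class AlertHandler(BaseHTTPRequestHandler):
    """Accept alerts delivered as HTTP POST requests."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        received_alerts.append(self.rfile.read(length).decode("utf-8"))
        self.send_response(204)  # accepted, nothing to return
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet

def start_alert_listener(port=0):
    """Start the listener in a background thread; port 0 picks a free one."""
    server = HTTPServer(("127.0.0.1", port), AlertHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

server = start_alert_listener()
port = server.server_address[1]

# Stand-in for the printer: it acts as an HTTP client for this one message.
conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("POST", "/alerts", body=b"toner-low")
status = conn.getresponse().status
conn.close()

server.shutdown()
server.server_close()
print(status, received_alerts)
```

The cost of this design is that every client must now accept inbound connections, which reintroduces the firewall problem that layering on HTTP was meant to avoid.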
|