Credit: Guido van Rossum, creator of Python
Network programming is one of my favorite Python applications. I wrote or started most of the network modules in the Python standard library, including the socket and select extension modules and most of the protocol client modules (such as ftplib), which set an example. I also wrote a popular server framework module, SocketServer, and two web browsers in Python, the first predating Mosaic. Need I say more?
Python’s roots lie in a distributed operating system, Amoeba, which I helped design and implement in the late ’80s. Python was originally intended to be the scripting language for Amoeba, since it turned out that the Unix shell, while ported to Amoeba, wasn’t very useful for writing Amoeba system-administration scripts. Of course, I designed Python to be platform-independent from the start. Once Python was ported from Amoeba to Unix, I taught myself BSD socket programming by wrapping the socket primitives in a Python extension module and then experimenting with them using Python; this was one of the first extension modules.
This approach proved to be a great early testimony to Python’s strengths. Writing socket code in C is tedious: the code necessary to do error checking on every call quickly overtakes the logic of the program. Quick: in which order should a server call accept, bind, connect, and listen? This is remarkably difficult to find out if all you have is a set of Unix manpages. In Python, a failing call raises an exception, so you don’t have to write separate error-handling code for each call, and the logic of the code stands out much more clearly. You can also learn about sockets by experimenting in an interactive Python shell, where Python’s immediate error messages quickly clear up misconceptions about the proper order of calls and the argument values that each call requires.
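For the record: after creating the socket, a server calls bind, then listen, then accept; connect belongs to the client. The following minimal Python 3 sketch (not from the original text; the echo_once helper and its background thread are illustrative choices) runs a one-shot echo exchange over the loopback interface in that order. Note that no individual call is followed by a status check: any failure raises OSError, so a single try/except covers the whole sequence.

```python
import socket
import threading


def echo_once(message: bytes) -> bytes:
    """Hypothetical helper: send message to a throwaway echo server, return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.settimeout(5.0)             # don't let accept() hang forever
        server.bind(('127.0.0.1', 0))      # 1. bind: port 0 lets the OS pick a free port
        server.listen(1)                   # 2. listen: queue incoming connections
        port = server.getsockname()[1]

        def serve():
            try:
                conn, _addr = server.accept()   # 3. accept: wait for a client
            except OSError:
                return                          # timed out or closed: give up
            with conn:
                # A single recv is enough for this tiny loopback message;
                # real code should loop until it has read everything it expects.
                conn.sendall(conn.recv(1024))

        worker = threading.Thread(target=serve)
        worker.start()
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
                client.connect(('127.0.0.1', port))  # connect: client side only
                client.sendall(message)
                return client.recv(1024)
        finally:
            worker.join()


try:
    print(echo_once(b'hello'))
except OSError as exc:                     # one handler covers every socket call
    print('socket error:', exc)
```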
Python has come a long way since those first days, and now few applications use the socket module directly; most use much higher-level modules such as urllib or smtplib. The examples in this chapter are a varied bunch: some construct and send complex email messages, while others dig into the low-level bowels of the network implementation on a specific platform. My favorite is Recipe 10.13, which discusses PyHeartBeat: it’s useful, it uses the socket module, and it’s simple enough to be a good educational example.
The socket module itself is still the foundation of all network operations in Python. It’s a plain transliteration of the socket APIs (first introduced in BSD Unix and now widespread on all platforms) into the object-oriented paradigm. You create socket objects by calling the socket.socket factory function, then call methods on these objects to perform typical low-level network operations. Of course, you don’t have to worry about allocating and freeing memory for buffers and the like; Python handles that for you automatically. You express IP addresses as (host, port) pairs, in which host is a string in either dotted-quad ('1.2.3.4') or domain-name ('www.python.org') notation and port is an integer. As you can see, even low-level modules in Python aren’t as low-level as all that.
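A minimal Python 3 sketch of the (host, port) convention (not from the original text): binding a socket to ('127.0.0.1', 0) asks the operating system for any free port, and getsockname reports the address actually used, again as a (host, port) pair. socket.gethostbyname converts a host name to dotted-quad form; it is called here only on a dotted-quad string, so the sketch needs no network access.

```python
import socket

# AF_INET socket addresses are (host, port) tuples:
# host is a string, port is an integer.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind(('127.0.0.1', 0))          # port 0: let the OS choose a free port
    host, port = s.getsockname()      # the bound address comes back as a pair
    print(host, port)

# A dotted-quad host is already numeric, so no lookup is needed; a domain name
# such as 'www.python.org' would go through the system resolver (typically DNS).
print(socket.gethostbyname('1.2.3.4'))
```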
But despite these conveniences, the socket module still exposes the actual underlying functionality of your operating system’s network sockets. If you’re at all familiar with them, you’ll quickly get the hang of Python’s socket module with the help of Python’s own Library Reference. You’ll then be able to play with sockets interactively in Python to become a socket expert, if that is what you need. The classic work on this subject is UNIX Network Programming, Volume 1: Networking APIs - Sockets and XTI, Second Edition, by W. Richard Stevens (Prentice-Hall), and it is highly recommended. For many practical uses, however, higher-level modules will serve you better.
The Internet uses a sometimes dazzling
variety of protocols and formats, and Python’s
standard library supports many of them.
In Python’s standard library, you will find dozens of modules dedicated to supporting specific Internet protocols (such as smtplib to support the SMTP protocol to send mail, nntplib to support the NNTP protocol to send and receive Network News, and so on). In addition, you’ll find about as many modules that support specific Internet formats (such as htmllib to parse HTML data, the email package to parse and compose the various email-related formats, including attachments and encoding, and so on).
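As a sketch of the email package composing a message with an attachment, here is a Python 3.6+ version built on email.message.EmailMessage, a newer API than the one available when this chapter was written. The addresses, subject, and attachment contents are made-up placeholders. Sending the result would be smtplib’s job; that step is only shown as a comment and assumes an SMTP server on localhost.

```python
from email.message import EmailMessage


def build_message() -> EmailMessage:
    """Compose a plain-text body plus one CSV attachment (placeholder data)."""
    msg = EmailMessage()
    msg['From'] = 'alice@example.com'
    msg['To'] = 'bob@example.com'
    msg['Subject'] = 'Weekly report'
    msg.set_content('The report is attached.\n')
    # add_attachment converts the message to multipart/mixed and chooses
    # a suitable transfer encoding for the attached text.
    msg.add_attachment('col1,col2\n1,2\n', subtype='csv', filename='report.csv')
    return msg


msg = build_message()
print(msg.get_content_type())
for part in msg.iter_attachments():
    print(part.get_filename(), part.get_content_type())
# To send (assumes a local SMTP server):
#   import smtplib
#   with smtplib.SMTP('localhost') as server:
#       server.send_message(msg)
```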
Clearly, I cannot even come close to doing justice to the powerful array of tools mentioned in this introduction, nor will you find all of these modules and packages used in this chapter, nor in this book, nor in most programming shops. You may never need to write any program that deals with Network News, for example, so you will not need to study nntplib. But it is reassuring to know it’s there (part of the “batteries included” approach of the Python standard library).
Two higher-level modules that stand out from the crowd, however, are urllib and urllib2. Each can deal with several protocols through the magic of URLs: those now-familiar strings, such as http://www.python.org/index.html, that identify a protocol (such as http), a host and port (such as www.python.org, port 80 being the default here), and a specific resource at that address (such as /index.html). The urllib module is rather simple to use, but urllib2 is more powerful and extensible. HTTP is the most popular protocol for URLs, but these modules also support several others, such as FTP and Gopher. In many cases, you’ll be able to use these modules to write typical client-side scripts that interact with any of the supported protocols much more quickly and with less effort than it might take with the various protocol-specific modules.
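The three pieces of a URL described above can be pulled apart with urlsplit, which lives in urllib.parse in Python 3 (in Python 2 it was in the urlparse module). In this brief sketch, the describe helper and its DEFAULT_PORTS table are hypothetical; urlsplit itself reports port as None when the URL gives no explicit port.

```python
from urllib.parse import urlsplit

# Hypothetical helper table: well-known default ports for a few schemes.
DEFAULT_PORTS = {'http': 80, 'https': 443, 'ftp': 21}


def describe(url):
    """Return (protocol, host, port, resource) for a URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL has no explicit port
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path


print(describe('http://www.python.org/index.html'))
# ('http', 'www.python.org', 80, '/index.html')
print(describe('http://www.python.org:8080/index.html'))
# ('http', 'www.python.org', 8080, '/index.html')
```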
To illustrate, I’d like to conclude with a cookbook example of my own. It’s similar to Recipe 10.7, but rather than a program fragment, it’s a little script. I call it wget.py because it does everything for which I’ve ever needed wget. (In fact, I wrote it on a system where wget wasn’t installed but Python was; writing wget.py was a more effective use of my time than downloading and installing the real thing.)
    import sys, urllib
    def reporthook(*a): print a
    for url in sys.argv[1:]:
        i = url.rfind('/')
        file = url[i+1:]
        print url, "->", file
        urllib.urlretrieve(url, file, reporthook)
Pass it one or more URLs as command-line arguments; it retrieves those into local files whose names match the last components of the URLs. It also prints progress information of the form:
(block number, block size, total size)
Obviously, it’s easy to improve on this; but it’s only seven lines, it’s readable, and it works—and that’s what’s so cool about Python.
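wget.py is written for the Python 2 of its day (the print statement, urllib.urlretrieve). Here is a minimal sketch of the same script for Python 3, assuming only the standard library: urlretrieve now lives in urllib.request (documented there as a legacy interface) and still calls the hook with (block number, block size, total size). The fetch and local_name helpers, and the 'index.html' fallback for URLs that end in a slash, are illustrative additions, not part of the original.

```python
import os
import sys
import urllib.request


def local_name(url):
    """Last path component of the URL, as in the original script.

    Hypothetical addition: fall back to 'index.html' when the URL ends in '/',
    where the original would have produced an empty file name.
    """
    return url[url.rfind('/') + 1:] or 'index.html'


def reporthook(block_num, block_size, total_size):
    # Same progress line as the original: (block number, block size, total size)
    print((block_num, block_size, total_size))


def fetch(url, dest_dir='.'):
    """Download url into dest_dir and return the local path."""
    filename = os.path.join(dest_dir, local_name(url))
    print(url, '->', filename)
    urllib.request.urlretrieve(url, filename, reporthook)
    return filename


if __name__ == '__main__':
    for url in sys.argv[1:]:
        fetch(url)
```

Because urlretrieve also accepts file:// URLs, the script can be tried without a network connection by pointing it at a local file.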
Another cool thing about Python is that you can incrementally improve
a program like this, and after it’s grown by two or
three orders of magnitude, it’s still readable, and
it still works! To see what this particular example might evolve
into, check out Tools/webchecker/websucker.py
in
the Python source distribution. Enjoy!