Twisted Network Programming Essentials
By Abe Fettig
October 2005
Pages: 236



Table of Contents

Chapter 1: Getting Started
Before you can start developing applications using Twisted, you'll need to download and install Twisted and its related packages. This chapter walks you through the installation process on various operating systems. It also shows you how to add the Twisted utilities to your path, familiarize yourself with the Twisted documentation, and get answers to your questions from the Twisted community.
Installing Twisted
First things first: you need to get Twisted installed on your computer. Downloads and instructions for installing Twisted on various operating systems can be found at http://twistedmatrix.com/projects/core/. To enable additional functionality in Twisted, you'll have to install a couple of optional packages as well.
Begin by downloading the latest Twisted release from http://www.twistedmatrix.com. Then install PyOpenSSL (a Python wrapper for the popular OpenSSL library), which Twisted uses to make encrypted Secure Sockets Layer (SSL) connections. Finally, install PyCrypto, a package containing Python implementations of the encryption algorithms used by the Secure Shell (SSH) protocol. Locations for these downloads are provided in each of the following platform-specific sections.
You don't need to install PyOpenSSL or PyCrypto in order to use Twisted. Without these packages installed, you won't be able to use Twisted's SSL and SSH features, but everything else will still work.
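Whether these optional packages are visible to a given interpreter can be checked from Python itself. The following is a minimal sketch, not part of Twisted: the helper names `is_importable` and `report` are invented for illustration, and it assumes the conventional import names (`twisted`, `OpenSSL` for PyOpenSSL, `Crypto` for PyCrypto). It targets a current Python 3 interpreter.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the import system can locate module_name."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError).
        return False


# Display label -> import name (assumed conventional names).
PACKAGES = {"Twisted": "twisted", "PyOpenSSL": "OpenSSL", "PyCrypto": "Crypto"}


def report(packages=PACKAGES):
    """Map each display label to True or False, depending on availability."""
    return {label: is_importable(name) for label, name in packages.items()}


if __name__ == "__main__":
    for label, found in report().items():
        print("%-10s %s" % (label, "installed" if found else "missing"))
```

A "missing" result for PyOpenSSL or PyCrypto only means that the SSL or SSH features will be unavailable, as described above.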

Section 1.1.1.1: Windows

Go to http://twistedmatrix.com/projects/core/. Download the Twisted Windows "Sumo" installer for Python 2.4 (or your Python version). The Sumo binary includes the Twisted core, as well as a number of extra modules to support specific groups of protocols like mail and web. You'll need the full Sumo version of Twisted installed to run most of the examples in this book. Then go to http://twistedmatrix.com/products/download and find the section labeled "Twisted Dependencies for Windows." There you'll find links to installers for the latest versions of PyOpenSSL and PyCrypto.
It's possible that some of these packages might move to different pages as the Twisted web site grows and is restructured in the future. If one of these links doesn't work for you, try starting from the Twisted home page at http://twistedmatrix.com.
Installing from Source Files
If you're on an operating system for which no Twisted binary packages are available, you'll need to install from source. Don't worry, though; as source installs go, Python packages are among the easiest you'll find.
First, download the full "Sumo" source package for Twisted (choosing the version with documentation) from http://twistedmatrix.com/projects/core/. The Sumo package is the core of Twisted, plus a number of bundled modules from other projects developed under the Twisted umbrella; you'll need the modules in the Sumo package to run most of the examples in this book. Once you've downloaded the package, extract it to a working directory:
    $ tar -xjvf ~/downloads/TwistedSumo-2005-03-22.tar.bz2
    TwistedSumo-2005-03-22/
    TwistedSumo-2005-03-22/bin/
    ...
    TwistedSumo-2005-03-22/README
    TwistedSumo-2005-03-22/LICENSE
    TwistedSumo-2005-03-22/setup.py
    TwistedSumo-2005-03-22/ZopeInterface-3.0.1.tgz
Next, enter the TwistedSumo-version directory. Twisted depends on the zope.interface package, which is bundled in the Twisted Sumo distribution. Extract the ZopeInterface tarball:
    $ tar -xzvf ZopeInterface-3.0.1.tgz
    ZopeInterface-3.0.1/
    ZopeInterface-3.0.1/Support/
    ZopeInterface-3.0.1/Support/zpkgsetup/
    ZopeInterface-3.0.1/Support/zpkgsetup/publication.py
    ...
    ZopeInterface-3.0.1/setup.py
    ZopeInterface-3.0.1/setup.cfg
    ZopeInterface-3.0.1/MANIFEST
Enter the ZopeInterface-<version> directory, and run the command python setup.py install. This command will build and install the zope.interface package in your Python installation's site-packages directory (under zope/interface). You'll need to have administrative/root permissions to do this, so use su or sudo to increase your permission level if necessary:
    $ cd ZopeInterface-3.0.1
    $ python setup.py install
    running install
    running build
    running build_py
    running build_ext
    building 'zope.interface._zope_interface_coptimizations' extension
    ...
    running install_lib
    copying build/lib.linux-i686-2.4/zope/interface/_zope_interface_coptimizations.so ->
    /usr/lib/python2.4/site-packages/zope/interface
    writing byte-compilation script '/tmp/tmpdY9dA9.py'
    /usr/bin/python -O /tmp/tmpdY9dA9.py
    removing /tmp/tmpdY9dA9.py
Adding Twisted Utilities to Your Path
Twisted includes a number of scripts and utilities that you'll need to use. For convenience, you should make sure these are available in your path.
Typically, all you have to do is add Twisted's utility directory to your operating system's command search path. Follow the instructions for your platform in the sections below.

Section 1.3.1.1: Windows

Twisted's utilities will be installed in the Python scripts directory (typically in a location such as c:\Python23\scripts). Twisted includes a helpful Programs menu entry that launches a Windows command prompt with the Python scripts directory added to %PATH%. It's located under Programs > Twisted (Python [version]) > Twisted Command Prompt. Use this menu entry to launch your command prompt when you need to run the Twisted utilities, or edit your %PATH% to include the scripts directory.

Section 1.3.1.2: Linux

Twisted's utilities will be installed in the same directory as your python binary (probably /usr/bin or /usr/local/bin), so you shouldn't need to make any changes to your $PATH.

Section 1.3.1.3: Mac OS X

If you're using the version of Python included with Mac OS X 10.2 "Jaguar" or later, Twisted's utilities will be installed under /System/Library/Frameworks/Python.framework/Versions/Current/bin. Add this directory to your $PATH (the command below uses bash syntax):
    $ export PATH=$PATH:/System/Library/Frameworks/Python.framework/Versions/Current/bin
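To confirm that a utility can be found on the command search path, Python's standard `shutil.which` performs the same lookup the shell does. This is a small sketch under stated assumptions: `find_utility` is a hypothetical helper, and `twistd` is used only as an example of a script that a Twisted installation provides. It targets Python 3.

```python
import os
import shutil


def find_utility(name, extra_dirs=()):
    """Return the full path of an executable called name, or None.

    Directories in extra_dirs are searched before those already on PATH.
    """
    search_path = os.pathsep.join(list(extra_dirs) + [os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    location = find_utility("twistd")
    if location is None:
        print("twistd is not on the search path yet")
    else:
        print("twistd found at", location)
```

Passing a candidate directory in `extra_dirs` is a quick way to check whether adding it to the path would make the utility visible.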
Using the Twisted Documentation
Twisted includes a few different types of documentation: extensive API documentation, HOWTOs, a tutorial, and sample code. It's a good idea to familiarize yourself with this documentation now, so that you'll be able to refer to it during the development process.
Documentation for Twisted is available online on the Twisted web site. A complete API reference can be found at http://twistedmatrix.com/documents/current/api. You'll probably refer to this documentation many times to see which classes a module contains or to see the list of arguments for a specific function. The pages in the API documentation are automatically generated from the source code using lore, a custom documentation tool developed as part of Twisted.
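The two questions mentioned above (which classes a module contains, and which arguments a function takes) can also be answered locally with Python's standard `inspect` module. The sketch below is illustrative only: `public_classes` and `argument_names` are hypothetical helpers, and the standard library's `json` module stands in for a Twisted module so that the example runs without Twisted installed. It targets Python 3.

```python
import inspect
import json


def public_classes(module):
    """Names of the public classes a module exposes, sorted alphabetically."""
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isclass)
        if not name.startswith("_")
    )


def argument_names(func):
    """Parameter names of a function, in declaration order."""
    return list(inspect.signature(func).parameters)


if __name__ == "__main__":
    print(public_classes(json))
    print(argument_names(json.loads))
```

The same helpers accept any imported module or function, so they can be pointed at a Twisted module once it is installed.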
Twisted is developed as a set of subprojects, and each project has additional documentation in its section of the Twisted site. For example, documentation on the core modules is at http://twistedmatrix.com/projects/core/documentation, and documentation on web modules is at http://twistedmatrix.com/projects/web/documentation. There are links to the full list of projects and documentation on the home page.
Within each project's documentation, you'll find the following types of information:
HOWTOs
These documents describe specific features of Twisted and how to use them. The HOWTOs don't cover every part of Twisted, but they can provide a helpful starting point for certain tasks. Included in the HOWTOs is a tutorial called "Twisted From Scratch," which shows how an application can be developed in Twisted, extended to take advantage of some advanced features, and then fully integrated with the Twisted utilities.
Examples
These are examples of short and specific Twisted programs. Like the HOWTOs, these aren't comprehensive but can be an excellent resource when you need a working example of a certain feature.
Finding Answers to Your Questions
Even with this book and the Twisted documentation, you'll eventually run into a question whose answer you can't figure out on your own. When that happens, it's time to get help from the Twisted community.
Figure 1-1: Using pydoc to view API documentation
There are a few excellent community resources you can look to for help. First, there are the mailing lists. The twisted-python list is for general discussion of Twisted. The twisted-web list is dedicated to discussion of web applications. It's good etiquette to use the proper list; if you ask web-related questions on the twisted-python list, you'll probably be asked to move the discussion to twisted-web. You can sign up for the twisted-python and twisted-web mailing lists at http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python and http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-web.
Second, you can talk with Twisted users and developers in the #twisted and #twisted.web IRC channels on the freenode network (see http://freenode.net for a list of servers). These channels feature lively, and often funny, discussion, and give you the unique opportunity to ask questions directly to members of the Twisted development team. Keep in mind, though, that such real-time support is a privilege, not a right. Be polite, and understand that developers might not always have time to answer your questions right at that moment. If you don't get an immediate answer on IRC, try sending a message to the appropriate mailing list. This approach will give the question to a wider audience, and let people answer when they have more time.
A final resource available to the Twisted community is Planet Twisted. Located at http://planet.twistedmatrix.com, this web site aggregates weblog posts made by members of the Twisted development team. It's an excellent way to keep track of what's going on with Twisted development, as well as to get to know the personalities of the Twisted team. Planet Twisted also provides an RSS feed at
Additional content appearing in this section has been removed.
Chapter 2: Building Simple Clients and Servers
To develop with Twisted, you'll need to learn how to use several new classes and objects. These classes are at the core of Twisted, and you'll use them over and over in your applications. They also represent the steepest part of the Twisted learning curve. Understand how to use them, and the rest of Twisted will be easy; otherwise, you'll struggle (or write lots of unnecessary code).
This chapter shows how to write simple clients and servers with Twisted. Along the way, it introduces Twisted's basic classes, explains how they work, and demonstrates how to use them.
Starting the Twisted Event Loop
Twisted is an event-driven framework. This means that instead of having the program's functions called in a sequence specified by the program's logic, they are called in response to external actions, or events. For example, a GUI program might have code for responding to the "button pressed" event. The designer of the program can't be sure exactly when such an event will occur; but she writes a function to respond to this event whenever it does happen. Such a function is known as an event handler.
Every event-driven framework includes a special function called an event loop. Once started, an event loop runs indefinitely. While it's running, it waits for events. When an event occurs, the event loop triggers the appropriate event handler function.
Using an event loop requires a different mindset on the part of the programmer than traditional sequential programming. Once you start the event loop, you no longer have the ability to directly instruct your program what to do; it can perform actions only in response to events. Therefore, you need to think in terms of events and event handlers when you design your program. What are the events you want your program to respond to? How do you want it to react when a given event occurs?
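The dispatch cycle described above can be illustrated with a few lines of plain Python. This is a toy sketch only: `ToyEventLoop` and its method names are hypothetical, it does not show how any particular framework implements its loop, and unlike a real event loop it returns as soon as its queue is empty instead of waiting for more events. It targets Python 3.

```python
from collections import deque


class ToyEventLoop:
    """Dispatches queued events to the handlers registered for them."""

    def __init__(self):
        self._handlers = {}      # event name -> list of handler functions
        self._pending = deque()  # (event name, payload) pairs awaiting dispatch
        self._running = False

    def add_handler(self, event_name, handler):
        self._handlers.setdefault(event_name, []).append(handler)

    def post(self, event_name, payload=None):
        self._pending.append((event_name, payload))

    def stop(self):
        self._running = False

    def run(self):
        self._running = True
        # A real loop would block here waiting for I/O; this toy exits when idle.
        while self._running and self._pending:
            event_name, payload = self._pending.popleft()
            for handler in self._handlers.get(event_name, []):
                handler(payload)
        self._running = False


log = []
loop = ToyEventLoop()
loop.add_handler("button pressed", lambda label: log.append("pressed " + label))
loop.add_handler("quit", lambda payload: loop.stop())
loop.post("button pressed", "OK")
loop.post("quit")
loop.post("button pressed", "never handled")
loop.run()
print(log)  # ['pressed OK']
```

Once `run` is called, the program's behavior is decided entirely by which handlers were registered beforehand, which is the shift in mindset the text describes.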
In Twisted, there's a special object that implements the event loop. This object is called the reactor. You can think of the reactor as the central nervous system of a Twisted application. In addition to being responsible for the event loop, the reactor handles many other important tasks: scheduling, threading, establishing network connections, and listening for connections from other machines. To allow the reactor to do all these things, you must start its event loop, handing off control of your program.
Starting the reactor is easy. Import the reactor object from the twisted.internet module. Then call reactor.run() to start the reactor's event loop. Example 2-1 shows all the code you need.
Additional content appearing in this section has been removed.
Working with Asynchronous Results
After the reactor, Deferreds are probably the most important objects used in Twisted. You'll work with Deferreds frequently in Twisted applications, so it's essential to understand how they work. Deferreds can be a little confusing at first, but their purpose is simple: to keep track of an asynchronous action, and to get the result when the action is completed.
Deferreds can be illustrated this way: perhaps you've had the experience of going to one of those restaurants where, if there's going to be a wait for a table, you're given a little pager that will buzz when your table is ready. Having the pager is nice, because it gives you freedom to do other things while you're waiting for your table, instead of standing around the front of the restaurant feeling bored. You can take a walk outside, or even go next door and do some shopping. When a table (finally!) becomes available, the pager buzzes, and you head back inside the restaurant to be seated.
A Deferred is like that pager. It gives your program a way of finding out when some asynchronous task is finished, which frees it up to do other things in the meantime. When a function returns a Deferred object, it's saying that there's going to be some delay before the final result of the function is available. To control what happens when the result does become available, you can assign event handlers to the Deferred.
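As a loose analogy, the standard library's `concurrent.futures.Future` can play the role of the pager. It is not Twisted's Deferred and its API differs; the sketch only shows the general pattern of attaching a handler first and receiving the result later. The names `on_buzz`, `pager`, and `events` are made up for the illustration, which targets Python 3.

```python
from concurrent.futures import Future

events = []


def on_buzz(pager):
    """Handler that runs when the pager 'buzzes' (the Future completes)."""
    error = pager.exception()
    if error is not None:
        events.append("no table: %s" % error)
    else:
        events.append("seated at %s" % pager.result())


pager = Future()                  # the restaurant hands over a pager
pager.add_done_callback(on_buzz)  # decide now what happens when it buzzes
events.append("went shopping")    # free to do other things in the meantime
pager.set_result("table 12")      # later, a table becomes available

print(events)  # ['went shopping', 'seated at table 12']
```

Here `set_result` stands in for the moment the asynchronous task succeeds; calling `set_exception` instead would deliver a failure to the same handler.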
When writing a function that starts an asynchronous action, return a Deferred object. When the action is completed, call the Deferred's callback method with the return value. If the action fails, call Deferred.errback with an exception. Example 2-4 shows a program that uses an asynchronous action to test connectivity to a given server and port.
When calling a function that returns a Deferred object, use the Deferred.addCallback method to assign a function to handle the results of the deferred action if it completes successfully. Use the
Additional content appearing in this section has been removed.
Sending and Receiving Data
Once a TCP connection is established, it can be used for communication. A program can send data to the computer on the other end, or respond to data received from the connection.
Use a subclass of Protocol to send and receive data. Override the dataReceived method to control what happens when data is received from the connection. Use self.transport.write to send data.
Example 2-6 includes a class called DataForwardingProtocol, which takes any data received and writes it to self.output. This usage makes it possible to create a simple application, similar to the classic utility netcat, that passes any data received on standard input to a server, while printing any data received from the server to standard output.
Example 2-6. dataforward.py
from twisted.internet import stdio, reactor, protocol
from twisted.protocols import basic
import re

class DataForwardingProtocol(protocol.Protocol):
    def __init__(self):
        self.output = None
        self.normalizeNewlines = False

    def dataReceived(self, data):
        # optionally rewrite LF and CRLF line endings as CRLF
        if self.normalizeNewlines:
            data = re.sub(r"(\r\n|\n)", "\r\n", data)
        # forward the data to whatever output has been attached
        if self.output:
            self.output.write(data)

class StdioProxyProtocol(DataForwardingProtocol):
    def connectionMade(self):
        # data typed on standard input is forwarded to the server
        inputForwarder = DataForwardingProtocol()
        inputForwarder.output = self.transport
        inputForwarder.normalizeNewlines = True
        stdioWrapper = stdio.StandardIO(inputForwarder)
        # data received from the server is written to standard output
        self.output = stdioWrapper
        print "Connected to server.  Press ctrl-C to close connection."

class StdioProxyFactory(protocol.ClientFactory):
    protocol = StdioProxyProtocol

    def clientConnectionLost(self, connector, reason):
        reactor.stop()

    def clientConnectionFailed(self, connector, reason):
        print reason.getErrorMessage()
        reactor.stop()

if __name__ == '__main__':
    import sys
    if not len(sys.argv) == 3:
        print "Usage: %s host port" % __file__
        sys.exit(1)

    reactor.connectTCP(sys.argv[1], int(sys.argv[2]), StdioProxyFactory())
    reactor.run()
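The newline handling in dataReceived can be tried on its own. The sketch below isolates the same regular expression in a hypothetical helper named `normalize_newlines`; it targets Python 3 and works on text strings, so byte data would need a bytes pattern and replacement instead.

```python
import re


def normalize_newlines(text):
    """Rewrite every LF or CRLF line ending as CRLF."""
    # "\r\n" is listed first so an existing CRLF is matched as one unit
    # and is not turned into "\r\r\n".
    return re.sub(r"(\r\n|\n)", "\r\n", text)


print(repr(normalize_newlines("hello\nworld\r\n")))  # 'hello\r\nworld\r\n'
```

Applying the function twice gives the same result as applying it once, so data that already uses CRLF passes through unchanged.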
Accepting Connections from Clients
The previous labs in this chapter dealt with client connections, where an application initiated a connection to a remote server. However, Twisted can also be used for writing network servers, where the application waits for connections from clients. This lab will show you how to write a Twisted server that accepts connections from clients and interacts with them.
Create a Protocol object defining your server's behavior. Create a ServerFactory object using the Protocol, and pass it to reactor.listenTCP. Example 2-7 shows a simple echo server that accepts a client connection and then repeats back all client messages.
Example 2-7. echoserver.py
from twisted.internet import reactor, protocol
from twisted.protocols import basic

class EchoProtocol(basic.LineReceiver):
    def connectionMade(self):
        # report each new client connection
        print "Connection from ", self.transport.getPeer().host

    def lineReceived(self, line):
        if line == 'quit':
            self.sendLine("Goodbye.")
            self.transport.loseConnection()
        else:
            self.sendLine("You said: " + line)

class EchoServerFactory(protocol.ServerFactory):
    protocol = EchoProtocol

if __name__ == "__main__":
    port = 5001
    reactor.listenTCP(port, EchoServerFactory())
    print "Server running, press ctrl-C to stop."
    reactor.run()
When you run this example, it will listen on port 5001, and report client connections as they are made:
    $ python echoserver.py
    Server running, press ctrl-C to stop.
    Connection from  127.0.0.1
    Connection from  127.0.0.1
In another terminal, use netcat, telnet, or the dataforward.py application from Example 2-6 to connect to the server. It will echo anything you type back to you. Type quit to close your connection:
    $ python dataforward.py localhost 5001
    Connected to server.  Press ctrl-C to close connection.
    hello
    You said: hello
    twisted is fun
    You said: twisted is fun
    quit
    Goodbye.
    $
How does that work?
Twisted servers use the same
Additional content appearing in this section has been removed.
Chapter 3: Web Clients
The most common way to interact with the Web is through a web browser. But as more data and services are made available on the Web, it's important to be able to write web clients that can communicate with web servers through HTTP. This chapter shows how to use the twisted.web.client module to interact with web resources, including downloading pages, using HTTP authentication, uploading files, and working with HTTP headers.
Downloading a Web Page
The simplest and most common task for a web client application is fetching the contents of a web page. The client connects to the server, sends an HTTP GET request, and receives an HTTP response containing the requested page.
Here's where you can begin to experience the usefulness of Twisted's built-in protocol support. The twisted.web package includes a complete HTTP implementation, saving you the work of developing the necessary Protocol and ClientFactory classes. Furthermore, it includes utility functions that allow you to make an HTTP request with a single function call. To fetch the contents of a web page, use the function twisted.web.client.getPage. Example 3-1 is a Python script called webcat.py, which fetches a URL that you specify.
Example 3-1. webcat.py
from twisted.web import client
from twisted.internet import reactor
import sys

def printPage(data):
    print data
    reactor.stop()

def printError(failure):
    print >> sys.stderr, "Error:", failure.getErrorMessage()
    reactor.stop()

if len(sys.argv) == 2:
    url = sys.argv[1]
    client.getPage(url).addCallback(
        printPage).addErrback(
        printError)
    reactor.run()
else:
    print "Usage: webcat.py <URL>"
Give webcat.py a URL as its first argument, and it will fetch and print the contents of the page:
    $ python webcat.py http://www.oreilly.com/
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
    <head>
    <title>oreilly.com -- Welcome to O'Reilly Media, Inc. -- computer books, software
    conferences, online publishing</title>
    ...
The printPage and printError functions are simple event handlers that print the downloaded page contents or an error message, respectively. The most important line in Example 3-1 is the call to
Additional content appearing in this section has been removed.
Accessing a Password-Protected Page
Web pages can require authentication. If you're developing an HTTP client application, it's a good idea to be prepared to handle this case, and give the user some way of entering his login name and password.
If an HTTP request fails with a 401 status code, authentication is required. Try the request again, this time passing a user-supplied login and password in an Authorization header, as shown in the script webcat3.py in Example 3-3.
Example 3-3. webcat3.py
from twisted.web import client, error as weberror
from twisted.internet import reactor
import sys, getpass, base64

def printPage(data):
    print data
    reactor.stop()

def checkHTTPError(failure, url):
    failure.trap(weberror.Error)
    if failure.value.status == '401':
        print >> sys.stderr, failure.getErrorMessage()
        # prompt for user name and password
        username = raw_input("User name: ")
        password = getpass.getpass("Password: ")
        basicAuth = base64.encodestring("%s:%s" % (username, password))
        authHeader = "Basic " + basicAuth.strip()
        # try to fetch the page again with authentication
        return client.getPage(
            url, headers={"Authorization": authHeader})
    else:
        return failure

def printError(failure):
    print >> sys.stderr, "Error:", failure.getErrorMessage()
    reactor.stop()

if len(sys.argv) == 2:
    url = sys.argv[1]
    client.getPage(url).addErrback(
        checkHTTPError, url).addCallback(
        printPage).addErrback(
        printError)
    reactor.run()
else:
    print "Usage: %s <URL>" % sys.argv[0]
Run webcat3.py with a URL as the first argument, and it will attempt to download and print the page. If it receives a 401 error, it will ask for a username and password and try the request again:
    $ python webcat3.py http://example.com/protected/page
    401 Authorization Required
    User name: 
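The Authorization header built in checkHTTPError follows the HTTP Basic scheme: the string "username:password" is Base64-encoded and prefixed with "Basic ". Here is a minimal Python 3 sketch of that one step, with a hypothetical helper name. It uses `base64.b64encode` because `base64.encodestring` is no longer available in recent Python 3 releases; `b64encode` does not append a trailing newline, so no `strip()` call is needed.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token


print(basic_auth_header("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Base64 is an encoding, not encryption, so credentials sent this way are only as private as the connection that carries them.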
Uploading a File
From a user's perspective, there's no easier way to upload a file than going to a web page, using an HTML form to select the file, and pressing the submit button. Because of this, many web sites have adopted HTTP as a means of allowing file uploads. There are times, however, when you might need to perform a file upload without using a browser. Perhaps you want to develop an application that can upload image files to a photo-sharing service, or HTML documents to a web-based content management system. This lab shows how to use Twisted's HTTP client support to perform a file upload.
First, encode the field/value pairs and file data that you wish to upload as a multipart/form-data MIME document. Neither the Python standard library nor Twisted provides an easy way to do this, but you can do it yourself without too much effort. Then pass the encoded form data as the postdata keyword argument to client.getPage or client.downloadPage, along with POST as the HTTP method. You can then work with the results of getPage or downloadPage as you would any other HTTP response. Example 3-4 shows a script named validate.py that uploads a file to the W3C validation service, saves the response to a local file, and then displays it in the user's browser.
Example 3-4. validate.py
from twisted.web import client
import os, tempfile, webbrowser, random

def encodeForm(inputs):
    """
    Takes a dict of inputs and returns a (boundary, body) tuple, where
    body is a multipart/form-data string containing the utf-8 encoded
    data. Keys must be strings, values can be either strings or
    file-like objects.
    """
    getRandomChar = lambda: chr(random.choice(range(97, 123)))
    randomChars = [getRandomChar() for x in range(20)]
    boundary = "---%s---" % ''.join(randomChars)
    lines = []
    for key, val in inputs.items():
        lines.append('--' + boundary) # each part starts with "--" + boundary
        header = 'Content-Disposition: form-data; name="%s"' % key
        if hasattr(val, 'name'):
            header += '; filename="%s"' % os.path.split(val.name)[1]
        lines.append(header)
        lines.append('') # blank line separates the part headers from the data
        if hasattr(val, 'read'):
            lines.append(val.read())
        else:
            lines.append(val.encode('utf-8'))
    lines.append('--' + boundary + '--') # closing delimiter
    lines.append('')
    return boundary, "\r\n".join(lines)

def showPage(pageData):
    # write data to temp .html file, show file in browser
    tmpfd, tmp = tempfile.mkstemp('.html')
    os.close(tmpfd)
    file(tmp, 'w+b').write(pageData)
    webbrowser.open('file://' + tmp)
    reactor.stop()

def handleError(failure):
    print "Error:", failure.getErrorMessage()
    reactor.stop()

if __name__ == "__main__":
    import sys
    from twisted.internet import reactor

    filename = sys.argv[1]
    fileToCheck = file(filename)
    boundary, form = encodeForm({'uploaded_file': fileToCheck})
    postRequest = client.getPage(
        'http://validator.w3.org/check',
        method='POST',
        # the server needs the boundary to split the body into parts
        headers={'Content-Type':
                     'multipart/form-data; boundary=%s' % boundary,
                 'Content-Length': str(len(form))},
        postdata=form)
    postRequest.addCallback(showPage).addErrback(handleError)
    reactor.run()
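The encoding step can also be sketched on its own, apart from Twisted. The following is a minimal illustration in Python 3 using only the standard library; the function name encode_multipart and the sample field names are assumptions made for this sketch, not part of Twisted or of validate.py. It shows the three details a server relies on: each part begins with "--" plus the boundary, a blank line separates a part's headers from its data, and the closing delimiter ends with an extra "--".

```python
import os
import uuid


def encode_multipart(fields, files):
    """Return (content_type, body) for a multipart/form-data request.

    fields maps names to str values; files maps names to
    (filename, bytes) pairs. Both argument shapes are illustrative.
    """
    boundary = uuid.uuid4().hex  # random token unlikely to occur in the data
    parts = []
    for name, value in fields.items():
        parts.append(
            ('--%s\r\nContent-Disposition: form-data; name="%s"\r\n\r\n'
             % (boundary, name)).encode('utf-8')
            + value.encode('utf-8') + b'\r\n')
    for name, (filename, data) in files.items():
        parts.append(
            ('--%s\r\nContent-Disposition: form-data; name="%s"; '
             'filename="%s"\r\nContent-Type: application/octet-stream\r\n\r\n'
             % (boundary, name, os.path.basename(filename))).encode('utf-8')
            + data + b'\r\n')
    # The closing delimiter carries "--" on both sides of the boundary.
    parts.append(('--%s--\r\n' % boundary).encode('utf-8'))
    content_type = 'multipart/form-data; boundary=%s' % boundary
    return content_type, b''.join(parts)


content_type, body = encode_multipart(
    {'charset': 'utf-8'}, {'uploaded_file': ('page.html', b'<html></html>')})
```

The returned content_type string is what would go in the request's Content-Type header, so that the receiving server knows which boundary to look for.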
Checking Whether a Page Has Changed
One popular kind of HTTP application is the RSS (Really Simple Syndication) aggregator, which downloads news items or blog posts in RSS (or Atom) format. RSS aggregators download a new copy of each RSS feed at regular intervals, typically once an hour. This process can end up wasting a lot of bandwidth for the publisher of the RSS feed, though: the contents of the feed may change infrequently, which means that the client will be downloading the same data over and over again.
To prevent this waste of network resources, RSS aggregators (and other applications that request the same page multiple times) are encouraged to use a conditional HTTP GET request. By including conditional HTTP headers with a request, a client instructs the server to return the page data only if certain conditions are met. And, of course, one of those conditions might be whether the page has been modified since it was last checked.
Keep track of the headers returned the first time you download the page. Look for either an ETag header, which identifies the unique revision of the page, or the Last-Modified header, which gives the page's modification time. The next time you request the page, send the headers If-None-Match, with the ETag value, and If-Modified-Since, with the Last-Modified value. If the server supports conditional GET requests, it will return a 304 Not Modified response if the page has not been modified since the last request.
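The mapping from saved response headers to conditional request headers can be sketched in isolation. This is a minimal Python 3 illustration; the helper names conditional_headers and page_changed, and the sample header values, are hypothetical and exist only for this sketch.

```python
def conditional_headers(previous_headers):
    """Build request headers for a conditional GET.

    previous_headers is a dict of response headers saved from the last
    download, keyed by lowercase name (an assumption of this sketch).
    """
    headers = {}
    etag = previous_headers.get('etag')
    if etag:
        headers['If-None-Match'] = etag
    modified = previous_headers.get('last-modified')
    if modified:
        headers['If-Modified-Since'] = modified
    return headers


def page_changed(status):
    """Interpret the status code returned for the conditional request."""
    if status == 304:
        return False  # Not Modified: the saved copy is still current
    if status == 200:
        return True   # changed, or the server ignores conditional headers
    raise ValueError('unexpected status %r' % status)


# Sample values; a real client would save these from an actual response.
first_response = {
    'etag': '"abc123"',
    'last-modified': 'Sat, 01 Oct 2005 12:00:00 GMT',
}
print(conditional_headers(first_response))
```

Note that a 200 response is ambiguous: it means either that the page changed or that the server does not honor conditional requests at all.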
The getPage and downloadPage functions provided by twisted.web.client are handy, but they don't allow for the level of control necessary to use conditional requests. Therefore, you'll need to use the slightly lower-level HTTPClientFactory interface. Example 3-5 demonstrates using HTTPClientFactory to test whether a page has been updated.
Example 3-5. updatecheck.py
from twisted.web import client

class HTTPStatusChecker(client.HTTPClientFactory):

    def __init__(self, url, headers=None):
        client.HTTPClientFactory.__init__(self, url, headers=headers)
        self.status = None
        self.deferred.addCallback(
            lambda data: (data, self.status, self.response_headers))

    def noPage(self, reason): # called for non-200 responses
        if self.status == '304': # Page hadn't changed
            client.HTTPClientFactory.page(self, '')
        else:
            client.HTTPClientFactory.noPage(self, reason)

def checkStatus(url, contextFactory=None, *args, **kwargs):
    scheme, host, port, path = client._parse(url)
    factory = HTTPStatusChecker(url, *args, **kwargs)
    if scheme == 'https':
        from twisted.internet import ssl
        if contextFactory is None:
            contextFactory = ssl.ClientContextFactory()
        reactor.connectSSL(host, port, factory, contextFactory)
    else:
        reactor.connectTCP(host, port, factory)
    return factory.deferred

def handleFirstResult(result, url):
    data, status, headers = result
    nextRequestHeaders = {}
    eTag = headers.get('etag')
    if eTag:
        nextRequestHeaders['If-None-Match'] = eTag[0]
    modified = headers.get('last-modified')
    if modified:
        nextRequestHeaders['If-Modified-Since'] = modified[0]
    return checkStatus(url, headers=nextRequestHeaders).addCallback(
        handleSecondResult)

def handleSecondResult(result):
    data, status, headers = result
    print 'Second request returned status %s:' % status,
    if status == '200':
        print 'Page changed (or server does not support conditional requests).'
    elif status == '304':
        print 'Page is unchanged.'
    else:
        print 'Unexpected Response.'
    reactor.stop()

def handleError(failure):
    print "Error", failure.getErrorMessage()
    reactor.stop()

if __name__ == "__main__":
    import sys
    from twisted.internet import reactor

    url = sys.argv[1]
    checkStatus(url).addCallback(
        handleFirstResult, url).addErrback(
        handleError)
    reactor.run()
Monitoring Download Progress
One potential weakness in the examples presented so far in this chapter is that there hasn't been a way to monitor a download in progress. Sure, it's nice that a Deferred will pass you the results of a page once it's completely downloaded, but sometimes what you really need is to keep an eye on the download as it's happening.
Again, the utility functions provided by twisted.web.client don't give you quite enough control. Define a subclass of client.HTTPDownloader, the factory class used for downloading a web page to a file. By overriding a couple of methods, you can keep track of a download in progress. The webdownload.py script in Example 3-6 shows how.
Example 3-6. webdownload.py
from twisted.web import client

class HTTPProgressDownloader(client.HTTPDownloader):

    def gotHeaders(self, headers):
        if self.status == '200': # page data is on the way
            if headers.has_key('content-length'):
                self.totalLength = int(headers['content-length'][0])
            else:
                self.totalLength = 0
            self.currentLength = 0.0
            print ''
        return client.HTTPDownloader.gotHeaders(self, headers)

    def pagePart(self, data):
        if self.status == '200':
            self.currentLength += len(data)
            if self.totalLength:
                percent = "%i%%" % (
                    (self.currentLength/self.totalLength)*100)
            else:
                percent = '%dK' % (self.currentLength/1000)
            print "\033[1FProgress: " + percent
        return client.HTTPDownloader.pagePart(self, data)

def downloadWithProgress(url, file, contextFactory=None, *args, **kwargs):
    scheme, host, port, path = client._parse(url)
    factory = HTTPProgressDownloader(url, file, *args, **kwargs)
    if scheme == 'https':
        from twisted.internet import ssl
        if contextFactory is None:
            contextFactory = ssl.ClientContextFactory()
        reactor.connectSSL(host, port, factory, contextFactory)
    else:
        reactor.connectTCP(host, port, factory)
    return factory.deferred

if __name__ == "__main__":
    import sys
    from twisted.internet import reactor

    def downloadComplete(result):
        print "Download Complete."
        reactor.stop()

    def downloadError(failure):
        print "Error:", failure.getErrorMessage()
        reactor.stop()

    url, outputFile = sys.argv[1:]
    downloadWithProgress(url, outputFile).addCallback(
        downloadComplete).addErrback(
        downloadError)

    reactor.run()
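The progress arithmetic is easy to check separately from the networking code. The following Python 3 sketch uses hypothetical names (format_progress, ProgressTracker) and assumes, as the example does, that a total length of 0 means the server sent no Content-Length header. It reports a percentage when the total is known and a running kilobyte count otherwise.

```python
def format_progress(current_length, total_length):
    """Return a short progress string for a download in progress.

    total_length is 0 when the server sent no Content-Length header.
    """
    if total_length:
        return '%d%%' % (current_length * 100 // total_length)
    return '%dK' % (current_length // 1000)


class ProgressTracker:
    """Accumulates chunk sizes, one call per chunk of page data."""

    def __init__(self, total_length=0):
        self.total_length = total_length
        self.current_length = 0

    def chunk_received(self, data):
        self.current_length += len(data)
        return format_progress(self.current_length, self.total_length)


tracker = ProgressTracker(total_length=4000)
for chunk in (b'x' * 1000, b'x' * 1000, b'x' * 2000):
    print(tracker.chunk_received(chunk))
```

Keeping the formatting in a plain function like this makes it possible to test the display logic without starting a download.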
Chapter 4: Web Servers
It's probably safe to say that these days, most new software is being developed in the form of web applications. People spend an increasingly large part of their day in their web browser, not just reading HTML pages but sending email, managing calendars, entering records into databases, updating Wiki pages, and writing weblog posts.
Even if you're not writing an application strictly for the Web, a web interface is often the easiest way to provide a cross-platform UI for things like administration and reporting. The ability to include a lightweight web server inside your app without introducing any additional dependencies is one of the great things about developing with Twisted. This chapter shows you how to run a web server using Twisted, and introduces you to some building blocks for creating web applications. It also offers an example of a custom HTTP proxy server.
This chapter provides some introductory information about the HTTP protocol used by web servers and web clients. There are many additional details of HTTP that you should know if you're serious about building web applications. In fact, there's enough information to write an entire book on the subject, such as HTTP: The Definitive Guide by David Gourley and Brian Totty (O'Reilly). There's also no substitute for reading the HTTP spec, RFC 2616 (http://www.faqs.org/rfcs/rfc2616.html).
Responding to HTTP Requests
HTTP is, on its surface, a simple protocol. A client sends a request, the server sends a response, the connection closes. You can experiment with HTTP by writing your own Protocol that accepts a connection, reads the request, and sends back an HTTP-formatted response.
Every HTTP request starts with a single line containing the HTTP method, a partial Uniform Resource Identifier (URI), and the HTTP version. Following this line are an arbitrary number of header lines. A blank line indicates the end of the headers. The header section is optionally followed by additional data called the body of the request, such as data being posted from an HTML form.
Here's an example of a minimal HTTP request. This request asks the server to perform the method GET on the resource www.example.com/index.html, preferably using HTTP version 1.1:
    GET /index.html HTTP/1.1
    Host: www.example.com
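To make the structure of a request concrete, here is a minimal Python 3 sketch that splits raw request bytes into the pieces just described. The function name parse_request is hypothetical, and the sketch deliberately ignores details a real server must handle, such as repeated or folded headers.

```python
def parse_request(raw):
    """Split raw request bytes into method, URI, version, headers, body."""
    head, _, body = raw.partition(b'\r\n\r\n')  # blank line ends the headers
    lines = head.decode('iso-8859-1').split('\r\n')
    method, uri, version = lines[0].split(' ', 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers, body


raw = b'GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n'
method, uri, version, headers, body = parse_request(raw)
print(method, uri, version, headers, body)
```

Header names are lowercased here because HTTP header names are case-insensitive; the body is left as bytes since its interpretation depends on the headers.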
The first line of the server's response tells the client the HTTP version being used for the response and the HTTP status code. Like the request, the response also contains header lines followed by a blank line and the message body. Here's a minimal HTTP response:
    HTTP/1.1 200 OK
    Content-Type: text/plain
    Content-Length: 17
    Connection: Close

    Hello HTTP world!
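A response like this one can be assembled with a few lines of code. The sketch below is Python 3 with a hypothetical helper named build_response; the point to notice is that Content-Length must count the bytes of the encoded body, and that an empty line separates the headers from the body.

```python
def build_response(body, status='200 OK', content_type='text/plain'):
    """Assemble a minimal HTTP/1.1 response as bytes."""
    payload = body.encode('utf-8')
    head = '\r\n'.join([
        'HTTP/1.1 %s' % status,
        'Content-Type: %s' % content_type,
        'Content-Length: %d' % len(payload),  # counts bytes, not characters
        'Connection: close',
        '',  # blank line that ends the headers
        '',
    ])
    return head.encode('ascii') + payload


response = build_response('Hello HTTP world!')
print(response.decode('utf-8'))
```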
To set up a very basic HTTP server, write a Protocol that accepts input from the client. Look for the blank line that identifies the end of the headers. Then send an HTTP response. Example 4-1 shows a simple HTTP implementation that echoes each request back to the client.
Example 4-1. webecho.py
from twisted.protocols import basic
from twisted.internet import protocol, reactor

class HttpEchoProtocol(basic.LineReceiver):

    def __init__(self):
        self.lines = []
        self.gotRequest = False

    def lineReceived(self, line):
        self.lines.append(line)
        if not line and not self.gotRequest:
            self.sendResponse()
            self.gotRequest = True

    def sendResponse(self):
        responseBody = "You said:\r\n\r\n" + "\r\n".join(self.lines)
        self.sendLine("HTTP/1.0 200 OK")
        self.sendLine("Content-Type: text/plain")
        self.sendLine("Content-Length: %i" % len(responseBody))
        self.sendLine("")
        self.transport.write(responseBody)
        self.transport.loseConnection()

f = protocol.ServerFactory()
f.protocol = HttpEchoProtocol
reactor.listenTCP(8000, f)
reactor.run()
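The job that LineReceiver does for this protocol, turning arbitrary chunks of incoming data into lines and spotting the blank one, can be sketched without Twisted. The class below is a hypothetical Python 3 illustration (HeaderCollector is not a Twisted class); it assumes lines end with "\r\n" and stops collecting once the blank line arrives.

```python
class HeaderCollector:
    """Buffers incoming bytes and reports when the header block is complete."""

    def __init__(self):
        self.buffer = b''
        self.lines = []
        self.complete = False

    def data_received(self, data):
        self.buffer += data
        while not self.complete and b'\r\n' in self.buffer:
            line, self.buffer = self.buffer.split(b'\r\n', 1)
            if line:
                self.lines.append(line.decode('iso-8859-1'))
            else:
                self.complete = True  # blank line: end of headers
        return self.complete


collector = HeaderCollector()
# Data may arrive split at arbitrary points, even in the middle of a line.
for chunk in (b'GET /index.html HT', b'TP/1.1\r\nHost: www.exam',
              b'ple.com\r\n', b'\r\n'):
    done = collector.data_received(chunk)
print(done, collector.lines)
```

This is why the protocol cannot simply treat each delivery of data as one line: the network makes no promise about where one chunk ends and the next begins.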
Parsing HTTP Requests
The HttpEchoProtocol class in Example 4-1 provides an interesting glimpse into HTTP in action, but it's a long way from being ready for use in a real web server. It doesn't even parse the request to figure out what resource the client is trying to access, or what HTTP method she's using. Before you try to build a real web application, you need a better way to parse and respond to requests. This lab shows you how.
Write a subclass of twisted.web.http.Request with a process method that processes the current request. The Request object will already contain all the important information about an HTTP request when process is called, so all you have to do is decide how to respond. Example 4-2 demonstrates how to run an HTTP server based on a subclass of http.Request.
Example 4-2. requesthandler.py
from twisted.web import http

class MyRequestHandler(http.Request):
    pages = {
        '/': '<h1>Home</h1>Home page',
        '/test': '<h1>Test</h1>Test page',
        }

    def process(self):
        if self.pages.has_key(self.path):
            self.write(self.pages[self.path])
        else:
            self.setResponseCode(http.NOT_FOUND)
            self.write("<h1>Not Found</h1>Sorry, no such page.")
        self.finish()

class MyHttp(http.HTTPChannel):
    requestFactory = MyRequestHandler

class MyHttpFactory(http.HTTPFactory):
    protocol = MyHttp

if __name__ == "__main__":
    from twisted.internet import reactor
    reactor.listenTCP(8000, MyHttpFactory())
    reactor.run()
Run requesthandler.py and it will start up a web server on port 8000. You should be able to view both the home page (http://localhost:8000/) and the page /test (http://localhost:8000/test) in your browser. Figure 4-2 shows you how the page /test will look in your browser.
Figure 4-2:
Working with POST Data from HTML Forms
The previous lab showed how to take a request from a client and return a response containing static HTML. This lab shows how you could write code to control how each response is generated, and act on data submitted from an HTML form.
Write functions that take a Request object and work with it to generate a response. Set up a dictionary to map each available path in your web site to a function that will handle requests for that path. Use the Request.args dictionary to access data submitted from an HTML form. Example 4-3 shows a web server that generates one page containing an HTML form, and another page that processes the form and displays the results.
Example 4-3. formhandler.py
from twisted.web import http

def renderHomePage(request):
    colors = 'red', 'blue', 'green'
    flavors = 'vanilla', 'chocolate', 'strawberry', 'coffee'
    request.write("""
    <html>
    <head>
      <title>Form Test</title>
    </head>
    <body>
      <form action='posthandler' method='post'>
        Your name:
        <p>
          <input type='text' name='name'>
        </p>
        What's your favorite color?
        <p>
    """)
    for color in colors:
        request.write(
            "<input type='radio' name='color' value='%s'>%s<br />" % (
            color, color.capitalize()))
    request.write("""
        </p>
        What kinds of ice cream do you like?
        <p>
        """)
    for flavor in flavors:
        request.write(
            "<input type='checkbox' name='flavor' value='%s'>%s<br />" % (
            flavor, flavor.capitalize()))
    request.write("""
        </p>
        <input type='submit' />
      </form>
    </body>
    </html>
    """)
    request.finish()

def handlePost(request):
    request.write("""
    <html>
      <head>
        <title>Posted Form Data</title>
      </head>
      <body>
      <h1>Form Data</h1>
    """)

    for key, values in request.args.items():
        request.write("<h2>%s</h2>" % key)
        request.write("<ul>")
        for value in values:
            request.write("<li>%s</li>" % value)
        request.write("</ul>")

    request.write("""
       </body>
    </html>
    """)
    request.finish()

class FunctionHandledRequest(http.Request):
    pageHandlers = {
        '/': renderHomePage,
        '/posthandler': handlePost,
        }

    def process(self):
        self.setHeader('Content-Type', 'text/html')
        if self.pageHandlers.has_key(self.path):
            handler = self.pageHandlers[self.path]
            handler(self)
        else:
            self.setResponseCode(http.NOT_FOUND)
            self.write("<h1>Not Found</h1>Sorry, no such page.")
            self.finish()

class MyHttp(http.HTTPChannel):
    requestFactory = FunctionHandledRequest

class MyHttpFactory(http.HTTPFactory):
    protocol = MyHttp

if __name__ == "__main__":
    from twisted.internet import reactor
    reactor.listenTCP(8000, MyHttpFactory())
    reactor.run()
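The reason each entry in Request.args is a list becomes clear when you decode a form body by hand: a field such as a group of checkboxes can appear more than once. The Python 3 sketch below uses the standard library's parse_qs to decode a sample body, and a hypothetical render_args helper that escapes each value before writing it into HTML, a precaution worth taking whenever submitted data is echoed back to the browser.

```python
import html
from urllib.parse import parse_qs


def render_args(args):
    """Render a dict of name -> list of values as HTML, escaping user input."""
    out = []
    for key, values in args.items():
        out.append('<h2>%s</h2><ul>' % html.escape(key))
        for value in values:
            out.append('<li>%s</li>' % html.escape(value))
        out.append('</ul>')
    return ''.join(out)


# A sample URL-encoded form body; the name field contains markup on purpose.
posted = 'name=Abe+%3Cb%3E&color=blue&flavor=vanilla&flavor=coffee'
args = parse_qs(posted)
print(args)
print(render_args(args))
```

Without the escaping step, the markup typed into the name field would be interpreted by the browser rather than displayed as text.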
Managing a Hierarchy of Resources
The paths in a web application usually imply a hierarchy of resources. For example, look at these URIs:
    http://example.com/people
    http://example.com/people/charles
    http://example.com/people/charles/contact
It's easy to see the hierarchy here. The page /people/charles is a child of /people, and the page /people/charles/contact is a child of /people/charles. Each page in the hierarchy is more specific: /people/charles is one specific person, and /people/charles/contact is one specific type of data (in this case, contact information) related to charles.
The default behavior for most web servers is to map request paths to a hierarchy of files and folders on disk. Each time a client requests the resource at a certain path, the web server tries to find a file at the corresponding path on disk, and responds with either the content of the file itself or (as in the case of a CGI script) the output created by executing the file. But in web applications, it can be artificially constraining to have to have a file on disk for every path that might be requested. For example, the data in your application might not be stored on disk, but in a relational database in another server. Or you might want to create resources on demand when they are requested. In cases like this, it's useful to be able to write your own logic for navigating a hierarchy of resources.
Writing your own logic for managing resources can also help you to manage security. Rather than opening up an entire directory to web access, you can selectively control which files are made available.
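As a rough illustration of what navigating a hierarchy of resources means, here is a minimal Python 3 sketch of a resource tree walked one path segment at a time. The Node class and find_resource function are hypothetical names for this sketch and are not the twisted.web.resource API; they only show the general shape of the idea.

```python
class Node:
    """A resource in the hierarchy; children are looked up by path segment."""

    def __init__(self, content):
        self.content = content
        self.children = {}

    def put_child(self, name, node):
        self.children[name] = node
        return node

    def get_child(self, name):
        # A subclass could override this to create children on demand,
        # for example by looking the name up in a database.
        return self.children.get(name)


def find_resource(root, path):
    """Walk the tree one path segment at a time; None means not found."""
    node = root
    for segment in path.strip('/').split('/'):
        if not segment:
            continue
        node = node.get_child(segment)
        if node is None:
            return None
    return node


root = Node('home')
people = root.put_child('people', Node('list of people'))
charles = people.put_child('charles', Node('about charles'))
charles.put_child('contact', Node('contact details for charles'))
print(find_resource(root, '/people/charles/contact').content)
```

Because only nodes explicitly added to the tree (or produced by get_child) are reachable, nothing is exposed by accident, which is the security benefit described above.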
The twisted.web.resource, twisted.web.static, and twisted.web.server modules provide classes for working with requests at a higher level than twisted.web.http.Request, which you can use to set up a web server that combines several different kinds of resources into a logical hierarchy. Example 4-4 uses these classes to build an application for testing hexadecimal color codes. Request the resource
Storing Web Data in an SQL Database
Lots of web applications use an SQL backend for data storage. With a Twisted application, though, you wouldn't want to use a regular Python SQL library. Standard SQL libraries have blocking function calls: every time you run a query, the query function will pause your application until the server returns a result. This can take a long time, especially if the query requires a lot of processing, or if the network connection to the server is slow. To use an SQL database with Twisted, you need a way to run queries using Deferreds, allowing your app to continue doing other things while it's waiting for the results.
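The general idea of keeping a blocking query out of the way of the main program can be sketched with the Python 3 standard library alone. This is only an analogy: it uses concurrent.futures and sqlite3 rather than Deferreds or any Twisted API, and the function name count_rows and the table it creates are made up for the illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def count_rows(names):
    """A blocking database call: runs entirely in the thread that calls it."""
    conn = sqlite3.connect(':memory:')
    try:
        conn.execute('CREATE TABLE people (name TEXT)')
        conn.executemany('INSERT INTO people VALUES (?)',
                         [(n,) for n in names])
        return conn.execute('SELECT COUNT(*) FROM people').fetchone()[0]
    finally:
        conn.close()


results = []
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(count_rows, ['charles', 'abe'])
    # The callback receives the finished future instead of waiting for it.
    future.add_done_callback(lambda f: results.append(f.result()))
    # The main thread is free to do other work here.
print(results)
```

The pattern is the same one a Deferred expresses: start the work, attach a callback for the result, and carry on without waiting.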
Twisted provides such an SQL library in the twisted.enterprise package. twisted.enterprise doesn't actually include SQL drivers; it would be far too much work to support every database you might potentially want to use. Instead, twisted.enterprise provides an asynchronous API on top of the standard DB-API interface used by many Python database modules. When necessary, it uses threads t