BUY THIS BOOK

Safari Books Online

What is this?

Looking to Reprint this content?


Programming Python
Programming Python, Second Edition Object-Oriented Scripting

By Mark Lutz

Cover | Table of Contents | Colophon


Table of Contents

Chapter 1: Introducing Python
This book is about using Python, a very high-level, object-oriented, open source programming language, designed to optimize development speed. Although it is completely general-purpose, Python is often called an object-oriented scripting language, partly because of its sheer ease of use, and partly because it is commonly used to orchestrate or "glue" other software components in an application.
If you are new to Python, chances are you've heard about the language somewhere, but are not quite sure what it is about. To help you get started, this chapter provides a nontechnical introduction to Python's features and roles. Most of it will make more sense once you have seen real Python programs, but let's first take a quick pass over the forest before wandering among the trees.
In the preface, I mentioned that Python emphasizes concepts such as quality, productivity, portability, and integration. Since these four terms summarize most of the reasons for using Python, I'd like to define them in a bit more detail:
Quality
Python makes it easy to write software that can be reused and maintained. It was deliberately designed to raise development quality expectations in the scripting world. Python's clear syntax and coherent design almost forces programmers to write readable code -- a critical feature for software that may be changed by others. The Python language really does look like it was designed, not accumulated. Python is also well tooled for modern software reuse methodologies. In fact, writing high-quality Python components that may be applied in multiple contexts is almost automatic.
Productivity
Python is optimized for speed of development. It's easy to write programs fast in Python, because the interpreter handles details you must code explicitly in lower-level languages. Things like type declarations, memory management, and build procedures are nowhere to be found in Python scripts. But fast initial development is only one component of productivity. In the real world, programmers must write code both for a computer to execute and for other programmers to read and maintain. Because Python's syntax resembles
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
"And Now for Something Completely Different"
This book is about using Python, a very high-level, object-oriented, open source programming language, designed to optimize development speed. Although it is completely general-purpose, Python is often called an object-oriented scripting language, partly because of its sheer ease of use, and partly because it is commonly used to orchestrate or "glue" other software components in an application.
If you are new to Python, chances are you've heard about the language somewhere, but are not quite sure what it is about. To help you get started, this chapter provides a nontechnical introduction to Python's features and roles. Most of it will make more sense once you have seen real Python programs, but let's first take a quick pass over the forest before wandering among the trees.
In the preface, I mentioned that Python emphasizes concepts such as quality, productivity, portability, and integration. Since these four terms summarize most of the reasons for using Python, I'd like to define them in a bit more detail:
Quality
Python makes it easy to write software that can be reused and maintained. It was deliberately designed to raise development quality expectations in the scripting world. Python's clear syntax and coherent design almost forces programmers to write readable code -- a critical feature for software that may be changed by others. The Python language really does look like it was designed, not accumulated. Python is also well tooled for modern software reuse methodologies. In fact, writing high-quality Python components that may be applied in multiple contexts is almost automatic.
Productivity
Python is optimized for speed of development. It's easy to write programs fast in Python, because the interpreter handles details you must code explicitly in lower-level languages. Things like type declarations, memory management, and build procedures are nowhere to be found in Python scripts. But fast initial development is only one component of productivity. In the real world, programmers must write code both for a computer to execute and for other programmers to read and maintain. Because Python's syntax resembles
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The Life of Python
Python was invented around 1990 by Guido van Rossum, when he was at CWI in Amsterdam. Despite the reptiles, it is named after the BBC comedy series Monty Python's Flying Circus, of which Guido is a fan (see the following silly sidebar). Guido was also involved with the Amoeba distributed operating system and the ABC language. In fact, the original motivation for Python was to create an advanced scripting language for the Amoeba system.
But Python's design turned out to be general enough to address a wide variety of domains. It's now used by hundreds of thousands of engineers around the world, in increasingly diverse roles. Companies use Python today in commercial products, for tasks such as testing chips and boards, developing GUIs, searching the Web, animating movies, scripting games, serving up maps and email on the Internet, customizing C++ class libraries, and much more. In fact, because Python is a completely general-purpose language, its target domains are only limited by the scope of computers in general.
Since it first appeared on the public domain scene in 1991, Python has continued to attract a loyal following, and spawned a dedicated Internet newsgroup, comp.lang.python, in 1994. And as the first edition of this book was being written in 1995, Python's home page debuted on the WWW at http://www.python.org -- still the official place to find all things Python.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The Compulsory Features List
One way to describe a language is by listing its features. Of course, this will be more meaningful after you've seen Python in action; the best I can do now is speak in the abstract. And it's really how Python's features work together, that make it what it is. But looking at some of Python's attributes may help define it; Table 1-1 lists some of the common reasons cited for Python's appeal.
Table 1-1: Python Language Features
Features
Benefits
No compile or link steps
Rapid development cycle turnaround
No type declarations
Simpler, shorter, and more flexible programs
Automatic memory management
Garbage collection avoids bookkeeping code
High-level datatypes and operations
Fast development using built-in object types
Object-oriented programming
Code reuse, C++, Java, and COM integration
Embedding and extending in C
Optimization, customization, system "glue"
Classes, modules, exceptions
Modular "programming-in-the-large" support
A simple, clear syntax and design
Readability, maintainability, ease of learning
Dynamic loading of C modules
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
What's Python Good For?
Because Python is used in a wide variety of ways, it's almost impossible to give an authoritative answer to this question. In general, any application that can benefit from the inclusion of a language optimized for speed of development is a good target Python application domain. Given the ever-shrinking schedules in software development, this a very broad category.
A more specific answer is less easy to formulate. For instance, some use Python as an embedded extension language, while others use it exclusively as a standalone programming tool. And to some extent, this entire book will answer this very question -- it explores some of Python's most common roles. For now, here's a summary of some of the more common ways Python is being applied today:
System utilities
Portable command-line tools, testing systems
Internet scripting
CGI web sites, Java applets, XML, ASP, email tools
Graphical user interfaces
With APIs such as Tk, MFC, Gnome, KDE
Component integration
C/C++ library front-ends, product customization
Database access
Persistent object stores, SQL database system interfaces
Distributed programming
With client/server APIs like CORBA, COM
Rapid-prototyping /development
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
What's Python Not Good For?
To be fair again, some tasks are outside of Python's scope. Like all dynamic languages, Python (as currently implemented) isn't as fast or efficient as static, compiled languages like C. In many domains, the difference doesn't matter; for programs that spend most of their time interacting with users or transferring data over networks, Python is usually more than adequate to meet the performance needs of the entire application. But efficiency is still a priority in some domains.
Because it is interpreted today, Python alone usually isn't the best tool for delivery of performance-critical components. Instead, computationally intensive operations can be implemented as compiled extensions to Python, and coded in a low-level language like C. Python can't be used as the sole implementation language for such components, but it works well as a frontend scripting interface to them.
For example, numerical programming and image processing support has been added to Python by combining optimized extensions with a Python language interface. In such a system, once the optimized extensions have been developed, most of the programming occurs at the higher-level Python scripting level. The net result is a numerical programming tool that's both efficient and easy to use.
Moreover, Python can still serve as a prototyping tool in such domains. Systems may be implemented in Python first, and later moved in whole or piecemeal to a language like C for delivery. C and Python have distinct strengths and roles; a hybrid approach, using C for compute-intensive modules, and Python for prototyping and frontend interfaces, can leverage the benefits of both.
In some sense, Python solves the efficiency/flexibility tradeoff by not solving it at all. It provides a language optimized for ease of use, along with tools needed to integrate with other languages. By combining components written in Python and compiled languages like C and C++, developers may select an appropriate mix of usability and performance for each particular application. While it's unlikely that it will ever be as fast as C, Python's speed of development is at least as important as C's speed of execution in most modern software projects.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 2: System Tools
This chapter begins our look at ways to apply Python to real programming tasks. In this and the following chapters, we'll see how to use Python to write system tools, graphical user interfaces, database applications, Internet scripts, web sites, and more. Along the way we'll also study larger Python programming concepts in action: code reuse, maintainability, object-oriented programming, and so on.
In this first part of the book, we begin our Python programming tour by exploring the systems application domain -- scripts that deal with files, programs, and the environment surrounding a program in general. Although the examples in this domain focus on particular kinds of tasks, the techniques they employ will prove to be useful in later parts of the book as well. In other words, you should begin your journey here, unless you already are a Python systems programming wizard.
Python's system interfaces span application domains, but for the next four chapters, most of our examples fall into the category of system tools -- programs sometimes called command-line utilities, shell scripts, or some permutation of such words. Regardless of their title, you are probably familiar with this sort of script already; they accomplish tasks like processing files in a directory, launching test scripts, and so on. Such programs historically have been written in nonportable and syntactically obscure shell languages such as DOS batch files, csh, and awk.
Even in this relatively simple domain, though, some of Python's better attributes shine brightly. For instance, Python's ease of use and extensive built-in library make it simple (and even fun) to use advanced system tools such as threads, signals, forks, sockets, and their kin; such tools are much less accessible under the obscure syntax of shell languages and the slow development cycles of compiled languages. Python's support for concepts like code clarity and object-oriented programming also help us write shell tools that can be read, maintained, and reused. When using Python, there is no need to start every new script from scratch.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
"The os.path to Knowledge"
This chapter begins our look at ways to apply Python to real programming tasks. In this and the following chapters, we'll see how to use Python to write system tools, graphical user interfaces, database applications, Internet scripts, web sites, and more. Along the way we'll also study larger Python programming concepts in action: code reuse, maintainability, object-oriented programming, and so on.
In this first part of the book, we begin our Python programming tour by exploring the systems application domain -- scripts that deal with files, programs, and the environment surrounding a program in general. Although the examples in this domain focus on particular kinds of tasks, the techniques they employ will prove to be useful in later parts of the book as well. In other words, you should begin your journey here, unless you already are a Python systems programming wizard.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Why Python Here?
Python's system interfaces span application domains, but for the next four chapters, most of our examples fall into the category of system tools -- programs sometimes called command-line utilities, shell scripts, or some permutation of such words. Regardless of their title, you are probably familiar with this sort of script already; they accomplish tasks like processing files in a directory, launching test scripts, and so on. Such programs historically have been written in nonportable and syntactically obscure shell languages such as DOS batch files, csh, and awk.
Even in this relatively simple domain, though, some of Python's better attributes shine brightly. For instance, Python's ease of use and extensive built-in library make it simple (and even fun) to use advanced system tools such as threads, signals, forks, sockets, and their kin; such tools are much less accessible under the obscure syntax of shell languages and the slow development cycles of compiled languages. Python's support for concepts like code clarity and object-oriented programming also help us write shell tools that can be read, maintained, and reused. When using Python, there is no need to start every new script from scratch.
Moreover, we'll find that Python not only includes all the interfaces we need to write system tools, it also fosters script portability. By employing Python's standard library, most system scripts written in Python are automatically portable to all major platforms. A Python directory-processing script written in Windows, for instance, can usually also be run in Linux without changing its source code at all -- simply copy over the source code. If used well, Python is the only system scripting tool you need to know.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
System Scripting Overview
The next two sections will take a quick tour through sys and os, before this chapter moves on to larger system programming concepts. As I'm not going to demonstrate every item in every built-in module, the first thing I want to do is show you how to get more details on your own. Officially, this task also serves as an excuse for introducing a few core system scripting concepts -- along the way, we'll code a first script to format documentation.
Most system-level interfaces in Python are shipped in just two modules: sys and os. That's somewhat oversimplified; other standard modules belong to this domain too (e.g., glob, socket, thread, time, fcntl), and some built-in functions are really system interfaces as well (e.g., open). But sys and os together form the core of Python's system tools arsenal.
In principle at least, sys exports components related to the Python interpreter itself (e.g., the module search path), and os contains variables and functions that map to the operating system on which Python is run. In practice, this distinction may not always seem clear-cut (e.g., the standard input and output streams show up in sys, but they are at least arguably tied to operating system paradigms). The good news is that you'll soon use the tools in these modules so often that their locations will be permanently stamped on your memory.
The os module also attempts to provide a portable programming interface to the underlying operating system -- its functions may be implemented differently on different platforms, but they look the same everywhere to Python scripts. In addition, the os module exports a nested submodule, os.path, that provides a portable interface to file and directory processing tools.
As you can probably deduce from the preceding paragraphs, learning to write system scripts in Python is mostly a matter of learning about Python's system modules. Luckily, there are a variety of information sources to make this task easier -- from module attributes to published references and books.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The sys Module
On to module details. As mentioned earlier, the sys and os modules form the core of much of Python's system-related toolset. Let's now take a quick, interactive tour through some of the tools in these two modules, before applying them in bigger examples.
Like most modules, sys includes both informational names and functions that take action. For instance, its attributes give us the name of the underlying operating system the platform code is running on, the largest possible integer on this machine, and the version number of the Python interpreter running our code:
C:\...\PP2E\System>python
>>> import sys
>>> sys.platform, sys.maxint, sys.version
('win32', 2147483647, '1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)]')
>>>
>>> if sys.platform[:3] == 'win': print 'hello windows'
...
hello windows
If you have code that must act differently on different machines, simply test the sys.platform string as done here; although most of Python is cross-platform, nonportable tools are usually wrapped in if tests like the one here. For instance, we'll see later that program launch and low-level console interaction tools vary per platform today -- simply test sys.platform to pick the right tool for the machine your script is running on.
The sys module also lets us inspect the module search path both interactively and within a Python program. sys.path is a list of strings representing the true search path in a running Python interpreter. When a module is imported, Python scans this list from left to right, searching for the module's file on each directory named in the list. Because of that, this is the place to look to verify that your search path is really set as intended.
The sys.path list is simply initialized from your PYTHONPATH setting plus system defaults, when the interpreter is first started up. In fact, you'll notice quite a few directories that are not on your PYTHONPATH if you inspect
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The os Module
As mentioned, os contains all the usual operating-system calls you may have used in your C programs and shell scripts. Its calls deal with directories, processes, shell variables, and the like. Technically, this module provides POSIX tools -- a portable standard for operating-system calls -- along with platform-independent directory processing tools as nested module os.path. Operationally, os serves as a largely portable interface to your computer's system calls: scripts written with os and os.path can usually be run on any platform unchanged.
In fact, if you read the os module's source code, you'll notice that it really just imports whatever platform-specific system module you have on your computer (e.g., nt, mac, posix). See the file os.py in the Python source library directory -- it simply runs a from* statement to copy all names out of a platform-specific module. By always importing os instead of platform-specific modules, though, your scripts are mostly immune to platform implementation differences.
Let's take a quick look at the basic interfaces in os. If you inspect this module's attributes interactively, you get a huge list of names that will vary per Python release, will likely vary per platform, and isn't incredibly useful until you've learned what each name means:
>>> import os
>>> dir(os)
['F_OK', 'O_APPEND', 'O_BINARY', 'O_CREAT', 'O_EXCL', 'O_RDONLY', 'O_RDWR', 
'O_TEXT', 'O_TRUNC', 'O_WRONLY', 'P_DETACH', 'P_NOWAIT', 'P_NOWAITO',
'P_OVERLAY', 'P_WAIT', 'R_OK', 'UserDict', 'W_OK', 'X_OK', '_Environ',
'__builtins__', '__doc__', '__file__', '__name__', '_execvpe', '_exit',
'_notfound', 'access', 'altsep', 'chdir', 'chmod', 'close', 'curdir', 
'defpath', 'dup', 'dup2', 'environ', 'error', 'execl', 'execle', 'execlp',
'execlpe', 'execv', 'execve', 'execvp', 'execvpe', 'fdopen', 'fstat', 'getcwd',
'getpid', 'i', 'linesep', 'listdir', 'lseek', 'lstat', 'makedirs', 'mkdir',
'name', 'open', 'pardir', 'path', 'pathsep', 'pipe', 'popen', 'putenv', 'read',
'remove', 'removedirs', 'rename', 'renames', 'rmdir', 'sep', 'spawnv', 
'spawnve', 'stat', 'strerror', 'string', 'sys', 'system', 'times', 'umask', 
'unlink', 'utime', 'write']
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Script Execution Context
Python scripts don't run in a vacuum. Depending on platforms and startup procedures, Python programs may have all sorts of enclosing context -- information automatically passed-in to the program by the operating system when the program starts up. For instance, scripts have access to the following sorts of system-level inputs and interfaces:
Current working directory
os.getcwd gives access to the directory from which a script is started, and many file tools use its value implicitly.
Command-line arguments
sys.argv gives access to words typed on the command line used to start the program that serve as script inputs.
Shell variables
os.environ provides an interface to names assigned in the enclosing shell (or a parent program) and passed in to the script.
Standard streams
sys.stdin, stdout, and stderr export the three input/output streams that are at the heart of command-line shell tools.
Such tools can serve as inputs to scripts, configuration parameters, and so on. In the next few sections, we will explore these context tools -- both their Python interfaces and their typical roles.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Current Working Directory
The notion of the current working directory (CWD) turns out to be a key concept in some scripts' execution: it's always the implicit place where files processed by the script are assumed to reside, unless their names have absolute directory paths. As we saw earlier, os.getcwd lets a script fetch the CWD name explicitly, and os.chdir allows a script to move to a new CWD.
Keep in mind, though, that filenames without full pathnames map to the CWD, and have nothing to do with your PYTHONPATH setting. Technically, the CWD is always where a script is launched from, not the directory containing the script file. Conversely, imports always first search the directory containing the script, not the CWD (unless the script happens to also be located in the CWD). Since this distinction is subtle and tends to trip up beginners, let's explore it in more detail.
When you run a Python script by typing a shell command line like python dir1\dir2\file.py, the CWD is the directory you were in when you typed this command, not dir1\dir2. On the other hand, Python automatically adds the identity of the script's home directory to the front of the module search path, such that file.py can always import other files in dir1\dir2, no matter where it is run from. To illustrate, let's write a simple script to echo both its CWD and module search path:
C:\PP2ndEd\examples\PP2E\System>type whereami.py
import os, sys
print 'my os.getcwd =>', os.getcwd(  )            # show my cwd execution dir
print 'my sys.path  =>', sys.path[:6]           # show first 6 import paths
raw_input(  )                                     # wait for keypress if clicked
Now, running this script in the directory in which it resides sets the CWD as expected, and adds an empty string ('') to the front of the module search path, to designate the CWD (we met the sys.path module search path earlier):
C:\PP2ndEd\examples\PP2E\System>set PYTHONPATH=C:\PP2ndEd\examples
C:\PP2ndEd\examples\PP2E\System>
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Command-Line Arguments
The sys module is also where Python makes available the words typed on the command used to start a Python script. These words are usually referred to as command-line arguments, and show up in sys.argv, a built-in list of strings. C programmers may notice its similarity to the C "argv" array (an array of C strings). It's not much to look at interactively, because no command-line arguments are passed to start up Python in this mode:
>>> sys.argv
['']
To really see what arguments are about, we need to run a script from the shell command line. Example 2-2 shows an unreasonably simple one that just prints the argv list for inspection.
Example 2-2. PP2E\System\testargv.py
import sys
print sys.argv
Running this script prints the command-line arguments list; note that the first item is always the name of the executed Python script file itself, no matter how the script was started (see Executable Scripts on Unix later in this chapter):
C:\...\PP2E\System>python testargv.py
['testargv.py']

C:\...\PP2E\System>python testargv.py spam eggs cheese
['testargv.py', 'spam', 'eggs', 'cheese']

C:\...\PP2E\System>python testargv.py -i data.txt -o results.txt
['testargv.py', '-i', 'data.txt', '-o', 'results.txt']
The last command here illustrates a common convention. Much like function arguments, command-line options are sometimes passed by position, and sometimes by name using a "-name value" word pair. For instance, the pair -i data.txt means the -i option's value is data.txt (e.g., an input filename). Any words can be listed, but programs usually impose some sort of structure on them.
Command-line arguments play the same role in programs that function arguments do in functions: they are simply a way to pass information to a program that can vary per program run. Because they don't have to be hardcoded, they allow scripts to be more generally useful. For example, a file-processing script can use a command-line argument as the name of the file it should process; see the
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Shell Environment Variables
Shell variables, sometimes known as environment variables, are made available to Python scripts as os.environ, a Python dictionary-like object with one entry per variable setting in the shell. Shell variables live outside the Python system; they are often set at your system prompt or within startup files, and typically serve as systemwide configuration inputs to programs.
In fact, by now you should be familiar with a prime example: the PYTHONPATH module search path setting is a shell variable used by Python to import modules. By setting it once in your system startup files, its value is available every time a Python program is run. Shell variables can also be set by programs to serve as inputs to other programs in an application; because their values are normally inherited by spawned programs, they can be used as a simple form of interprocess communication.
In Python, the surrounding shell environment becomes a simple preset object, not special syntax. Indexing os.environ by the desired shell variable's name string (e.g., os.environ['USER']) is the moral equivalent of adding a dollar sign before a variable name in most Unix shells (e.g., $USER), using surrounding percent signs on DOS (%USER%), and calling getenv("USER") in a C program. Let's start up an interactive session to experiment:
>>> import os
>>> os.environ.keys(  )
['WINBOOTDIR', 'PATH', 'USER', 'PP2HOME', 'CMDLINE', 'PYTHONPATH', 'BLASTER', 
'X', 'TEMP', 'COMSPEC', 'PROMPT', 'WINDIR', 'TMP']
>>> os.environ['TEMP']
'C:\\windows\\TEMP'
Here, the keys method returns a list of variables set, and indexing fetches the value of shell variable TEMP on Windows. This works the same on Linux, but other variables are generally preset when Python starts up. Since we know about PYTHONPATH, let's peek at its setting within Python to verify its content:
>>> os.environ['PYTHONPATH']
'C:\\PP2ndEd\\examples\\Part3;C:\\PP2ndEd\\examples\\Part2;C:\\PP2ndEd\\
examples\\Part2\\Gui;C:\\PP2ndEd\\examples'
>>>
>>> 
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Standard Streams
Module sys is also the place where the standard input, output, and error streams of your Python programs live:
>>> for f in (sys.stdin, sys.stdout, sys.stderr): print f
...
<open file '<stdin>', mode 'r' at 762210>
<open file '<stdout>', mode 'w' at 762270>
<open file '<stderr>', mode 'w' at 7622d0>
The standard streams are simply pre-opened Python file objects that are automatically connected to your program's standard streams when Python starts up. By default, they are all tied to the console window where Python (or a Python program) was started. Because the print statement and raw_input functions are really nothing more than user-friendly interfaces to the standard output and input streams, they are similar to using stdout and stdin in sys directly:
>>> print 'hello stdout world'
hello stdout world

>>> sys.stdout.write('hello stdout world' + '\n')
hello stdout world

>>> raw_input('hello stdin world>')
hello stdin world>spam
'spam'

>>> print 'hello stdin world>',; sys.stdin.readline(  )[:-1]
hello stdin world>eggs

'eggs'
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
File Tools
External files are at the heart of much of what we do with shell utilities. For instance, a testing system may read its inputs from one file, store program results in another file, and check expected results by loading yet another file. Even user interface and Internet-oriented programs may load binary images and audio clips from files on the underlying computer. It's a core programming concept.
In Python, the built-in open function is the primary tool scripts use to access the files on the underlying computer system. Since this function is an inherent part of the Python language, you may already be familiar with its basic workings. Technically, open gives direct access to the stdio filesystem calls in the system's C library -- it returns a new file object that is connected to the external file, and has methods that map more or less directly to file calls on your machine. The open function also provides a portable interface to the underlying filesystem -- it works the same on every platform Python runs on.
Other file-related interfaces in Python allow us to do things such as manipulate lower-level descriptor-based files (module os), store objects away in files by key (modules anydbm and shelve), and access SQL databases. Most of these are larger topics addressed in Chapter 16. In this section, we take a brief tutorial look at the built-in file object, and explore a handful of more advanced file-related topics. As usual, you should consult the library manual's file object entry for further details and methods we don't have space to cover here.
For most purposes, the open function is all you need to remember to process files in your scripts. The file object returned by open has methods for reading data (read, readline, readlines), writing data (write, writelines), freeing system resources (close), moving about in the file (seek), forcing data to be transferred out of buffers (flush), fetching the underlying file handle (fileno), and more. Since the built-in file object is so easy to use, though, let's jump right in to a few interactive examples.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Directory Tools
One of the more common tasks in the shell utilities domain is applying an operation to a set of files in a directory -- a "folder" in Windows-speak. By running a script on a batch of files, we can automate (that is, script) tasks we might have to otherwise run repeatedly by hand.
For instance, suppose you need to search all of your Python files in a development directory for a global variable name (perhaps you've forgotten where it is used). There are many platform-specific ways to do this (e.g., the grep command in Unix), but Python scripts that accomplish such tasks will work on every platform where Python works -- Windows, Unix, Linux, Macintosh, and just about any other in common use today. Simply copy your script to any machine you wish to use it on, and it will work, regardless of which other tools are available there.
The most common way to go about writing such tools is to first grab hold of a list of the names of the files you wish to process, and then step through that list with a Python for loop, processing each file in turn. The trick we need to learn here, then, is how to get such a directory list within our scripts. There are at least three options: running shell listing commands with os.popen, matching filename patterns with glob.glob, and getting directory listings with os.listdir. They vary in interface, result format, and portability.

Section 2.12.1.1: Running shell listing commands with os.popen

Quick: How did you go about getting directory file listings before you heard of Python? If you're new to shell tools programming, the answer may be: "Well, I started a Windows file explorer and clicked on stuff," but I'm thinking in terms of less GUI-oriented command-line mechanisms here (and answers submitted in Perl and Tcl only get partial credit).
On Unix, directory listings are usually obtained by typing ls in a shell; on Windows, they can be generated with a
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 3: Parallel System Tools
Most computers spend a lot of time doing nothing. If you start a system monitor tool and watch the CPU utilization, you'll see what I mean -- it's rare to see one hit 100%, even when you are running multiple programs. There are just too many delays built in to software: disk accesses, network traffic, database queries, waiting for users to click a button, and so on. In fact, the majority of a modern CPU's capacity is often spent in an idle state; faster chips help speed up performance demand peaks, but much of their power can go largely unused.
Early on in computing, programmers realized that they could tap into such unused processing power, by running more than one program at the same time. By dividing up the CPU's attention among a set of tasks, its capacity need not go to waste while any given task is waiting for an external event to occur. The technique is usually called parallel processing, because tasks seem to be performed at once, overlapping and parallel in time. It's at the heart of modern operating systems, and gave rise to the notion of multiple active-window computer interfaces we've all grown to take for granted. Even within a single program, dividing processing up into tasks that run in parallel can make the overall system faster, at least as measured by the clock on your wall.
Just as importantly, modern software systems are expected to be responsive to users, regardless of the amount of work they must perform behind the scenes. It's usually unacceptable for a program to stall while busy carrying out a request. Consider an email-browser user interface, for example; when asked to fetch email from a server, the program must download text from a server over a network. If you have enough email and a slow enough Internet link, that step alone can take minutes to finish. But while the download task proceeds, the program as a whole shouldn't stall -- it still must respond to screen redraws, mouse clicks, etc.
Parallel processing comes to the rescue here too. By performing such long-running tasks in parallel with the rest of the program, the system at large can remain responsive no matter how busy some of its parts may be.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
"Telling the Monkeys What to Do"
Most computers spend a lot of time doing nothing. If you start a system monitor tool and watch the CPU utilization, you'll see what I mean -- it's rare to see one hit 100%, even when you are running multiple programs. There are just too many delays built in to software: disk accesses, network traffic, database queries, waiting for users to click a button, and so on. In fact, the majority of a modern CPU's capacity is often spent in an idle state; faster chips help speed up performance demand peaks, but much of their power can go largely unused.
Early on in computing, programmers realized that they could tap into such unused processing power, by running more than one program at the same time. By dividing up the CPU's attention among a set of tasks, its capacity need not go to waste while any given task is waiting for an external event to occur. The technique is usually called parallel processing, because tasks seem to be performed at once, overlapping and parallel in time. It's at the heart of modern operating systems, and gave rise to the notion of multiple active-window computer interfaces we've all grown to take for granted. Even within a single program, dividing processing up into tasks that run in parallel can make the overall system faster, at least as measured by the clock on your wall.
Just as importantly, modern software systems are expected to be responsive to users, regardless of the amount of work they must perform behind the scenes. It's usually unacceptable for a program to stall while busy carrying out a request. Consider an email-browser user interface, for example; when asked to fetch email from a server, the program must download text from a server over a network. If you have enough email and a slow enough Internet link, that step alone can take minutes to finish. But while the download task proceeds, the program as a whole shouldn't stall -- it still must respond to screen redraws, mouse clicks, etc.
Parallel processing comes to the rescue here too. By performing such long-running tasks in parallel with the rest of the program, the system at large can remain responsive no matter how busy some of its parts may be.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Forking Processes
Forked processes are the traditional way to structure parallel tasks, and are a fundamental part of the Unix tool set. Forking is based on the notion of copying programs: when a program calls the fork routine, the operating system makes a new copy of that program in memory, and starts running that copy in parallel with the original. Some systems don't really copy the original program (it's an expensive operation), but the new copy works as if it was a literal copy.
After a fork operation, the original copy of the program is called the parent process, and the copy created by os.fork is called the child process. In general, parents can make any number of children, and children can create child processes of their own -- all forked processes run independently and in parallel under the operating system's control. It is probably simpler in practice than theory, though; the Python script in Example 3-1 forks new child processes until you type a "q" at the console.
Example 3-1. PP2E\System\Processes\fork1.py
# forks child processes until you type 'q'

import os

def child(  ):
    print 'Hello from child',  os.getpid(  )
    os._exit(0)  # else goes back to parent loop

def parent(  ):
    while 1:
        newpid = os.fork(  )
        if newpid == 0:
            child(  )
        else:
            print 'Hello from parent', os.getpid(  ), newpid
        if raw_input(  ) == 'q': break

parent(  )
Python's process forking tools, available in the os module, are simply thin wrappers over standard forking calls in the C library. To start a new, parallel process, call the os.fork built-in function. Because this function generates a copy of the calling program, it returns a different value in each copy: zero in the child process, and the process ID of the new child in the parent. Programs generally test this result to begin different processing in the child only; this script, for instance, runs the child function in child processes only.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Threads
Threads are another way to start activities running at the same time. They sometimes are called "lightweight processes," and they are run in parallel like forked processes, but all run within the same single process. For applications that can benefit from parallel processing, threads offer big advantages for programmers:
Performance
Because all threads run within the same process, they don't generally incur a big startup cost to copy the process itself. The costs of both copying forked processes and running threads can vary per platform, but threads are usually considered less expensive in terms of performance overhead.
Simplicity
Threads can be noticeably simpler to program too, especially when some of the more complex aspects of processes enter the picture (e.g., process exits, communication schemes, and "zombie" processes covered in Chapter 10).
Shared global memory
Also because threads run in a single process, every thread shares the same global memory space of the process. This provides a natural and easy way for threads to communicate -- by fetching and setting data in global memory. To the Python programmer, this means that global (module-level) variables and interpreter components are shared among all threads in a program: if one thread assigns a global variable, its new value will be seen by other threads. Some care must be taken to control access to shared global objects, but they are still generally simpler to use than the sorts of process communication tools necessary for forked processes we'll meet later in this chapter (e.g., pipes, streams, signals, etc.).
Portability
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Program Exits
As we've seen, unlike C there is no "main" function in Python -- when we run a program, we simply execute all the code in the top-level file, from top to bottom (i.e., in the filename we listed in the command line, clicked in a file explorer, and so on). Scripts normally exit when Python falls off the end of the file, but we may also call for program exit explicitly with the built-in sys.exit function:
>>> sys.exit(  )           # else exits on end of script
Interestingly, this call really just raises the built-in SystemExit exception. Because of this, we can catch it as usual to intercept early exits and perform cleanup activities; if uncaught, the interpreter exits as usual. For instance:
C:\...\PP2E\System>python
>>> import sys
>>> try:
...     sys.exit(  )              # see also: os._exit, Tk(  ).quit(  )
... except SystemExit:
...     print 'ignoring exit'
...
ignoring exit
>>> 
In fact, explicitly raising the built-in SystemExit exception with a Python raise statement is equivalent to calling sys.exit. More realistically, a try block would catch the exit exception raised elsewhere in a program; the script in Example 3-11 exits from within a processing function.
Example 3-11. PP2E\System\Exits\testexit_sys.py
def later(  ):
    import sys
    print 'Bye sys world'
    sys.exit(42)
    print 'Never reached'

if __name__ == '__main__': later(  )
Running this program as a script causes it to exit before the interpreter falls off the end of the file. But because sys.exit raises a Python exception, importers of its function can trap and override its exit exception, or specify a finally cleanup block to be run during program exit processing:
C:\...\PP2E\System\Exits>python testexit_sys.py
Bye sys world

C:\...\PP2E\System\Exits>python
>>> from testexit_sys import later
>>> try:                
...     later(  )
... except SystemExit:
...     print 'Ignored...'
...
Bye sys world
Ignored...
>>> 
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Interprocess Communication
As we saw earlier, when scripts spawn threads -- tasks that run in parallel within the program -- they can naturally communicate by changing and inspecting shared global memory. As we also saw, some care must be taken to use locks to synchronize access to shared objects that can't be updated concurrently, but it's a fairly straightforward communication model.
Things aren't quite as simple when scripts start processes and programs. If we limit the kinds of communications that can happen between programs, there are many options available, most of which we've already seen in this and the prior chapters. For example, the following can all be interpreted as cross-program communication devices:
  • Command-line arguments
  • Standard stream redirections
  • Pipes generated by os.popen calls
  • Program exit status codes
  • Shell environment variables
  • Even simple files
For instance, sending command-line options and writing to input streams lets us pass in program execution parameters; reading program output streams and exit codes gives us a way to grab a result. Because shell variable settings are inherited by spawned programs, they provide another way to pass context in. Pipes made by os.popen and simple files allow even more dynamic communication -- data can be sent between programs at arbitrary times, not only at program start and exit.
Beyond this set, there are other tools in the Python library for doing IPC -- Inter-Process Communication. Some vary in portability, and all vary in complexity. For instance, in Chapter 10 of this text we will meet the Python socket module, which lets us transfer data between programs running on the same computer, as well as programs located on remote networked machines.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Pipes
Pipes, another cross-program communication device, are made available in Python with the built-in os.pipe call. Pipes are unidirectional channels that work something like a shared memory buffer, but with an interface resembling a simple file on each of two ends. In typical use, one program writes data on one end of the pipe, and another reads that data on the other end. Each program only sees its end of the pipes, and processes it using normal Python file calls.
Pipes are much more within the operating system, though. For instance, calls to read a pipe will normally block the caller until data becomes available (i.e., is sent by the program on the other end), rather than returning an end-of-file indicator. Because of such properties, pipes are also a way to synchronize the execution of independent programs.
Pipes come in two flavors -- anonymous and named. Named pipes (sometimes called "fifos") are represented by a file on your computer. Anonymous pipes only exist within processes, though, and are typically used in conjunction with process forks as a way to link parent and spawned child processes within an application -- parent and child converse over shared pipe file descriptors. Because named pipes are really external files, the communicating processes need not be related at all (in fact, they can be independently started programs).
Since they are more traditional, let's start with a look at anonymous pipes. To illustrate, the script in Example 3-15 uses the os.fork call to make a copy of the calling process as usual (we met forks earlier in this chapter). After forking, the original parent process and its child copy speak through the two ends of a pipe created with os.pipe prior to the fork. The os.pipe call returns a tuple of two file descriptors -- the low-level file identifiers we met earlier -- representing the input and output sides of the pipe. Because forked child processes get copies of their parents' file descriptors, writing to the pipe's output descriptor in the child sends data back to the parent on the pipe created before the child was spawned.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Signals
For lack of a better analogy, signals are a way to poke a stick at a process. Programs generate signals to trigger a handler for that signal in another process. The operating system pokes too -- some signals are generated on unusual system events and may kill the program if not handled. If this sounds a little like raising exceptions in Python it should; signals are software-generated events, and the cross-process analog of exceptions. Unlike exceptions, though, signals are identified by number, are not stacked, and are really an asynchronous event mechanism controlled by the operating system, outside the scope of the Python interpreter.
In order to make signals available to scripts, Python provides a signal module that allows Python programs to register Python functions as handlers for signal events. This module is available on both Unix-like platforms and Windows (though the Windows version defines fewer kinds of signals to be caught). To illustrate the basic signal interface, the script in Example 3-20 installs a Python handler function for the signal number passed in as a command-line argument.
Example 3-20. PP2E\System\Processes\signal1.py
##########################################################
# catch signals in Python;  pass signal number N as a
# command-line arg, use a "kill -N pid" shell command
# to send this process a signal;  most signal handlers 
# restored by Python after caught (see network scripting
# chapter for SIGCHLD details); signal module avaiable
# on Windows, but