Testing, Debugging, and Optimizing: Chapter 18 - Python in a Nutshell
by Alex MartelliThis excerpt is from Python in a Nutshell. Python in a Nutshell provides a solid, no-nonsense quick reference to information that programmers rely on the most. This book will immediately earn its place in any Python programmer's library.
You're not finished with a programming task when you're done writing the code; you're finished when the code runs correctly and with acceptable performance. Testing (covered in "Testing" on page 452) means verifying that code runs correctly by exercising the code under known conditions and checking that results are as expected. Debugging (covered in "Debugging" on page 461) means discovering causes of incorrect behavior and repairing them (repair is often easy, once you figure out the causes).
Optimizing (covered in "Optimization" on page 474) is often used as an umbrella term for activities meant to ensure acceptable performance. Optimizing breaks down into benchmarking (measuring performance for given tasks to check that it's within acceptable bounds), profiling (instrumenting the program to identify performance bottlenecks), and optimizing proper (removing bottlenecks to make overall program performance acceptable). Clearly, you can't remove performance bottlenecks until you've found out where they are (using profiling), which in turn requires knowing that there are performance problems (using benchmarking).
This chapter covers the three subjects in the natural order in which they occur in development: testing first and foremost, debugging next, and optimizing last. However, most programmers' enthusiasm focuses on optimization: testing and debugging are often (wrongly, in my opinion) perceived as being chores, while optimization is perceived as being fun. As a consequence, if you were to read only one section of the chapter, I would suggest that section be "Developing a Fast-Enough Python Application" on page 474, which summarizes the Pythonic approach to optimization—close to Jackson's classic "Rules of Optimization: Rule 1: Don't do it. Rule 2 (for experts only): Don't do it yet."
All of these tasks are large and important, and each could fill at least a book by itself. This chapter does not even come close to exploring every related technique and implication; it focuses on Python-specific techniques, approaches, and tools.
In this chapter, I distinguish between two rather different kinds of testing: unit testing and system testing. Testing is a rich and important field, and many more distinctions could be drawn, but I focus on the issues of most immediate importance to software developers. Many developers are reluctant to spend time on testing, seeing it as time subtracted from "real" development, but this attitude is short-sighted: defects are easier to fix the earlier you find out about them, so each hour spent developing tests can amply pay back for itself by finding defects ASAP, thus saving many hours of debugging that would otherwise have been needed in later phases of the software development cycle.
Unit testing means writing and running tests to exercise a single module or an even smaller unit, such as a class or function. System testing (also known as functional or integration testing) involves running an entire program with known inputs. Some classic books on testing also draw the distinction between white-box testing, done with knowledge of a program's internals, and black-box testing, done without such knowledge. This classic viewpoint parallels, but does not exactly duplicate, the modern one of unit versus system testing.
Unit and system testing serve different goals. Unit testing proceeds apace with development; you can and should test each unit as you're developing it. One modern approach is known as test-driven development (TDD): for each feature that your program must have, you first write unit tests and only then do you proceed to write code that implements the feature. TDD may seem upside-down, but it has several advantages. For example, it ensures that you won't omit unit tests for some feature. Further, developing test-first is helpful because it urges you to focus first on the tasks a certain function, class, or method should accomplish, and to deal only afterward with how to implement that function, class, or method. A recent important innovation along the lines of TDD is behavior-driven development; see http://behavior-driven.org/ for all details about this new and promising development methodology.
In order to test a unit, which may depend on other units not yet fully developed, you often have to write stubs, which are fake implementations of various units' interfaces that give known and correct responses in cases needed to test other units. The Mock module (http://python-mock.sourceforge.net/) can be very helpful in the implementation of many such stubs (a good tutorial, with links to other useful documents about Mock Objects, is at http://xper.org/wiki/seminar/PythonMockObjectTutorial).
System testing comes afterward, since it requires the system to exist, with some subset of system functionality believed to be in working condition. System testing provides a sanity check: given that each module in the program works properly (passes unit tests), does the whole program work? If each unit is okay but the system as a whole is not, there is a problem in the integration between units, meaning the way the units cooperate. For this reason, system testing is also known as integration testing.
System testing is similar to running the system in production use, except that you fix the inputs in advance so that any problems you may find are easy to reproduce. The cost of failures in system testing is lower than in production use, since outputs from system testing are not used to make decisions, serve customers, control external systems, and so on. Rather, outputs from system testing are systematically compared with the outputs that the system should produce given the known inputs. The purpose is to find, in a cheap and reproducible way, any discrepancy between what the program should do and what the program actually does.
Failures discovered by system testing, just like system failures in production use, may reveal some defects in unit tests, as well as defects in the code. Unit testing may have been insufficient: a module's unit tests may have failed to exercise all needed functionality of that module. In this case, the unit tests clearly need to be beefed up. Do that before you change your code to fix the problem, then run the newly strengthened unit tests to confirm that they do now show the existence of the problem. Then fix the problem and run unit tests again to confirm they show no problem anymore. Finally, rerun the system tests to confirm that the problem has indeed gone away.
Most often, failures in system testing reveal communication problems within the development team: a module correctly implements a certain functionality, but another module expects different functionality. This kind of problem (an integration problem in the strict sense) is hard to pinpoint in unit testing. In good development practice, unit tests must run often, so it is crucial that they run fast. It's therefore essential that each unit can assume other units are working correctly and as expected.
Unit tests that are run in reasonably late stages of development can reveal integration problems if the system architecture is hierarchical, a common and reasonable organization. In such an architecture, lower-level modules depend on no others (except perhaps library modules, which you can assume to be correct), and thus the unit tests of such lower-level modules, if complete, suffice to assure correctness. Higher-level modules depend on lower-level ones, and therefore also depend on correct understanding about what functionality each module expects and supplies. Running complete unit tests on higher-level modules, using the true lower-level modules rather than stubs, exercises the interface between modules, as well as the higher-level modules' own code.
Unit tests for higher-level modules are thus run in two ways. You run the tests with stubs for the lower levels during the early stages of development when the lower-level modules are not yet ready or, later, when you need to check only the correctness of the higher levels. During later stages of development, you also regularly run the higher-level modules' unit tests using the true lower-level modules. In this way, you check the correctness of the whole subsystem, from the higher levels downward. Nevertheless, even in this favorable case, you still need to write and run system tests to make sure the whole system's functionality is exercised and checked, and that no interface between modules is neglected.
System testing is similar to running the program in normal ways. You need special support only to ensure that known inputs are supplied and that outputs are captured for comparison with expected outputs. This is easy for programs that perform I/O on files, but terribly hard for programs whose I/O relies on a GUI, network, or other communication with independent external entities. To simulate such external entities and make them predictable and entirely observable, you generally need platform-dependent infrastructure. Another useful piece of supporting infrastructure for system testing is a testing framework that automates the running of system tests, including logging of successes and failures. Such a framework can also help testers prepare sets of known inputs and corresponding expected outputs.
Both free and commercial programs for these purposes exist, but they are not dependent on which programming languages are used in the system under test. System testing is a close kin to what was classically known as black-box testing, or testing that is independent from the implementation of the system under test (and therefore, in particular, independent from the programming languages used for implementation). Instead, testing frameworks usually depend on the operating system platform on which they run, since the tasks they perform are platform-dependent: running programs with given inputs, capturing their outputs, and particularly simulating and capturing GUI, network, and other interprocess communication I/O. Since frameworks for system testing depend on the platform and not on programming languages, I do not cover them further in this book.
The doctest
module has the primary purpose of letting you create good usage
examples in your code's docstrings by checking that the examples do
in fact produce the results that your docstrings show for them.
doctest recognizes such examples by
looking within the docstring for the interactive Python prompt
'>>> ', followed on the same
line by a Python statement, and the statement's expected output on
the next line.
As you're developing a module, keep the docstrings up to date
and gradually enrich them with examples. Each time part of the
module (e.g., a function) is ready, or even partially ready, make
it a habit to add examples to the docstrings. Import the module
into an interactive session, and interactively use the parts you
just developed in order to provide examples with a mix of typical
cases, limit cases, and failing cases. For this specific purpose
only, use from module import
* so that your examples don't prefix module . to each name the module supplies. Copy and paste
the text of the interactive session into the docstring in an
editor, adjust any mistakes, and you're almost done.
Your documentation is now enriched with examples, and readers
will have an easier time following it, assuming you choose a good
mix of examples and season it wisely with nonexample text. Make
sure you have docstrings, with examples, for your module as a
whole, and for each function, class, and method that the module
exports. You may choose to skip functions, classes, and methods
whose names start with _, since, as
their names indicate, they're meant to be private implementation
details; doctest by default ignores
them, and so should most readers of your module.
Examples that don't match the way your code works are worse than
useless. Documentation and comments are useful only if they match
reality. Docstrings and comments often get out of date as code
changes, and then they become misinformation, hampering rather than
helping any reader of the source. Better to have no comments and
docstrings at all than to have ones that lie. doctest can help you through the examples in your
docstrings. A failing doctest run
should prompt you to review the docstring that contains the failing
examples, reminding you to keep the docstring's text updated,
too.
At the end of your module's source, insert the following small snippet:
if _ _name_ _ == '_ _main_ _':
import doctest
doctest.testmod( )
This code calls function testmod of
module doctest when you run your
module as the main program. testmod
examines all relevant docstrings (the module docstring, and docstrings of all public functions,
public classes, and public methods of public classes). In each
docstring, testmod finds all examples
(by looking for occurrences of the interpreter prompt '>>> ', possibly preceded by whitespace)
and runs each example. testmod checks
that each example's results are equal to the output given in the
docstring right after the example. In the case of exceptions,
testmod ignores the traceback, and
just checks that the expected and observed error messages are
equal.
When everything goes right, testmod
terminates silently. Otherwise, it outputs detailed messages about
examples that failed, showing expected and actual output. Example 18-1 shows a
typical example of doctest at work on
a module mod.py.
Example 18-1. Using doctest
"""
This module supplies a single function reverseWords that reverses a string by words.
>>> reverseWords('four score and seven years')
'years seven and score four'
>>> reverseWords('justoneword')
'justoneword'
>>> reverseWords('')
''
You must call reverseWords with one argument, and it must be a string:
>>> reverseWords( )
Traceback (most recent call last):
...
TypeError: reverseWords( ) takes exactly 1 argument (0 given)
>>> reverseWords('one', 'another')
Traceback (most recent call last):
...
TypeError: reverseWords( ) takes exactly 1 argument (2 given)
>>> reverseWords(1)
Traceback (most recent call last):
...
AttributeError: 'int' object has no attribute 'split'
>>> reverseWords(u'however, unicode is all right too')
u'too right all is unicode however,'
As a side effect, reverseWords eliminates any redundant spacing:
>>> reverseWords('with redundant spacing')
'spacing redundant with'
"""
def reverseWords(astring):
words = astring.split( )
words.reverse( )
return ' '.join(words)
if _ _name_ _=='_ _main_ _':
import doctest, sys
doctest.testmod(sys.modules[_ _name_ _])
In this module's docstring, I have snipped the tracebacks from
the docstring and replaced them with an ellipsis: this is good
common practice, since doctest ignores
tracebacks and they add nothing to the explanatory value of each
failing case. Apart from this snipping, the docstring is the copy
and paste of an interactive session, with the addition of some
explanatory text and empty lines for readability. Save this source
as mod.py, and then run it
with python mod.py. It
produces no output, meaning that all examples work right. Try
python mod.py -v to get an
account of all tests tried and a verbose summary at the end.
Finally, alter the example results in the module docstring, making
them incorrect, to see the messages doctest provides for errant examples.
While doctest is not primarily
meant for general-purpose unit testing, it can nevertheless be a
very convenient tool for the purpose. The recommended way to do
unit testing in Python is with module unittest, covered in "The unittest Module" on page 457.
However, unit testing with doctest can
be easier and faster to set up, since it requires little more than
copy and paste from an interactive session. If you need to maintain
a module that lacks unit tests, retrofitting such tests into the
module with doctest is a reasonable
compromise. It's certainly better to have doctest-based unit tests than not to have any unit
tests at all, as might otherwise happen should you decide that
setting up tests "properly" with unittest would take you too long.
If you do decide to use doctest for
unit testing, don't cram extra tests into your module's docstrings.
This would damage the docstrings by making them too long and hard
to read. Keep in the docstrings the right amount and kind of
examples, strictly for explanatory purposes, just as if unit
testing was not in the picture. Instead, put the extra tests into a
global variable of your module, a dictionary named _ _test_ _. The keys in _
_test_ _ are strings used as arbitrary test names, and the
corresponding values are strings that doctest picks up and uses in just the same way as
it uses docstrings. The values in _ _test_
_ may also be function and class objects, in which case
doctest examines their docstrings for
tests to run. This latter feature is a convenient way to run
doctest on objects with private names,
which doctest skips by default.
In Python 2.4, the doctest module
also supplies two functions that return instances of the
unittest.TestSuite class based on
doctests so that you can integrate such tests into testing
frameworks based on unittest. The
documentation for this advanced functionality is at http://docs.python.org/lib/doctest-unittest-api.html.
The unittest
module is the Python version of a unit-testing framework originally
developed by Kent Beck for Smalltalk. Similar, widespread versions
of the framework also exist for many other programming languages
(e.g., the JUnit package for Java) and
are often collectively referred to as xUnit.
To use unittest, you don't put your
testing code in the same source file as the tested module, but
instead write a separate test module for each module you're
testing. A popular convention is to name the test module the same
as the module being tested, with a prefix such as 'test_', and put it in a subdirectory named
test of the directory where
you keep your sources. For example, the test module for
mod.py can be test/test_mod.py. You need a simple and
consistent naming convention to make it easy for you to write and
maintain auxiliary scripts that find and run all unit tests for a
package.
Separation between a module's source code and its unit-testing code lets you refactor the module more easily, including possibly recoding its functionality in C, without perturbing the unit-testing code. Knowing that test_mod.py stays intact, whatever changes you make to mod.py, enhances your confidence that passing the tests in test_mod.py indicates that mod.py still works correctly after the changes.
A unit-testing module defines one or more subclasses of
unittest's TestCase class. Each subclass specifies one or
more test cases by defining test-case
methods, which are methods that are callable without
arguments and whose names start with test. The subclass may also override method
setUp, which the framework calls to
prepare a new instance just before calling each test-case method,
and tearDown, which the framework
calls to clean things up just after calling each test-case method.
Each test-case method calls methods of class TestCase whose names start with assert in order to express the conditions that the
test must meet. unittest runs the
test-case methods within a TestCase
subclass in arbitrary order, each on a new instance of the
subclass, running setUp just before
each test case and tearDown just after
each test case.
unittest provides other facilities,
such as grouping test cases into test suites, and other more
advanced functionality. You do not need such extras unless you're
defining a custom unit-testing framework or, at the very least,
structuring complicated testing procedures for equally complicated
packages. In almost all cases, the concepts and details covered in
this section are sufficient to perform effective and systematic
unit testing. Example 18-2 shows how
to use unittest to provide unit tests
for the module mod.py of
Example 18-1. For
illustration purposes, this example uses unittest to perform exactly the same tests that
Example 18-1 uses as
examples in docstrings using doctest.
Example 18-2. Using unittest
""" This module tests function reverseWords provided by module mod.py. """
import unittest import mod
class ModTest(unittest.TestCase):
def testNormalCase(self):
self.assertEqual(mod.reverseWords('four score and seven years'),
'years seven and score four')
def testSingleWord(self):
self.assertEqual(mod.reverseWords('justoneword'), 'justoneword')
def testEmpty(self):
self.assertEqual(mod.reverseWords(''), '')
def testRedundantSpacing(self):
self.assertEqual(mod.reverseWords('with redundant spacing'),
'spacing redundant with')
def testUnicode(self):
self.assertEqual(mod.reverseWords(u'unicode is all right too'),
u'too right all is unicode')
def testExactlyOneArgument(self):
self.assertRaises(TypeError, mod.reverseWords)
self.assertRaises(TypeError, mod.reverseWords, 'one', 'another')
def testMustBeString(self):
self.assertRaises((AttributeError,TypeError), mod.reverseWords, 1)
if _ _name_ _= ='_ _main_ _':
unittest.main( )
Running this script with python
test_mod.py is a bit more verbose than using
python mod.py to run
doctest, as in Example 18-1.
test_mod.py outputs a
. (dot) for each test-case method it
runs, then a separator line of dashes, and finally a summary line,
such as "Ran 7 tests in 0.110s," and a final line of "OK" if every
test passed.
Each test-case method makes one or more calls to methods whose
names start with assert (or their
synonyms whose names start with fail).
Here, method testExactlyOneArgument is
the only one with two such calls. In more complicated cases,
multiple calls to assert methods from a single test-case method are
quite common.
Even in a case as simple as this, one minor aspect shows that,
for unit testing, unittest is more
powerful and flexible than doctest. In
method testMustBeString, we pass as
the first argument to assertRaises a
pair of exception classes, meaning we accept either kind of
exception. test_mod.py
therefore accepts as valid multiple implementations of mod.py. It accepts the implementation in
Example 18-1, which
tries calling method split on its
argument, and therefore raises AttributeError when called with an argument that
is not a string. However, it also accepts a different hypothetical
implementation, one that raises TypeError instead when called with an argument of
the wrong type. It would be possible to code this testing
functionality with doctest, but it
would be awkward and nonobvious, while unittest makes it simple and natural.
This kind of flexibility is crucial for real-life unit tests,
which to some extent are executable specifications for their
modules. You could, pessimistically, view the need for flexibility
as indicating that the interface of the code you're testing is not
well defined. However, it's best to view the interface as being
defined with a useful amount of flexibility for the implementer:
under circumstance X
(argument of invalid type passed to function reverseWords, in this example), either of two
things (raising AttributeError or
TypeError) is allowed to happen.
Thus, implementations with either of the two behaviors are correct, and the implementer can choose between them on the basis of such considerations as performance and clarity. By viewing unit tests as executable specifications for their modules (the modern view, and the basis of test-first development) rather than as white-box tests strictly constrained to a specific implementation (as in some traditional taxonomies of testing), the tests become a more vital component of the software development process.
With unittest,
you write test cases by subclassing class TestCase and adding methods, callable without
arguments, whose names start with test. Such test-case methods, in turn, call
methods that your subclass inherits from TestCase, whose names start with assert (or their synonyms, whose names start with
fail), to indicate conditions that
must hold for the test to succeed.
Class TestCase also defines two
methods that your subclass can optionally override in order to
group actions to perform right before and right after each
test-case method runs. This doesn't exhaust TestCase's functionality, but you won't need the
rest unless you're developing testing frameworks or performing some
similarly advanced task. The frequently called methods in a
TestCase instance t are the following.
Unit tests must be fast, since they are run
frequently during development. Therefore, it's best to unit-test
each aspect of your modules' functionality on small amounts of data
when possible. This makes each unit test faster and lets you
conveniently embed all needed data in the test's source code. When
you test a function that reads from or writes to a file object, in
particular, you normally use an instance of class cStringIO (covered in "The StringIO and cStringIO
Modules" on page 229) to simulate a file object while holding
the data in memory; this approach is faster than writing to disk
and saves you the bother of having to remove disk files after your
tests have run.
However, in some rare cases, it may be impossible to fully
exercise a module's functionality without supplying and/or
comparing data in quantities larger than can be reasonably embedded
in a test's source code. In such cases, your unit test will have to
rely on auxiliary, external datafiles to hold the data it needs to
supply to the module it tests and/or the data it needs to compare
to the tested module's output. Even then, you're generally better
off reading the data into instances of cStringIO rather than directing the tested module
to perform actual disk I/O. Even more important, I strongly suggest
that you generally use stubs to test modules meant to interact with
other external entities, such as a database, a GUI, or some other
program over a network. It's easier for you to control all aspects
of the test when using stubs rather than real external entities.
Also, to reiterate, the speed at which you can run unit tests is
important, and it's invariably faster to perform simulated
operations in stubs, rather than real operations.
Since Python's development cycle is so fast, the most effective
way to debug is often to edit your code so that it outputs relevant
information at key points. Python has many ways to let your code
explore its own state in order to extract information that may be
relevant for debugging. The inspect
and traceback modules specifically
support such exploration, which is also known as reflection or
introspection.
Once you have obtained debugging-relevant information, the
print statement is often the simplest
way to display it. You can also log debugging information to files.
Logging is particularly useful for programs that run unattended for
a long time, such as server programs. Displaying debugging
information is just like displaying other kinds of information, as
covered in Chapters 10
and 17. Logging such
information is mostly like writing to files (as covered in Chapter 10) or
otherwise persisting information, as covered in Chapter
11; however, to help with the specific task of logging,
Python's standard library also supplies a logging module, covered in "The logging module" on page 136. As
covered in excepthook on page 168,
rebinding attribute excepthook of
module sys lets your program log
detailed error information just before your program is terminated
by a propagating exception.
Python also offers hooks that enable interactive debugging.
Module pdb supplies a simple text-mode
interactive debugger. Other interactive debuggers for Python are
part of integrated development environments (IDEs), such as IDLE
and various commercial offerings. However, I do not cover IDEs in
this book.
Before you embark on possibly lengthy debugging explorations, make sure you have thoroughly checked your Python sources with the tools mentioned in Chapter 3. Such tools can catch only a subset of the bugs in your code, but they're much faster than interactive debugging, and so their use amply repays itself.
Moreover, again before starting a debugging session, make sure that all the code involved is well covered by unit tests, as seen at "Unit Testing and System Testing" on page 452. Once you have found a bug, before you fix it, add to your suite of unit tests (or, if needed, to the suite of system tests) a test or two that would have found the bug if they had been present from the start, and run the tests again to confirm that they now do reveal and isolate the bug; only once that is done should you proceed to fix the bug. By regularly following this procedure, you will soon have a much better suite of tests, learn to write better tests, and gain much sounder assurance about the overall correctness of your code.
Remember, even with all the facilities offered by Python, its standard library, and whatever IDEs you fancy, debugging is still hard. Take this fact into account even before you start designing and coding: write and run plenty of unit tests and keep your design and code simple, so as to reduce to the absolute minimum the amount of debugging you will need! The classic advice in this regard was phrased by Brian Kernighan as follows: "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
The inspect module supplies
functions to get information from all kinds of objects, including
the Python call stack (which records all function calls currently
executing) and source files. The most frequently used functions of
module inspect are as follows.
Suppose that somewhere in your program you execute a statement such as:
x.f( )
and unexpectedly receive an AttributeError informing you that object
x has no attribute named
f. This means that object
x is not as you expected,
so you want to determine more about x as a preliminary to ascertaining
why x is that way and
what you should do about it. Change the statement to:
try: x.f( )
except AttributeError:
import sys, inspect
print>>sys.stderr, 'x is type %s, (%r)' % (type(x), x)
print>>sys.stderr, "x's methods are:",
for n, v in inspect.getmembers(x, callable):
print>>sys.stderr, n,
print>>sys.stderr
raise
This example uses sys.stderr
(covered in stdin, stdout, stderr on
page 171), since it displays information related to an error, not
program results. Function getmembers
of module inspect obtains the name of
all the methods available on x in order to display them. If you
need this kind of diagnostic functionality often, package it up
into a separate function, such as:
import sys, inspect def show_obj_methods(obj, name, show=sys.stderr.write):
show('%s is type %s(%r)\n'%(name,obj,type(obj)))
show("%s's methods are: "%name)
for n, v in inspect.getmembers(obj, callable):
show('%s '%n)
show('\n')
And then the example becomes just:
try: x.f( )
except AttributeError:
show_obj_methods(x, 'x')
raise
Good program structure and organization are just as necessary in code intended for diagnostic and debugging purposes as they are in code that implements your program's functionality. See also "The _ _debug_ _ built-in variable" on page 138 for a good technique to use when defining diagnostic and debugging functions.
The traceback
module lets you extract, format, and output information about
tracebacks as normally produced by uncaught exceptions. By default,
module traceback reproduces the
formatting Python uses for tracebacks. However, module traceback also lets you exert fine-grained
control. The module supplies many functions, but in typical use you
need only one of them.
The pdb module
exploits the Python interpreter's debugging and tracing hooks to
implement a simple command-line-oriented interactive debugger.
pdb lets you set breakpoints,
single-step on sources, examine stack frames, and so on.
To run some code under pdb's
control, import pdb and then call
pdb.run, passing as the single
argument a string of code to execute. To use pdb for post-mortem debugging (meaning debugging
of code that just terminated by propagating an exception at an
interactive prompt), call pdb.pm( )
without arguments. When pdb starts, it
first reads text files named .pdbrc in your home directory and in the
current directory. Such files can contain any pdb commands, but most often they use the
alias command in order to define
useful synonyms and abbreviations for other commands.
When pdb is in control, it prompts
you with the string '(Pdb) ', and you
can enter pdb commands. Command
help (which you can also enter in the
abbreviated form h) lists all
available commands. Call help with an
argument (separated by a space) to get help about any specific
command. You can abbreviate most commands to the first one or two
letters, but you must always enter commands in lowercase:
pdb, like Python itself, is
case-sensitive. Entering an empty line repeats the previous
command. The most frequently used pdb
commands are the following.
IDLE, the Interactive DeveLopment Environment
that comes with Python, offers debugging functionality similar to
that of pdb, although not quite as
powerful. Thanks to IDLE's GUI, the functionality is easier to
access. For example, instead of having to ask for source lists and
stack lists explicitly with such pdb
commands as list and where, you just activate one or more of four
checkboxes in the Debug Control window to see source, stack,
locals, and globals always displayed in the same window at each
step.
To start IDLE's interactive debugger, use Debug → Debugger
in IDLE's *Python Shell* window. IDLE opens the Debug Control
window, outputs [DEBUG ON] in the
shell window, and gives you another >>> prompt in the shell window. Keep
using the shell window as you normally would; any command you give
at the shell window's prompt now runs under the debugger. To
deactivate the debugger, use Debug → Debugger again; IDLE then
toggles the debug state, closes the Debug Control window, and
outputs [DEBUG OFF] in the shell
window. To control the debugger when the debugger is active, use
the GUI controls in the Debug Control window. You can toggle the
debugger away only when it is not busy actively tracking code;
otherwise, IDLE disables the Quit button in the Debug Control
window.
Warnings are messages
about errors or anomalies that may not be serious enough to be
worth disrupting the program's control flow (as would happen by
raising a normal exception). The warnings module affords fine-grained control over
which warnings are output and what happens to them. You can
conditionally output a warning by calling function warn in module warnings. Other functions in the module let you
control how warnings are formatted, set their destinations, and
conditionally suppress some warnings (or transform some warnings
into exceptions).
Module warnings supplies several
exception classes that represent warnings. Class Warning subclasses Exception and is the base class for all warnings.
You may define your own warning classes; they must subclass
Warning, either directly or via one of
its other existing subclasses, which are:
DeprecationWarning-
Uses deprecated features supplied only for backward compatibility
RuntimeWarning-
Uses features whose semantics are error-prone
SyntaxWarning-
Uses features whose syntax is error-prone
UserWarning-
Other user-defined warnings that don't fit any of the above cases
Python supplies no concrete warning objects. A warning is
composed of a message (a
text string), a category
(a subclass of Warning), and two
pieces of information that identify where the warning was raised
from: module (name of the
module that raised the warning) and lineno (line number of the source
code line that raised the warning). Conceptually, you may think of
these as attributes of a warning object w, and I use attribute notation
later for clarity, but no specific warning object w actually exists.
At any time, module warnings keeps a list of active filters for
warnings. When you import warnings for
the first time in a run, the module examines sys.warnoptions to determine the initial set of
filters. You can run Python with option -W to set sys.warnoptions for a given run. Do not rely on
the initial set of filters being held specifically in sys.warnoptions, as this is an implementation
aspect that may change in future releases of Python.
As each warning w
occurs, warnings tests w against each filter until a
filter matches. The first matching filter determines what happens
to w. Each filter is a
tuple of five items. The first item, action, is a string that defines
what happens on a match. The other four items, message, category, module, and
lineno, control what it
means for w to match the
filter, and all conditions must be satisfied for a match. Here are
the meanings of these items (using attribute notation to indicate
conceptual attributes of w):
message-
A regular expression object; the match condition is
message.match(w.message)(the match is case-insensitive). category-
Warningor a subclass ofWarning; the match condition isissubclass(w.category,category). module-
A regular expression object; the match condition is
module.match(w.module)(the match is case-sensitive). lineno-
An
int; the match condition islinenoin (0,w.lineno).linenois0, meaningw.linenodoes not matter, orw.linenomust exactly equallineno.
Upon a match, the first field of the filter, the action, determines what
happens:
'always'-
w.messageis output whether or notwhas already occurred. 'default'-
w.messageis output if, and only if, this is the first timewoccurs from this specific location (i.e., this specificw.module,w.locationpair). 'error'-
w.category(w.message)is raised as an exception. 'ignore'-
wis ignored. 'module'-
w.messageis output if, and only if, this is the first timewoccurs fromw.module. 'once'-
w.messageis output if, and only if, this is the first timewoccurs from any location.
"First make it work. Then make it right. Then make it fast." This quotation, often with slight variations, is widely known as "the golden rule of programming." As far as I've been able to ascertain, the quotation is by Kent Beck, who credits his father with it. Being widely known makes the principle no less important, particularly because it's more honored in the breach than in the observance. A negative form, slightly exaggerated for emphasis, is in a quotation by Don Knuth (who credits Hoare with it): "Premature optimization is the root of all evil in programming."
Optimization is premature if your code is not working yet, or if you're not sure about what, exactly, your code should be doing (since then you cannot be sure if it's working). First make it work. Optimization is also premature if your code is working but you are not satisfied with the overall architecture and design. Remedy structural flaws before worrying about optimization: first make it work, then make it right. These first two steps are not optional; working, well-architected code is always a must.
In contrast, you don't always need to make it fast. Benchmarks may show that your code's performance is already acceptable after the first two steps. When performance is not acceptable, profiling often shows that all performance issues are in a small part of the code, perhaps 10 to 20 percent of the code where your program spends 80 or 90 percent of the time. Such performance-crucial regions of your code are known as its bottlenecks, or hot spots. It's a waste of effort to optimize large portions of code that account for, say, 10 percent of your program's running time. Even if you made that part run 10 times as fast (a rare feat), your program's overall runtime would only decrease by 9 percent, a speedup no user would even notice. If optimization is needed, focus your efforts where they'll matter—on bottlenecks. You can optimize bottlenecks while keeping your code 100 percent pure Python, thus not preventing future porting to other Python implementations. In some cases, you can resort to recoding some computational bottlenecks as Python extensions (as covered in Chapter 25), potentially gaining even better performance (possibly at the expense of some potential future portability).
Start by designing, coding, and testing your application in Python, using available extension modules if they save you work. This takes much less time than it would with a classic compiled language. Then benchmark the application to find out if the resulting code is fast enough. Often it is, and you're done—congratulations! Ship it!
Since much of Python itself is coded in highly optimized C, as are many of its standard and extension modules, your application may even turn out to be already faster than typical C code. However, if the application is too slow, you need to reexamine your algorithms and data structures. Check for bottlenecks due to application architecture, network traffic, database access, and operating system interactions. For typical applications, each of these factors is more likely than language choice to cause slowdowns. Tinkering with large-scale architectural aspects can often speed up an application dramatically, and Python is an excellent medium for such experimentation.
If your program is still too slow, profile it to find out where the time is going. Applications often exhibit computational bottlenecks: small areas of the source code, often between 10 and 20 percent, which account for 80 percent or more of the running time. Then optimize the bottlenecks, applying the techniques suggested in the rest of this chapter.
If normal Python-level optimizations still leave some outstanding computational bottlenecks, you can recode them as Python extension modules, as covered in Chapter 25. In the end, your application will run at roughly the same speed as if you had coded it all in C, C++, or Fortran—or faster, when large-scale experimentation has let you find a better architecture. Your overall programming productivity with this process is not much less than if you coded everything in Python. Future changes and maintenance are easy, since you use Python to express the overall structure of the program and lower-level, harder-to-maintain languages for only a few specific computational bottlenecks.
As you build applications in a given area according to this process, you accumulate a library of reusable Python extension modules. You therefore become more and more productive at developing other fast-running Python applications in the same field.
Even if external constraints eventually force you to recode the whole application in a lower-level language, you're still better off for having started in Python. Rapid prototyping has long been acknowledged as the best way to get software architecture just right. A working prototype lets you check that you have identified the right problems and taken a good path to their solution. A prototype also affords the kind of large-scale architectural experiments that can make a real difference to performance. Starting your prototype with Python allows a gradual migration to other languages by way of extension modules. The application remains fully functional and testable at each stage. This ensures against the risk of compromising a design's architectural integrity in the coding stage. The resulting software is faster and more robust than if all of the coding had been lower-level from the start, and your productivity, while not quite as good as with a pure Python application, is still better than if you had been coding at a lower level throughout.
Benchmarking (also known as load testing) is similar to system testing: both activities are much like running the program for production purposes. In both cases, you need to have at least some subset of the program's intended functionality working, and you need to use known, reproducible inputs. For benchmarking, you don't need to capture and check your program's output: since you make it work and make it right before you make it fast, you are already fully confident about your program's correctness by the time you load-test it. You do need inputs that are representative of typical system operations, ideally ones that may be most challenging for your program's performance. If your program performs several kinds of operations, make sure you run some benchmarks for each different kind of operation.
Elapsed time as measured by your wristwatch is probably precise enough to benchmark most programs. Programs with hard real-time constraints are obviously another matter, but they have needs very different from those of normal programs in most respects. A 5 or 10 percent difference in performance, except for programs with very peculiar constraints, makes no practical difference to a program's real-life usability.
When you benchmark "toy" programs or snippets in order to help
you choose an algorithm or data structure, you may need more
precision: the timeit module of
Python's standard library (mentioned in "Module timeit" on page 484) is quite suitable
for such tasks. The benchmarking discussed in this section is of a
different kind: it is an approximation of real-life program
operation for the sole purpose of checking whether the program's
performance at each task is acceptable, before embarking on
profiling and other optimization activities. For such system
benchmarking, a situation that approximates the program's normal
operating conditions is best, and high accuracy in timing is not
particularly important.
The aspects of your program that are most important for performance are large-scale ones: choice of algorithms, overall architecture, choice of data structures.
The performance issues that you must often take into account are
those connected with the traditional big-O notation of computer
science. Informally, if you call N the
input size of an algorithm, big-O notation expresses algorithm
performance, for large values of N, as
proportional to some function of N (in
precise computer science lingo, this should be called big-Theta,
but in real life, most programmers call this big-O, perhaps because
a Greek uppercase Theta looks like an O with a dot in the
center!).
An O(1) algorithm (also known as a
"constant time" algorithm) is one that takes a time that does not
grow with N. An O(N) algorithm (also known as a "linear time"
algorithm) is one where, for large enough N, handling twice as much data takes about twice
as much time, three times as much data three times as much time,
and so on, growing proportionally to N. An O(N
2 ) algorithm (also known
as a "quadratic time" algorithm) is one where, for large enough
N, handling twice as much data takes
about four times as much time, three times as much data nine times
as much time, and so on, growing proportionally to N squared. Identical concepts and notation are
used to describe a program's consumption of memory ("space") rather
than of time.
You will find more information on big-O notation, and about algorithms and their complexity, in any good book about algorithms and data structures. Unfortunately, at the time of this writing, there aren't yet any such books that use Python. However, if you are familiar with C, I recommend Mastering Algorithms with C, by Kyle Loudon (O'Reilly).
To understand the practical importance of big-O considerations in your programs, consider two different ways to accept all items from an input iterable and accumulate them into a list in reverse order:
def slow(it):
result = []
for item in it: result.insert(0, item)
return result
def fast(it):
result = []
for item in it: result.append(item)
result.reverse( )
return result
We could express each of these functions more concisely, but the
key difference is best appreciated by presenting the functions in
elementary terms. Function slow builds
the result list by inserting each input item before all previously
received ones. Function fast appends
each input item after all previously received ones, then reverses
the result list at the end. Intuitively, one might think that the
final reversing represents extra work, and therefore slow should be faster than fast. But that's not the way things work out.
Each call to result.append takes
roughly the same amount of time, independent of how many items are
already in list result, since there is
always a free slot for an extra item at the end of the list (in
pedantic terms, append is amortized O(1), but I don't cover amortization in this
book). The for loop in function
fast executes N times to receive N
items. Since each iteration of the loop takes a constant time,
overall loop time is O(N).
result.reverse also takes time O(N), as it is directly proportional to the total
number of items. Thus, the total running time of fast is O(N). (If you
don't understand why a sum of two quantities, each O(N), is also O(N),
consider that the sum of any two linear functions of N is also a linear function of N—and "being O(N)" has exactly the same meaning as "consuming
an amount of time that is a linear function of N.")
In contrast, each call to result.insert must make space at slot 0 for the new item to insert by moving all items
that are already in list result
forward one slot. This takes a time proportional to the number of
items that are already in the list. The overall amount of time to
receive N items is therefore
proportional to 1+2+3+...N-1, a sum
whose value is O(N 2
). Therefore, the total running time
of slow is O(N 2 ).
It's almost always worth replacing an O(N 2 )
solution with an O(N) one, unless you
can somehow assign rigorous small limits to the input size
N. If N
can grow without very strict bounds, the O(N 2 )
solution will inevitably turn out to be disastrously slower than
the O(N) one for large enough values
of N, no matter what the
proportionality constants in each case may be (and no matter what
profiling tells you). Unless you have other O(N 2 ) or
even worse bottlenecks elsewhere that you cannot eliminate, a part
of the program that is O(N
2 ) will inevitably turn
into the program's bottleneck and dominate runtime for large enough
values of N. Do yourself a favor and
watch out for the big O: all other performance issues, in
comparison, are almost always insignificant.
Incidentally, function fast can be
made even faster by expressing it in more idiomatic Python. Just
replace the first two lines with the single statement:
result = list(it)
This change does not affect fast's
big-O character (fast is still
O(N) after the change), but does speed
things up by a large constant factor. Often, in Python, the
simplest, clearest, most idiomatic way to express something is also
the fastest.
Choosing algorithms with good big-O characteristics is roughly the same task in Python as in any other language. You just need a few indications about the big-O performance of Python's elementary building blocks, and I provide them in the following sections.
Python lists are internally implemented as vectors (also known as dynamic arrays), not as "linked lists." This fundamental implementation choice determines just about all performance characteristics of Python lists, in big-O terms.
Chaining two lists L1 and
L2, of length N1 and N2 (i.e.,
L1+L2) is O(N1+N2). Multiplying a list L of length N by the
integer M (i.e., L*M) is O(N*M).
Accessing or rebinding any list item is O(1).
len( ) on a list is also O(1).
Accessing any slice of length M is
O(M). Rebinding a slice of length
M with one of identical length is also
O(M). Rebinding a slice of length
M1 with one of different length
M2 is O(M1+M2+N1), where N1
is the number of items after
the slice in the target list (in other words, such length-changing
slice rebindings are very cheap when they occur at the end of a list, and costly when they
occur at the beginning or
around the middle of a long list). If you need first-in, first-out
(FIFO) operations, a list is probably not the fastest data
structure for the purpose: instead, try type collections.deque, covered in "deque" on
page 173.
Most list methods, as shown back in Table 4-3, are
equivalent to slice rebindings and have the same big-O performance.
Methods count, index, remove, and
reverse, and operator in, are O(N). Method
sort is generally O(N*log(N)), but is highly optimized to be
O(N) in some important special cases,
like when the list is already sorted, reverse-sorted, or sorted
except for a few items. range(a,b,c)
is O((b-a)/c). xrange(a,b,c) is
O(1), but looping on xrange's result is O((b-a)/c).
Most methods on a string of length N (be it plain or Unicode) are O(N). len(astring) is O(1). The fastest way to produce a copy of a
string with transliterations and/or removal of specified characters
is the string's method translate. The
single most practically important big-O consideration involving
strings is covered in "Building up a string from
pieces" on page 484.
Python dictionaries are internally implemented with hash tables. This fundamental implementation choice determines just about all performance characteristics of Python dictionaries, in big-O terms.
Accessing, rebinding, adding, or removing a dictionary item is
generally O(1), as are methods
has_key, get, setdefault, and
popitem, and operator in. d1
.update( d2 )
is O(len( d2 )).
len( adict
) is O(1). Methods keys,
items, and values are
O(N). Methods iterkeys, iteritems, and itervalues are O(1),
but looping on the iterators that those methods return is
O(N) (the methods with names that
start with iter do save memory
compared to their counterparts that return lists, which in turn may
make them faster), and looping directly on a dict has the same big-O performance as
iterkeys. Never test with if
x in d.keys( ). That would be O(N), while the equivalent test if x in d: is O(1)
(if d.has_key(x): is also O(1), but is slower than if
x in d: and has no compensating advantage).
When the keys in a dictionary are instances of classes that
define _ _hash_ _ and equality
comparison methods, dictionary performance is of course affected by
those methods. The performance indications presented in this
paragraph hold only when hashing and equality comparison are
O(1).
Python sets, like dictionaries, are internally implemented with hash tables. All performance characteristics of sets are, in big-O terms, the same as those of dictionaries.
Adding or removing a set item is generally O(1), as is operator in.
len( aset
) is O(1). Looping on a set is O(N). When the items in a set are instances of
classes that define _ _hash_ _ and
equality comparison methods, set performance is of course affected
by those methods. The performance indications presented in this
paragraph hold only when hashing and equality comparison are
O(1).
Let L be any list,
T any string (plain or
Unicode); D any dict;
S any set, with (say)
numbers as items (with O(1) hashing
and comparison) and x any
number:
O(1)-
len(L), len(T), len(D), len(S),L[i],T[i],D[i], del D[i], ifxin D, ifxinS, S.add(x),S.remove(x), additions or removals to/from the right end ofL O(N)-
Loops on
L, T, D, S, general additions or removals to/fromL(not at the right end), all methods onT,ifxinL,ifxinT, most methods onL, all shallow copies O(N log N)-
L.sortin general (butO(N)ifLis already nearly sorted or reverse-sorted)
Most programs have hot spots (i.e., regions
of source code that account for most of the time elapsed during a
program run). Don't try to guess where your program's hot spots
are: a programmer's intuition is notoriously unreliable in this
field. Use module profile to collect
profile data over one or more runs of your program, with known
inputs. Then use module pstats to
collate, interpret, and display that profile data. To gain
accuracy, you can calibrate the Python profiler for your machine
(i.e., determine what overhead profiling incurs on your
machine). Module profile can then
subtract this overhead from the times it measures so that the
profile data you collect is closer to reality. Python 2.5
introduces a new standard library module cProfile with similar functionality to
profile; cProfile is preferable, since it's faster, which
imposes less overhead. Yet another profiling module in Python's
standard library is hotshot (covered
at http://docs.python.org/lib/module-hotshot.html and
present since Python 2.2); unfortunately, hotshot is not compatible with threads.
To calibrate profile for your
machine, you need to use class Profile, which module profile supplies and internally uses in function
run. An instance p of Profile supplies one method you use for
calibration.
Fine-tuning of program operations is rarely important. Tuning may make a small but meaningful difference in some particularly hot spot, but hardly ever is it a decisive factor. And yet, fine-tuning, in the pursuit of mostly irrelevant micro-efficiencies, is where a programmer's instincts are likely to lead. It is in good part because of this that most optimization is premature and best avoided. The most that can be said in favor of fine-tuning is that, if one idiom is always speedier than another when the difference is measurable, it's worth getting into the habit of always using the former and not the latter.
Most often, in Python, if you do what comes naturally and choose simplicity and elegance, you end up with code that has good performance as well as clarity and maintainability. In a few cases, an approach that may not be intuitively preferable still does offer performance advantages, as discussed in the rest of this section.
The simplest possible optimization is to run your Python
programs using python -O or
-OO. -OO makes little direct
difference to performance compared to -O, but -OO may save memory, since it removes
docstrings from the bytecode, and memory availability is sometimes
(indirectly) a performance bottleneck. The optimizer is not very
powerful in current releases of Python, but it may still gain you
performance advantages on the order of 5 percent, sometimes as
large as 10 percent (potentially larger if you make use of
assert statements and if _ _debug_ _: guards, as suggested in "The assert Statement" on page 138). The
best aspect of -O is that it
costs nothing—as long as your optimization isn't premature,
of course (don't bother using -O on a program you're still
developing).
Standard library module timeit is very handy for measuring the precise
performance of specific snippets of code. You can have module
timeit use timeit's functionality in your programs, but the
simplest and most normal use is from the command line:
python -mtimeit -s'setup statement(s)' 'statement(s) to be timed'
For example, say you're wondering about the performance of
x=x+1 versus x+=1. At some command prompt, you can easily
try:
$python -mtimeit -s'x=0' 'x=x+1'1000000 loops, best of 3: 0.25 usec per loop $python -mtimeit -s'x=0' 'x+=1'1000000 loops, best of 3: 0.258 usec per loop
and find out that performance is for all intents and purposes identical in both cases.
The single Python "anti-idiom" that's likeliest to kill your
program's performance, to the point that you should never use it, is to build up a large
string from pieces by looping on string concatenation statements
such as big_string
+= piece. Python strings are
immutable, so each such concatenation means that Python must free
the M bytes previously allocated for
big_string, and allocate
and fill M+K bytes for the new
version. Doing this repeatedly in a loop, you end up with roughly
O(N 2 ) performance, where N is the total number of characters. More often
than not, O(N 2
) performance where O(N) is available is a performance disaster. On
some platforms, things may be even bleaker due to memory
fragmentation effects caused by freeing many memory areas of
progressively larger sizes.
To achieve O(N) performance,
accumulate intermediate pieces in a list rather than build up the
string piece by piece. Lists, unlike strings, are mutable, so
appending to a list has O(1)
performance (amortized). Change each occurrence of big_string += piece
into temp_list
.append( piece ). Then, when you're done accumulating, use the
following to build your desired string result in O(N) time:
big_string= ''.join(temp_list)
Using a list comprehension, generator expression, or other
direct means (such as a call to map,
or use of standard library module itertools) to build temp_list may often offer further
optimization over repeated calls to temp_list .append. Other O(N)
ways to build up big strings, which some Python programmers find
more readable, are to concatenate the pieces to an instance of
array.array('c') with the array's
extend method, or to write the pieces
to an instance of cStringIO.StringIO.
In the special case where you want to output the resulting
string, you may gain a further small slice of performance by using
writelines on temp_list (never building
big_string in memory).
When feasible (i.e., when you have the output file object open and
available in the loop), it's just as effective to perform a
write call for each piece, without any
accumulation.
Although not nearly as crucial as += on a big string in a loop, another case where
removing string concatenation may give a slight performance
improvement is when you're concatenating several values in an
expression:
oneway = str(x)+' eggs and '+str(y)+' slices of '+k+' ham' another = '%s eggs and %s slices of %s ham' % (x, y, k)
Using operator % for string
formatting is often a good performance choice.
Operator in, the most natural tool
for searching, is O(1) when the
righthand side operand is a set or dictionary, but O(N) when the righthand side operand is a string,
list, or tuple. If you need to perform many searches on a
container, you're generally much better off using a set or
dictionary, rather than a list or tuple, as the container. Python
sets and dictionaries are highly optimized for searching and
fetching items by key.
Method sort of Python lists is also
a highly optimized and sophisticated tool. You can rely on
sort's performance. Performance
dramatically degrades, however, if you pass sort a custom callable to perform comparisons in
order to sort a list based on anything but built-in comparisons. To
satisfy such needs, consider using the decorate-sort-undecorate
(DSU) idiom instead. In spelled-out form, this idiom has the
following steps:
- Decorate
-
Build an auxiliary list
Awhere each item is a tuple made up of the sort keys, ending with the item of the original listLor with the item's index. - Sort
-
Call
A.sort( )without arguments. - Undecorate
-
Extract the items in order from the now-sorted
A.
The decorate and undecorate steps are often handily performed
with list comprehensions. If you need the sort to be in-place,
assign the final sorted list to L[:].
Otherwise, DSU provides a sorted copy, without disturbing the
original list L.
For example, say we have in L a
large list of strings, each of at least two words, and we want to
sort L in-place by the second word of
each string:
A = [ (s.split( )[1], s) for s in L ] A.sort( ) L[:] = [ t[1] for t in A ]
This is much faster than passing to L.sort a function that compares two strings by
their second words, as in:
def secondword(s): return s.split()[1] L.sort(key=secondword)
On a series of benchmarks with Python 2.4 on lists of 10,000 strings, I measured the DSU version as about five times faster than the non-DSU one.
A particularly fast and effective way to use DSU is to specify a
named argument key= to sort, a possibility that was introduced in Python
2.4. Module operator supplies
functions attrgetter and itemgetter that are particularly suitable for this
use. The fastest way to perform the task mentioned above is:
import operator L.sort(key=operator.itemgetter(1))
On the same benchmarks, this is another five times faster than the plain DSU version.
Occasionally, you may avoid the need for sorting by using heaps, covered in "The heapq Module" on page 177.
Code in a function runs faster than code at
the top level in a module because access to a function's local
variables is optimized to be very fast. If a function contains an
exec statement without explicit
dictionaries, however, the whole function slows down. The presence
of such an exec statement forces the
Python compiler to avoid the modest but important optimizations it
normally performs regarding access to local variables, since the
exec might cause any alteration at all
to the function's namespace. A from
statement of the form:
fromMyModule import *
wastes performance, too, since it also can alter a function's namespace unpredictably.
exec itself is also quite slow, and
even more so if you apply it to a string of source code rather than
to a code object. By far the best approach—for performance,
for correctness, and for clarity—is to avoid exec altogether. It's most often possible to find
better (faster, more robust, and clearer) solutions. If you must
use exec, always use it with explicit
dictionaries. If you need to exec a
dynamically obtained string more than once, compile the string just once and then repeatedly
exec the resulting code object.
eval works on expressions, not on
statements; therefore, while still slow, it avoids some of the
worst performance impacts of exec.
With eval, too, you're best advised to
use explicit dictionaries. If you need several evaluations of the
same dynamically obtained string, compile the string once and then repeatedly
eval the resulting code object.
See "Dynamic Execution and
the exec Statement" on page 328 for more details and advice
about exec, eval, and compile.
Most of your program's bottlenecks will be in loops, particularly nested loops, because loop bodies tend to execute repeatedly. Python does not implicitly perform any code hoisting: if you have any code inside a loop that might be executed just once by hoisting it out of the loop, and the loop is a performance bottleneck, hoist the code out yourself. Sometimes the presence of code to hoist may not be immediately obvious:
def slower(anobject, ahugenumber):
for i in xrange(ahugenumber): anobject.amethod(i)
def faster(anobject, ahugenumber):
themethod = anobject.amethod
for i in xrange(ahugenumber): themethod(i)
In this case, the code that faster
hoists out of the loop is the attribute lookup anobject.amethod. slower repeats the lookup every
time, while faster performs it just
once. The two functions are not 100 percent equivalent: it is
(barely) conceivable that executing amethod might cause such changes on anobject that the next lookup for the same named
attribute fetches a different method object. This is part of why
Python doesn't perform such optimizations itself. In practice, such
subtle, obscure, and tricky cases happen quite seldom; you're quite
safe performing such optimizations yourself, to squeeze the last
drop of performance out of some crucial bottleneck.
Python is faster with local variables than with global ones. If a loop repeatedly accesses a global whose value does not change between iterations, cache the value in a local variable and have the loop access the local instead. This also applies to built-ins:
def slightly_slower(asequence, adict):
for x in asequence: adict[x] = hex(x)
def slightly_faster(asequence, adict):
myhex = hex
for x in asequence: adict[x] = myhex(x)
Here, the speedup is very modest, on the order of five percent or so.
Do not cache None. None is a
keyword, so no further optimization is needed.
List comprehensions can be faster than loops, and so can
map and filter. For optimization purposes, try changing
loops into list comprehensions or map
and filter calls where feasible. The
performance advantage of map and
filter is nullified if you have to use
a lambda or an extra level of function
call. Only when you pass to map or
filter a built-in function, or a
function you'd have to call anyway even from an explicit loop, do
you stand to gain.
The loops that you can replace most naturally with list
comprehensions, or map and
filter calls, are ones that build up a
list by repeatedly calling append on
the list. The following example shows this optimization in a
micro-performance benchmark script:
import time, operator
def slow(asequence):
result = []
for x in asequence: result.append(-x)
return result
def middling(asequence):
return map(operator.neg, asequence)
def fast(asequence):
return [-x for x in asequence]
biggie = xrange(500*1000)
tentimes = [None]*10
def timit(afunc):
lobi = biggie
start = time.clock( )
for x in tentimes: afunc(lobi)
stend = time.clock( )
return "%-10s: %.2f" % (afunc._ _name_ _, stend-start)
for afunc in slow, middling, fast, fast, middling, slow:
print timit(afunc)
Running this example with Python 2.4 on my laptop shows that
fast takes 3.62 seconds, middling 4.71 seconds, and slow 6.91 seconds. In other words, on this
machine, slow (the loop of
append method calls) is about 47
percent slower than middling (the
single map call), and middling, in turn, is about 30 percent slower than
fast (the list comprehension). The
list comprehension is the most direct way to express the task being
micro-benchmarked in this example, so, not surprisingly, it's also
fastest—almost two times faster than the loop of append method calls.
If your program does substantial amounts of I/O, it's likely that performance bottlenecks are due to I/O, not to computation. Such programs are said to be I/O-bound, rather than CPU-bound. Your operating system tries to optimize I/O performance, but you can help it in a couple of ways. One such way is to perform your I/O in chunks of a size that is optimal for performance, rather than simply being convenient for your program's operations. Another way is to use threading.
From the point of view of a program's convenience and simplicity, the ideal amount of data to read or write at a time is often small (one character or one line) or very large (an entire file at a time). That's often okay: Python and your operating system work behind the scenes to let your program use convenient logical chunks for I/O, while arranging physical I/O operations with chunk sizes more attuned to performance. Reading and writing a whole file at a time is quite likely to be okay for performance as long as the file is not very large. Specifically, file-at-a-time I/O is fine as long as the file's data fits comfortably in physical memory, leaving ample memory available for your program and operating system to perform whatever other tasks they're performing at the same time. The hard problems of I/O-bound performance tend to come with huge files.
If performance is an issue, don't use a file's readline method, which is limited in the amount of
chunking and buffering it can perform. (Using writeline, on the other hand, gives no performance
problem when that method is convenient for your program.) When
reading a text file, loop directly on the file object to get one
line at a time with best performance. If the file isn't too huge,
and so can conveniently fit in memory, time two versions of your
program—one looping directly on the file object, the other
calling readlines to read the whole
file into memory. Either may prove faster.
For binary files, particularly large binary files whose contents
you need just a part of on each run of your program, module
mmap (covered in The
mmap Module" on page 360) can often give you both good
performance and program simplicity.
Making an I/O-bound program multithreaded may sometimes afford
substantial performance gains if you can arrange your program's
architecture accordingly. Start a few worker threads devoted
exclusively to I/O, have the computational threads request I/O
operations from the I/O threads via Queue instances, and post the request for each
input operation as soon as you know you'll eventually need that
data. Performance will increase only if there are other tasks your
computational threads can perform while I/O threads are blocked
waiting for data. Basically, you get better performance this way if
you can manage to overlap computation and waiting for data by
having different threads do the computing and the waiting. (See
"Threads in Python" on page 341 for detailed
coverage of Python threading and a suggested architecture.) On the
other hand, if a substantial fraction of your I/O is on the Net, an
even faster and definitely more scalable approach is to eschew
threads in favor of asynchronous (event-driven) architectures, as
covered in "Event-Driven Socket Programs" on
page 533.
