Chapter 4. Writing Great Code
This chapter focuses on best practices for writing great Python code. We will review coding style conventions that will be used in Chapter 5, and briefly cover logging best practices, plus list a few of the major differences between available open source licenses. All of this is intended to help you write code that is easy for us, your community, to use and extend.
Code Style
Pythonistas (veteran Python developers) celebrate having a language so accessible that people who have never programmed can still understand what a Python program does when they read its source code. Readability is at the heart of Python's design, following the recognition that code is read much more often than it is written.
One reason Python code can be easily understood is its relatively complete set of code style guidelines (collected in the two Python Enhancement Proposals PEP 20 and PEP 8, described in the next few pages) and "Pythonic" idioms. When a Pythonista points to portions of code and says they are not "Pythonic," it usually means that those lines of code do not follow the common guidelines and fail to express the intent in what is considered the most readable way. Of course, "a foolish consistency is the hobgoblin of little minds."1 Pedantic devotion to the letter of the PEP can undermine readability and understandability.
PEP 8
PEP 8 is the de facto code style guide for Python. It covers naming conventions, code layout, whitespace (tabs versus spaces), and other similar style topics.
This is highly recommended reading. The entire Python community does its best to adhere to the guidelines laid out within this document. Some projects may stray from it from time to time, while others (like Requests) may amend its recommendations.
Conforming your Python code to PEP 8 is generally a good idea and helps make code more consistent when working on projects with other developers. The PEP 8 guidelines are explicit enough that they can be programmatically checked. There is a command-line program, pep8, that can check your code for conformity. Install it by running the following command in your terminal:

```
$ pip3 install pep8
```
Here's an example of the kinds of things you might see when you run pep8:

```
$ pep8 optparse.py
optparse.py:69:11: E401 multiple imports on one line
optparse.py:77:1: E302 expected 2 blank lines, found 1
optparse.py:88:5: E301 expected 1 blank line, found 0
optparse.py:222:34: W602 deprecated form of raising exception
optparse.py:347:31: E211 whitespace before '('
optparse.py:357:17: E201 whitespace after '{'
optparse.py:472:29: E221 multiple spaces before operator
optparse.py:544:21: W601 .has_key() is deprecated, use 'in'
```
The fixes to most of the complaints are straightforward and stated directly in PEP 8. The code style guide for Requests gives examples of good and bad code and is only slightly modified from the original PEP 8.
The linters referenced in "Text Editors" usually use pep8, so you can also install one of these to run checks within your editor or IDE. Or, the program autopep8 can be used to automatically reformat code in the PEP 8 style. Install the program with:

```
$ pip3 install autopep8
```

Use it to format a file in place (overwriting the original) with:

```
$ autopep8 --in-place optparse.py
```

Excluding the --in-place flag will cause the program to output the modified code directly to the console for review (or piping to another file). The --aggressive flag will perform more substantial changes and can be applied multiple times for greater effect.
PEP 20 (a.k.a. The Zen of Python)

PEP 20, the set of guiding principles for decision making in Python, is always available via import this in a Python shell.
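Running it prints the aphorisms right in the interpreter:

```python
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
```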
Despite its name, PEP 20 only contains 19 aphorisms, not 20 (the last has not been written down...).

The true history of the Zen of Python is immortalized in Barry Warsaw's blog post "import this and the Zen of Python."

For an example of each Zen aphorism in action, see Hunter Blanks' presentation "PEP 20 (The Zen of Python) by Example." Raymond Hettinger also put these principles to fantastic use in his talk "Beyond PEP 8: Best Practices for Beautiful, Intelligible Code."
General Advice
This section contains style concepts that are hopefully easy to accept without debate, and often applicable to languages other than Python. Some of them are direct from the Zen of Python, but others are just plain common sense. They reaffirm our preference in Python to select the most obvious way to present code, when multiple options are possible.
Explicit is better than implicit
While any kind of black magic is possible with Python, the simplest, most explicit way to express something is preferred:
Bad:

```python
def make_complex(*args):
    x, y = args
    return dict(**locals())
```

Good:

```python
def make_complex(x, y):
    return {'x': x, 'y': y}
```

In the good code, x and y are explicitly received from the caller, and an explicit dictionary is returned. A good rule of thumb is that another developer should be able to read the first and last lines of your function and understand what it does. That's not the case with the bad example. (Of course, it's also pretty easy when the function is only two lines long.)
Sparse is better than dense
Make only one statement per line. Some compound statements, such as list comprehensions, are allowed and appreciated for their brevity and their expressiveness, but it is good practice to keep disjoint statements on separate lines of code. It also makes for more understandable diffs3 when revisions to one statement are made:
Bad:

```python
print('one'); print('two')

if x == 1: print('one')

if <complex comparison> and <other complex comparison>:
    # do something
```

Good:

```python
print('one')
print('two')

if x == 1:
    print('one')

cond1 = <complex comparison>
cond2 = <other complex comparison>
if cond1 and cond2:
    # do something
```
Gains in readability, to Pythonistas, are more valuable than a few bytes of total code (for the two-prints-on-one-line statement) or a few microseconds of computation time (for the extra-conditionals-on-separate-lines statement). Plus, when a group is contributing to open source, the "good" code's revision history will be easier to decipher because a change on one line can only affect one thing.
Errors should never pass silently / Unless explicitly silenced
Error handling in Python is done using the try statement. An example from Ben Gleitzman's HowDoI package (described more in "HowDoI") shows when silencing an error is OK:
```python
def format_output(code, args):
    if not args['color']:
        return code
    lexer = None
    # try to find a lexer using the Stack Overflow tags
    # or the query arguments
    for keyword in args['query'].split() + args['tags']:
        try:
            lexer = get_lexer_by_name(keyword)
            break
        except ClassNotFound:
            pass

    # no lexer found above, use the guesser
    if not lexer:
        lexer = guess_lexer(code)

    return highlight(code, lexer, TerminalFormatter(bg='dark'))
```
This is part of a package that provides a command-line script to query the Internet (Stack Overflow, by default) for how to do a particular coding task, and prints it to the screen. The function format_output() applies syntax highlighting by first searching through the question's tags for a string understood by the lexer (also called a tokenizer; a "python", "java", or "bash" tag will identify which lexer to use to split and colorize the code), and then, if that fails, by trying to infer the language from the code itself. There are three paths the program can follow when it reaches the try statement:
- Execution enters the try clause (everything between the try and the except), a lexer is successfully found, the loop breaks, and the function returns the code highlighted with the selected lexer.
- The lexer is not found, the ClassNotFound exception is thrown, it's caught, and nothing is done. The loop continues until it finishes naturally or a lexer is found.
- Some other exception occurs (like a KeyboardInterrupt) that is not handled, and it is raised up to the top level, stopping execution.
The "should never pass silently" part of the zen aphorism discourages the use of overzealous error trapping. Here's an example you can try in a separate terminal so that you can kill it more easily once you get the point:
```python
>>> while True:
...     try:
...         print("nyah", end=" ")
...     except:
...         pass
```
Or don't try it. The except clause without any specified exception will catch everything, including KeyboardInterrupt (Ctrl+C in a POSIX terminal), and ignore it; so it swallows the dozens of interrupts you try to give it to shut the thing down.

It's not just the interrupt issue: a broad except clause can also hide bugs, leaving them to cause some problem later on, when it will be harder to diagnose. We repeat, don't let errors pass silently: always explicitly identify by name the exceptions you will catch, and handle only those exceptions.

If you simply want to log or otherwise acknowledge the exception and re-raise it, like in the following snippet, that's OK. Just don't let the error pass silently (without handling or re-raising it):
```python
>>> while True:
...     try:
...         print("ni", end="-")
...     except:
...         print("An exception happened. Raising.")
...         raise
```
Function arguments should be intuitive to use
Your choices in API design will determine the downstream developer's experience when interacting with a function. Arguments can be passed to functions in four different ways:
```python
def func(positional, keyword=value, *args, **kwargs):
    pass
```
Positional arguments are mandatory and have no default values.
Keyword arguments are optional and have default values.
An arbitrary argument list is optional and has no default values.
An arbitrary keyword argument dictionary is optional and has no default values.
Here are tips for when to use each method of argument passing:
- Positional arguments

  Use these when there are only a few function arguments, which are fully part of the function's meaning, with a natural order. For instance, in send(message, recipient) or point(x, y) the user of the function has no difficulty remembering that those two functions require two arguments, and in which order.

  Usage antipattern: It is possible to use argument names, and to switch the order of arguments when calling functions, for example, calling send(recipient="World", message="The answer is 42.") and point(y=2, x=1). This reduces readability and is unnecessarily verbose. Use the more straightforward calls send("The answer is 42", "World") and point(1, 2).

- Keyword arguments

  When a function has more than two or three positional parameters, its signature is more difficult to remember, and using keyword arguments with default values is helpful. For instance, a more complete send function could have the signature send(message, to, cc=None, bcc=None). Here cc and bcc are optional and evaluate to None when they are not passed another value.

  Usage antipattern: It is possible to follow the order of arguments in the definition without explicitly naming the arguments, like in send("42", "Frankie", "Benjy", "Trillian"), sending a blind carbon copy to Trillian. It is also possible to name arguments in another order, like in send("42", "Frankie", bcc="Trillian", cc="Benjy"). Unless there's a strong reason not to, it's better to use the form that is closest to the function definition: send("42", "Frankie", cc="Benjy", bcc="Trillian").
Never is often better than right now
It is often harder to remove an optional argument (and its logic inside the function) that was added "just in case" and is seemingly never used, than to add a new optional argument and its logic when needed.
- Arbitrary argument list

  Defined with the *args construct, it denotes an extensible number of positional arguments. In the function body, args will be a tuple of all the remaining positional arguments. For example, send(message, *args) can also be called with each recipient as an argument: send("42", "Frankie", "Benjy", "Trillian"); and in the function body, args will be equal to ("Frankie", "Benjy", "Trillian"). A good example of when this works is the print function.

  Caveat: If a function receives a list of arguments of the same nature, it's often clearer to use a list or any sequence. Here, if send has multiple recipients, we can define it explicitly: send(message, recipients) and call it with send("42", ["Benjy", "Frankie", "Trillian"]).

- Arbitrary keyword argument dictionary

  Defined via the **kwargs construct, it passes an undetermined series of named arguments to the function. In the function body, kwargs will be a dictionary of all the passed named arguments that have not been caught by other keyword arguments in the function signature. An example of when this is useful is in logging; formatters at different levels can seamlessly take what information they need without inconveniencing the user.

  Caveat: The same caution as in the case of *args is necessary, for similar reasons: these powerful techniques are to be used when there is a proven necessity to use them, and they should not be used if the simpler and clearer construct is sufficient to express the function's intention.
Note
The variable names *args and **kwargs can (and should) be replaced with other names, when other names make more sense.
It is up to the programmer writing the function to determine which arguments are positional arguments and which are optional keyword arguments, and to decide whether to use the advanced techniques of arbitrary argument passing. After all, there should be one (and preferably only one) obvious way to do it. Other users will appreciate your effort when your Python functions are:
- Easy to read (meaning the name and arguments need no explanation)
- Easy to change (meaning adding a new keyword argument won't break other parts of the code)
If the implementation is hard to explain, it's a bad idea
A powerful tool for hackers, Python comes with a very rich set of hooks and tools allowing you to do almost any kind of tricky tricks. For instance, it is possible to:
- Change how objects are created and instantiated
- Change how the Python interpreter imports modules
- Embed C routines in Python
All these options have drawbacks, and it is always better to use the most straightforward way to achieve your goal. The main drawback is that readability suffers when using these constructs, so whatever you gain must be more important than the loss of readability. Many code analysis tools, such as pylint or pyflakes, will be unable to parse this "magic" code.
A Python developer should know about these nearly infinite possibilities, because it instills confidence that no impassable problem will be in the way. However, knowing how and, particularly, when not to use them is very important.
Like a kung fu master, a Pythonista knows how to kill with a single finger, and never to actually do it.
We are all responsible users
As already demonstrated, Python allows many tricks, and some of them are potentially dangerous. A good example is that any client code can override an object's properties and methods: there is no "private" keyword in Python. This philosophy is very different from highly defensive languages like Java, which provide a lot of mechanisms to prevent any misuse, and is expressed by the saying: "We are all responsible users."

This doesn't mean that, for example, no properties are considered private, and that proper encapsulation is impossible in Python. Rather, instead of relying on concrete walls erected by the developers between their code and others' code, the Python community prefers to rely on a set of conventions indicating that these elements should not be accessed directly.
The main convention for private properties and implementation details is to prefix all "internals" with an underscore (e.g., sys._getframe). If the client code breaks this rule and accesses these marked elements, any misbehavior or problems encountered if the code is modified are the responsibility of the client code.
Using this convention generously is encouraged: any method or property that is not intended to be used by client code should be prefixed with an underscore. This will guarantee a better separation of duties and easier modification of existing code; it will always be possible to publicize a private property, but making a public property private might be a much harder operation.
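As a quick illustration (the class and method names here are made up for the example):

```python
class Rocket:
    def launch(self):      # part of the public API
        self._ignite()

    def _ignite(self):     # implementation detail: the leading underscore
        pass               # signals "do not call this from client code"
```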
Return values from one place
When a function grows in complexity, it is not uncommon to use multiple return statements inside the function's body. However, to keep a clear intent and sustain readability, it is best to return meaningful values from as few points in the body as possible.
The two ways to exit from a function are upon error, or with a return value after the function has been processed normally. In cases when the function cannot perform correctly, it can be appropriate to return a None or False value. In this case, it is better to return from the function as early as the incorrect context has been detected, to flatten the structure of the function: all the code after the return-because-of-failure statement can assume the condition is met to further compute the function's main result. Having multiple such return statements is often necessary.
Still, when possible, keep a single exit point: it's difficult to debug functions when you first have to identify which return statement is responsible for your result. Forcing the function to exit in just one place also helps to factor out some code paths, as multiple exit points are probably a hint that such a refactoring is needed. This example is not bad code, but it could possibly be made more clear, as indicated in the comments:
```python
def select_ad(third_party_ads, user_preferences):
    if not third_party_ads:
        return None  # Raising an exception might be better
    if not user_preferences:
        return None  # Raising an exception might be better
    # Some complex code to pick the best_ad given the
    # available ads and the individual's preferences...
    # Resist the temptation to return best_ad if succeeded...
    if not best_ad:
        ...  # Some Plan-B computation of best_ad
    return best_ad  # A single exit point for the returned value
                    # will help when maintaining the code
```
Conventions
Conventions make sense to everyone, but may not be the only way to do things. The conventions we show here are the more commonly used choices, and we recommend them as the more readable option.
Alternatives to checking for equality
When you don't need to explicitly compare a value to True, or None, or 0, you can just add it to the if statement, like in the following examples. (See "Truth Value Testing" for a list of what is considered false.)
Bad:

```python
if attr == True:
    print('True!')

if attr == None:
    print('attr is None!')
```

Good:

```python
# Just check the value
if attr:
    print('attr is truthy!')

# or check for the opposite
if not attr:
    print('attr is falsey!')

# or, since None is considered false, explicitly check for it
if attr is None:
    print('attr is None!')
```
Accessing dictionary elements
Use the x in d syntax instead of the dict.has_key method, or pass a default argument to dict.get():
Bad:

```python
d = {'hello': 'world'}
if d.has_key('hello'):
    print(d['hello'])    # prints 'world'
else:
    print('default_value')
```

Good:

```python
d = {'hello': 'world'}

print(d.get('hello', 'default_value'))  # prints 'world'
print(d.get('howdy', 'default_value'))  # prints 'default_value'

# Or:
if 'hello' in d:
    print(d['hello'])
```
Manipulating lists
List comprehensions provide a powerful, concise way to work with lists (for more information, see the entry in The Python Tutorial).
Also, the map() and filter() functions can perform operations on lists using a different, more concise syntax:
Standard loop:

```python
# Filter elements greater than 4
a = [3, 4, 5]
b = []
for i in a:
    if i > 4:
        b.append(i)

# Add three to all list members
a = [3, 4, 5]
for i in range(len(a)):
    a[i] += 3
```

List comprehension:

```python
# Filter elements greater than 4
a = [3, 4, 5]
b = [i for i in a if i > 4]
# Or:
b = filter(lambda x: x > 4, a)

# Add three to all list members
a = [3, 4, 5]
a = [i + 3 for i in a]
# Or:
a = map(lambda i: i + 3, a)
```
Use enumerate() to keep a count of your place in the list. It is more readable than manually creating a counter, and it is better optimized for iterators:
```python
>>> a = ["icky", "icky", "icky", "p-tang"]
>>> for i, item in enumerate(a):
...     print("{i}: {item}".format(i=i, item=item))
...
0: icky
1: icky
2: icky
3: p-tang
```
Continuing a long line of code
When a logical line of code is longer than the accepted limit,4 you need to split it over multiple physical lines. The Python interpreter will join consecutive lines if the last character of the line is a backslash. This is helpful in some cases, but should usually be avoided because of its fragility: a whitespace character added to the end of the line, after the backslash, will break the code and may have unexpected results.
A better solution is to use parentheses around your elements. Left with an unclosed parenthesis on an end-of-line, the Python interpreter will join the next line until the parentheses are closed. The same behavior holds for curly and square braces:
Bad:

```python
french_insult = \
"Your mother was a hamster, and \
your father smelt of elderberries!"

from some.deep.module.inside.a.module import a_nice_function, \
    another_nice_function, yet_another_nice_function
```

Good:

```python
french_insult = (
    "Your mother was a hamster, and "
    "your father smelt of elderberries!"
)

from some.deep.module.inside.a.module import (
    a_nice_function,
    another_nice_function,
    yet_another_nice_function
)
```
However, more often than not, having to split a long logical line is a sign that you are trying to do too many things at the same time, which may hinder readability.
Idioms
Although there usually is one (and preferably only one) obvious way to do it, the way to write idiomatic (or Pythonic) code can be non-obvious to Python beginners at first (unless they're Dutch5). So, good idioms must be consciously acquired.
Unpacking
If you know the length of a list or tuple, you can assign names to its elements with unpacking. For example, because it's possible to specify the number of times to split a string in split() and rsplit(), the righthand side of an assignment can be made to split only once (e.g., into a filename and an extension), and the lefthand side can contain both destinations simultaneously, in the correct order, like this:
```python
>>> filename, ext = "my_photo.orig.png".rsplit(".", 1)
>>> print(filename, "is a", ext, "file.")
my_photo.orig is a png file.
```
You can use unpacking to swap variables as well:
```python
a, b = b, a
```
Nested unpacking works, too:
```python
a, (b, c) = 1, (2, 3)
```
In Python 3, a new method of extended unpacking was introduced by PEP 3132:
```python
a, *rest = [1, 2, 3]
# a = 1, rest = [2, 3]

a, *middle, c = [1, 2, 3, 4]
# a = 1, middle = [2, 3], c = 4
```
Ignoring a value
If you need to assign something while unpacking, but will not need that variable, use a double underscore (__):
```python
filename = 'foobar.txt'
basename, __, ext = filename.rpartition('.')
```
Note
Many Python style guides recommend a single underscore (_) for throwaway variables rather than the double underscore (__) recommended here. The issue is that a single underscore is commonly used as an alias for the gettext.gettext() function, and is also used at the interactive prompt to hold the value of the last operation. Using a double underscore instead is just as clear and almost as convenient, and eliminates the risk of accidentally overwriting the single underscore variable, in either of these other use cases.
Creating a length-N list of the same thing
Use the Python list * operator to make a list of the same immutable item:
```python
>>> four_nones = [None] * 4
>>> print(four_nones)
[None, None, None, None]
```
But be careful with mutable objects: because lists are mutable, the * operator will create a list of N references to the same list, which is not likely what you want. Instead, use a list comprehension:
Bad:

```python
# All four sublists are references to the same list
four_lists = [[]] * 4
```

Good:

```python
four_lists = [[] for __ in range(4)]
```
A common idiom for creating strings is to use str.join() on an empty string. This idiom can be applied to lists and tuples:
```python
>>> letters = ['s', 'p', 'a', 'm']
>>> word = ''.join(letters)
>>> print(word)
spam
```
Sometimes we need to search through a collection of things. Let's look at two options: lists and sets.
Take the following code for example:
```python
>>> x = list(('foo', 'foo', 'bar', 'baz'))
>>> y = set(('foo', 'foo', 'bar', 'baz'))
>>>
>>> print(x)
['foo', 'foo', 'bar', 'baz']
>>> print(y)
{'foo', 'bar', 'baz'}
>>>
>>> 'foo' in x
True
>>> 'foo' in y
True
```
Even though both membership tests look identical, 'foo' in y takes advantage of the fact that sets (and dictionaries) in Python are hash tables,6 so the lookup performance of the two examples differs. Python has to step through each item in the list to find a matching case, which is time-consuming (the time difference becomes significant for larger collections), but finding a key in the set can be done quickly, using the hash lookup. Also, sets and dictionaries discard duplicate entries, which is why dictionaries cannot have two identical keys.
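If you want to see the difference yourself, a rough timing comparison using the standard library's timeit module might look like this (the exact numbers will vary by machine):

```python
import timeit

setup = "x = list(range(10000)); y = set(x)"
# Looking for the last element: the list is scanned item by item...
print(timeit.timeit("9999 in x", setup=setup, number=1000))
# ...while the set answers with a single hash lookup.
print(timeit.timeit("9999 in y", setup=setup, number=1000))
```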
For more information, see this Stack Overflow discussion on list versus dict.
Exception-safe contexts
It is common to use try/finally clauses to manage resources like files or thread locks when exceptions may occur. PEP 343 introduced the with statement and a context manager protocol into Python (in version 2.5 and beyond), an idiom to replace these try/finally clauses with more readable code. The protocol consists of two methods, __enter__() and __exit__(), that when implemented for an object allow it to be used via the new with statement, like this:
```python
>>> import threading
>>> some_lock = threading.Lock()
>>>
>>> with some_lock:
...     # Make Earth Mark One, run it for 10 million years ...
...     print("Look at me: I design coastlines.\n"
...           "I got an award for Norway.")
...
```
which would previously have been:
```python
>>> import threading
>>> some_lock = threading.Lock()
>>>
>>> some_lock.acquire()
>>> try:
...     # Make Earth Mark One, run it for 10 million years ...
...     print("Look at me: I design coastlines.\n"
...           "I got an award for Norway.")
... finally:
...     some_lock.release()
```
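The protocol itself is easy to implement by hand. Here is a minimal sketch of a class-based context manager (the Tag class is invented for this illustration):

```python
class Tag:
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        print("<{}>".format(self.name))
        return self  # bound to the name after "as", if any

    def __exit__(self, exc_type, exc_value, traceback):
        print("</{}>".format(self.name))
        return False  # returning a true value would suppress the exception

with Tag("p"):
    print("spam")
```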
The standard library module contextlib provides additional tools that help turn functions into context managers, enforce the call of an object's close() method, suppress exceptions (Python 3.4 and greater), and redirect standard output and error streams (Python 3.4 or 3.5 and greater). Here is an example use of contextlib.closing():
```python
>>> from contextlib import closing
>>> with closing(open("outfile.txt", "w")) as output:
...     output.write("Well, he's...he's, ah...probably pining for the fjords.")
...
56
```
but because __enter__() and __exit__() methods are defined for the object that handles file I/O,7 we can use the with statement directly, without the closing:
```python
>>> with open("outfile.txt", "w") as output:
...     output.write(
...         "PININ' for the FJORDS?!?!?!? "
...         "What kind of talk is that?, look, why did he fall "
...         "flat on his back the moment I got 'im home?\n"
...     )
...
123
```
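One of those contextlib tools is the contextmanager decorator, which builds a context manager from a generator function. Here is a sketch of how an open-and-close pair could be wrapped with it (purely illustrative, since open() already supports with on its own):

```python
from contextlib import contextmanager

@contextmanager
def opened(path, mode="r"):
    f = open(path, mode)
    try:
        yield f       # the body of the with block runs here
    finally:
        f.close()     # runs even if the block raises

with opened("outfile.txt") as f:
    print(f.read())
```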
Common Gotchas
For the most part, Python aims to be a clean and consistent language that avoids surprises. However, there are a few cases that can be confusing to newcomers.
Some of these cases are intentional but can be potentially surprising. Some could arguably be considered language warts. In general, though, what follows is a collection of potentially tricky behaviors that might seem strange at first glance, but are generally sensible once you're aware of the underlying cause for the surprise.
Mutable default arguments
Seemingly the most common surprise new Python programmers encounter is Python's treatment of mutable default arguments in function definitions.
- What you wrote:

```python
def append_to(element, to=[]):
    to.append(element)
    return to
```
- What you might have expected to happen:

```python
my_list = append_to(12)
print(my_list)

my_other_list = append_to(42)
print(my_other_list)
```

A new list is created each time the function is called if a second argument isn't provided, so that the output is:

```
[12]
[42]
```
- What actually happens:

```
[12]
[12, 42]
```
A new list is created once when the function is defined, and the same list is used in each successive call: Python's default arguments are evaluated once when the function is defined, not each time the function is called (as they are in, say, Ruby). This means that if you use a mutable default argument and mutate it, you will have mutated that object for all future calls to the function as well.
- What you should do instead:

Create a new object each time the function is called, by using a default arg to signal that no argument was provided (None is often a good choice):

```python
def append_to(element, to=None):
    if to is None:
        to = []
    to.append(element)
    return to
```
- When this gotcha isn't a gotcha:

Sometimes you can specifically "exploit" (i.e., use as intended) this behavior to maintain state between calls of a function. This is often done when writing a caching function (which stores results in memory), for example:

```python
def time_consuming_function(x, y, cache={}):
    args = (x, y)
    if args in cache:
        return cache[args]
    # Otherwise this is the first time with these arguments.
    # Do the time-consuming operation...
    result = ...  # placeholder for the expensive computation
    cache[args] = result
    return result
```
Late binding closures
Another common source of confusion is the way Python binds its variables in closures (or in the surrounding global scope).
- What you wrote:

```python
def create_multipliers():
    return [lambda x: i * x for i in range(5)]
```
- What you might have expected to happen:

```python
for multiplier in create_multipliers():
    print(multiplier(2), end=" ... ")
print()
```

A list containing five functions that each have their own closed-over i variable that multiplies their argument, producing:

```
0 ... 2 ... 4 ... 6 ... 8 ...
```
- What actually happens:

```
8 ... 8 ... 8 ... 8 ... 8 ...
```

Five functions are created, but all of them just multiply x by 4. Why? Python's closures are late binding. This means that the values of variables used in closures are looked up at the time the inner function is called.

Here, whenever any of the returned functions are called, the value of i is looked up in the surrounding scope at call time. By then, the loop has completed, and i is left with its final value of 4.

What's particularly nasty about this gotcha is the seemingly prevalent misinformation that this has something to do with lambda expressions in Python. Functions created with a lambda expression are in no way special, and in fact the same exact behavior is exhibited by just using an ordinary def:

```python
def create_multipliers():
    multipliers = []
    for i in range(5):
        def multiplier(x):
            return i * x
        multipliers.append(multiplier)
    return multipliers
```
- What you should do instead:

The most general solution is arguably a bit of a hack. Due to Python's aforementioned behavior concerning evaluating default arguments to functions (see "Mutable default arguments"), you can create a closure that binds immediately to its arguments by using a default argument:

```python
def create_multipliers():
    return [lambda x, i=i: i * x for i in range(5)]
```
Alternatively, you can use the functools.partial() function:

```python
from functools import partial
from operator import mul

def create_multipliers():
    return [partial(mul, i) for i in range(5)]
```
- When this gotcha isn't a gotcha:

Sometimes you want your closures to behave this way. Late binding is good in lots of situations (e.g., in the Diamond project, "Example use of a closure (when the gotcha isn't a gotcha)"). Looping to create unique functions is unfortunately a case where it can cause hiccups.
Structuring Your Project
By structure we mean the decisions you make concerning how your project best meets its objective. The goal is to best leverage Pythonâs features to create clean, effective code. In practical terms, that means the logic and dependencies in both your code and in your file and folder structure are clear.
Which functions should go into which modules? How does data flow through the project? What features and functions can be grouped together and isolated? By answering questions like these, you can begin to plan, in a broad sense, what your finished product will look like.
The Python Cookbook has a chapter on modules and packages that describes in detail how import statements and packaging work. The purpose of this section is to outline aspects of Python's module and import systems that are central to enforcing structure in your project. We then discuss various perspectives on how to build code that can be extended and tested reliably.
Thanks to the way imports and modules are handled in Python, it is relatively easy to structure a Python project: there are few constraints and the model for importing modules is easy to grasp. Therefore, you are left with the pure architectural task of crafting the different parts of your project and their interactions.
Modules
Modules are one of Python's main abstraction layers, and probably the most natural one. Abstraction layers allow a programmer to separate code into parts that hold related data and functionality.
For example, if one layer of a project handles interfacing with user actions, while another handles low-level manipulation of data, the most natural way to separate these two layers is to regroup all interfacing functionality in one file, and all low-level operations in another file. This grouping places them into two separate modules. The interface file would then import the low-level file with the import module or from module import attribute statements.
As soon as you use import statements, you also use modules. These can be either built-in modules (such as os and sys), third-party packages you have installed in your environment (such as Requests or NumPy), or your project's internal modules. The following code shows some example import statements and confirms that an imported module is a Python object with its own data type:
```python
>>> import sys  # built-in module
>>> import matplotlib.pyplot as plt  # third-party module
>>>
>>> import mymodule as mod  # internal project module
>>>
>>> print(type(sys), type(plt), type(mod))
<class 'module'> <class 'module'> <class 'module'>
```
To keep in line with the style guide, keep module names short and lowercase. And be sure to avoid using special symbols like the dot (.) or question mark (?), which would interfere with the way Python looks for modules. So a filename like my.spam.py8 is one you should avoid; Python would expect to find a spam.py file in a folder named my, which is not the case. The Python documentation gives more details about using dot notation.
Importing modules
Aside from some naming restrictions, nothing special is required to use a Python file as a module, but it helps to understand the import mechanism. First, the import modu statement will look for the definition of modu in a file named modu.py in the same directory as the caller, if a file with that name exists. If it is not found, the Python interpreter will search for modu.py in Python's search path recursively and raise an ImportError exception if it is not found. The value of the search path is platform-dependent and includes any user- or system-defined directories in the environment's $PYTHONPATH (or %PYTHONPATH% on Windows). It can be manipulated or inspected in a Python session:
```python
>>> import sys
>>> sys.path
['', '/current/absolute/path', 'etc']
# The actual list contains every path that is searched
# when you import libraries into Python, in the order
# that they'll be searched.
```
Once modu.py is found, the Python interpreter will execute the module in an isolated scope. Any top-level statement in modu.py will be executed, including other imports, if any exist. Function and class definitions are stored in the module's dictionary.

Finally, the module's variables, functions, and classes will be available to the caller through the module's namespace, a central concept in programming that is particularly helpful and powerful in Python. Namespaces provide a scope containing named attributes that are visible to each other but not directly accessible outside of the namespace.
In many languages, an include file directive causes the preprocessor to, effectively, copy the contents of the included file into the caller's code. It's different in Python: the included code is isolated in a module namespace. The result of the import modu statement will be a module object named modu in the global namespace, with the attributes defined in the module accessible via dot notation: modu.sqrt would be the sqrt object defined inside of modu.py, for example. This means you generally don't have to worry that the included code could have unwanted effects, such as overriding an existing function with the same name.
It is possible to simulate the more standard behavior by using a special syntax of the import statement: from modu import *. However, this is generally considered bad practice: using import * makes code harder to read, makes dependencies less compartmentalized, and can clobber (overwrite) existing defined objects with the new definitions inside the imported module.
Using from modu import func is a way to import only the attribute you want into the global namespace. It is much less harmful than from modu import * because it shows explicitly what is imported into the global namespace; its only advantage over a simpler import modu is that it will save you a little typing.
Table 4-1 compares the different ways to import definitions from other modules.
Very bad (confusing for a reader) | Better (obvious which new names are in the global namespace) | Best (immediately obvious where the attribute comes from)
---|---|---
from modu import *; x = sqrt(4) | from modu import sqrt; x = sqrt(4) | import modu; x = modu.sqrt(4)
Is sqrt part of modu? A builtin? Defined above? | Has sqrt been redefined in between, or is it still the one in modu? | Now sqrt is visibly part of modu's namespace.
As mentioned in "Code Style", readability is one of the main features of Python. Readable code avoids useless boilerplate text and clutter. But terseness and obscurity are the limits where brevity should stop. Explicitly stating where a class or function comes from, as in the modu.func() idiom, greatly improves code readability and understandability in all but the simplest single-file projects.
Packages
Python provides a very straightforward packaging system, which extends the module mechanism to a directory.
Any directory with an __init__.py file is considered a Python package. The top-level directory with an __init__.py is the root package.9 The different modules in the package are imported in a similar manner as plain modules, but with a special behavior for the __init__.py file, which is used to gather all package-wide definitions.
A file modu.py in the directory pack/ is imported with the statement import pack.modu. The interpreter will look for an __init__.py file in pack and execute all of its top-level statements. Then it will look for a file named pack/modu.py and execute all of its top-level statements. After these operations, any variable, function, or class defined in modu.py is available in the pack.modu namespace.
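For instance, a hypothetical layout for such a package could look like this:

```
pack/
    __init__.py
    modu.py
```

With that layout, both of the usual import forms work as described (func is assumed to be defined in modu.py):

```python
import pack.modu            # runs pack/__init__.py, then pack/modu.py
from pack.modu import func  # imports a single attribute from the module
```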
A commonly seen issue is too much code in __init__.py files. When the project's complexity grows, there may be subpackages and sub-subpackages in a deep directory structure. In this case, importing a single item from a sub-sub-package will require executing all __init__.py files met while traversing the tree.
It is normal, even good practice, to leave an __init__.py empty when the package's modules and subpackages do not need to share any code; the HowDoI and Diamond projects that are used as examples in the next section both have no code except version numbers in their __init__.py files. The Tablib, Requests, and Flask projects contain a top-level documentation string and import statements that expose the intended API for each project, and the Werkzeug project also exposes its top-level API but does it using lazy loading (extra code that only adds content to the namespace as it is used, which speeds up the initial import statement).
Lastly, a convenient syntax is available for importing deeply nested packages: import very.deep.module as mod. This allows you to use mod in place of the verbose repetition of very.deep.module.
Object-Oriented Programming
Python is sometimes described as an object-oriented programming language. This can be somewhat misleading and needs to be clarified.
In Python, everything is an object, and can be handled as such. This is what is meant when we say that functions are first-class objects. Functions, classes, strings, and even types are objects in Python: they all have a type, can be passed as function arguments, and may have methods and properties. In this understanding, Python is an object-oriented language.
However, unlike Java, Python does not impose object-oriented programming as the main programming paradigm. It is perfectly viable for a Python project to not be object oriented, that is, to use no (or very few) class definitions, class inheritance, or any other mechanisms that are specific to object-oriented programming. These features are available, but not obligatory, for us Pythonistas. Moreover, as seen in "Modules", the way Python handles modules and namespaces gives the developer a natural way to ensure the encapsulation and separation of abstraction layers (the most common reasons to use object orientation) without classes.
Proponents of functional programming (a paradigm that, in its purest form, has no assignment operator, no side effects, and basically chains functions to accomplish tasks) say that bugs and confusion occur when a function does different things depending on the external state of the system, for example, a global variable that indicates whether or not a person is logged in. Python, although not a purely functional language, has tools that make functional programming possible, and then we can restrict our use of custom classes to situations where we want to glue together state and functionality.
In some architectures, typically web applications, multiple instances of Python processes are spawned to respond to external requests that can happen at the same time. In this case, holding some state in instantiated objects, which means keeping some static information about the world, is prone to race conditions, a term that describes the situation where, at some point between the initialization of the state of an object (usually done with the Class.__init__() method in Python) and the actual use of the object state through one of its methods, the state of the world has changed.
For example, a request may load an item in memory and later mark it as added to a user's shopping cart. If another request sells the item to another person at the same time, it may happen that the sale actually occurs after the first session loaded the item, and then we are trying to sell inventory already flagged as sold. This and other issues led to a preference for stateless functions.
Our recommendation is as follows: when working with code that relies on some persistent context or global state (like most web applications), use functions and procedures with as few implicit contexts and side effects as possible. A function's implicit context is made up of any of the global variables or items in the persistence layer that are accessed from within the function. Side effects are the changes that a function makes to its implicit context. If a function saves or deletes data in a global variable or in the persistence layer, it is said to have a side effect.
Custom classes in Python should be used to carefully isolate functions with context and side effects from functions with logic (called pure functions). Pure functions are deterministic: given a fixed input, the output will always be the same. This is because they do not depend on context, and do not have side effects. The print() function, for example, is impure because it returns nothing but writes to standard output as a side effect.
Here are some benefits of having pure, separate functions:
- Pure functions are much easier to change or replace if they need to be refactored or optimized.
- Pure functions are easier to test with unit tests: there is less need for complex context setup and data cleaning afterward.
- Pure functions are easier to manipulate, decorate (more on decorators in a moment), and pass around.
In summary, for some architectures, pure functions are more efficient building blocks than classes and objects because they have no context or side effects. As an example, the I/O functions related to each of the file formats in the Tablib library (tablib/formats/*.py; we'll look at Tablib in the next chapter) are pure functions, and not part of a class, because all they do is read data into a separate Dataset object that persists the data, or write the Dataset to a file. But the Session object in the Requests library (also coming up in the next chapter) is a class, because it has to persist the cookie and authentication information that may be exchanged in an HTTP session.
Note
Object orientation is useful and even necessary in many cases, for example, when developing graphical desktop applications or games, where the things that are manipulated (windows, buttons, avatars, vehicles) have a relatively long life of their own in the computer's memory. This is also one motive behind object-relational mapping, which maps rows in databases to objects in code, discussed further in "Database Libraries".
Decorators
Decorators were added to Python in version 2.4 and are defined and discussed in PEP 318. A decorator is a function or a class method that wraps (or decorates) another function or method. The decorated function or method will replace the original function or method. Because functions are first-class objects in Python, this can be done manually, but using the @decorator syntax is clearer and preferred. Here is an example of how to use a decorator:
```python
>>> def foo():
...     print("I am inside foo.")
...
>>> import logging
>>> logging.basicConfig()
>>>
>>> def logged(func, *args, **kwargs):
...     logger = logging.getLogger()
...     def new_func(*args, **kwargs):
...         logger.debug("calling {} with args {} and kwargs {}".format(
...             func.__name__, args, kwargs))
...         return func(*args, **kwargs)
...     return new_func
...
>>>
>>> @logged
... def bar():
...     print("I am inside bar.")
...
>>> logging.getLogger().setLevel(logging.DEBUG)
>>> bar()
DEBUG:root:calling bar with args () and kwargs {}
I am inside bar.
>>> foo()
I am inside foo.
```
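One refinement worth knowing about (not shown above): wrapping a function hides its name and docstring behind those of the wrapper. The standard library's functools.wraps decorator copies them over, which keeps introspection and logging output sensible; a minimal sketch:

```python
import functools

def logged(func):
    @functools.wraps(func)  # preserves func.__name__ and func.__doc__
    def new_func(*args, **kwargs):
        print("calling {}".format(func.__name__))
        return func(*args, **kwargs)
    return new_func
```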
This mechanism is useful for isolating the core logic of the function or method. A good example of a task that is better handled with decoration is memoization or caching: you want to store the results of an expensive function in a table and use them directly instead of recomputing them when they have already been computed. This is clearly not part of the function logic. As of PEP 3129, starting in Python 3, decorators can also be applied to classes.
Dynamic Typing
Python is dynamically typed (as opposed to statically typed), meaning variables do not have a fixed type. Variables are implemented as pointers to an object, making it possible for the variable a to be set to the value 42, then to the value "thanks for all the fish", then to a function. The dynamic typing used in Python is often considered to be a weakness, because it can lead to complexities and hard-to-debug code: if something named a can be set to many different things, the developer or the maintainer must track this name in the code to make sure it has not been set to a completely unrelated object. Table 4-2 illustrates good and bad practice when using names.
Use short functions or methods to reduce the risk of using the same name for two unrelated things.

Bad:

```python
a = 1
a = 'answer is {}'.format(a)
```

Good:

```python
def get_answer(a):
    return 'answer is {}'.format(a)

a = get_answer(1)
```

Use different names for related items when they have a different type.

Bad:

```python
# A string ...
items = 'a b c d'
# No, a list ...
items = items.split(' ')
# No, a set ...
items = set(items)
```

Good:

```python
items_string = 'a b c d'
items_list = items_string.split(' ')
items = set(items_list)
```
There is no efficiency gain when reusing names: the assignment will still create a new object. And when the complexity grows and each assignment is separated by other lines of code, including branches and loops, it becomes harder to determine a given variable's type.
Some coding practices, like functional programming, recommend against reassigning variables. In Java, a variable can be forced to always contain the same value after assignment by using the final keyword. Python does not have a final keyword, and adding one would be against its philosophy. But assigning a variable only once may be a good discipline; it helps reinforce the concept of mutable versus immutable types.
Tip
Pylint will warn you if you reassign a variable to two different types.
Mutable and Immutable Types
Python has two kinds of built-in or user-defined10 types:
```python
# Lists are mutable
my_list = [1, 2, 3]
my_list[0] = 4
my_list  # [4, 2, 3] <- The same list, changed.

# Integers are immutable
x = 6
x = x + 1  # The new x occupies a different location in memory.
```
- Mutable types

  These allow in-place modification of the object's content. Examples are lists and dictionaries, which have mutating methods like list.append() or dict.pop() and can be modified in place.

- Immutable types

  These types provide no method for changing their content. For instance, the variable x set to the integer 6 has no "increment" method. To compute x + 1, you have to create another integer and give it a name.
One consequence of this difference in behavior is that mutable types cannot be used as dictionary keys, because if the value ever changes, it will not hash to the same value, and dictionaries use hashing11 for key storage. The immutable equivalent of a list is the tuple, created with parentheses, for example, (1, 2). It cannot be changed in place and so can be used as a dictionary key.
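A quick demonstration of the difference:

```python
>>> {(1, 2): "tuples can be dictionary keys"}
{(1, 2): 'tuples can be dictionary keys'}
>>> {[1, 2]: "lists cannot"}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
```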
Using mutable types for objects that are intended to be mutable (e.g., my_list = [1, 2, 3]) and immutable types for objects that are intended to have a fixed value (e.g., islington_phone = ("220", "7946", "0347")) clarifies the intent of the code for other developers.
One peculiarity of Python that can surprise newcomers is that strings are immutable; attempting to change one will yield a type error:
```python
>>> s = "I'm not mutable"
>>> s[1:7] = " am"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
```
This means that when constructing a string from its parts, it is much more efficient to accumulate the parts in a list, because it is mutable, and then join the parts together to make the full string. Also, a Python list comprehension, which is a shorthand syntax to iterate over an input to create a list, is better and faster than constructing a list from calls to append() within a loop. Table 4-3 shows different ways to create a string from an iterable.
Bad:

```python
nums = ""
for n in range(20):
    nums += str(n)   # slow and inefficient
print(nums)
```

Good:

```python
nums = []
for n in range(20):
    nums.append(str(n))
print("".join(nums))  # much more efficient
```

Best:

```python
nums = [str(n) for n in range(20)]
print("".join(nums))
```
The main Python page has a good discussion on this kind of optimization.
Finally, if the number of elements in a concatenation is known, pure string addition is faster (and more straightforward) than creating a list of items just to do a "".join(). All of the following formatting options to define cheese do the same thing:12
```python
>>> adj = "Red"
>>> noun = "Leicester"
>>>
>>> cheese = "%s %s" % (adj, noun)  # This style was deprecated (PEP 3101)
>>> cheese = "{} {}".format(adj, noun)  # Possible since Python 3.1
>>> cheese = "{0} {1}".format(adj, noun)  # Numbers can also be reused
>>> cheese = "{adj} {noun}".format(adj=adj, noun=noun)  # This style is best
>>> print(cheese)
Red Leicester
```
Vendorizing Dependencies
A package that vendorizes dependencies includes external dependencies (third-party libraries) within its source, often inside of a folder named vendor, or packages. There is a very good blog post on the subject that lists the main reasons a package owner might do this (basically, to avoid various dependency issues), and discusses alternatives.
Consensus is that in almost all cases, it is better to keep the dependency separate, as it adds unnecessary content (often megabytes of extra code) to the repository; virtual environments used in combination with setup.py (preferred, especially when your package is a library) or a requirements.txt (which, when used, will override dependencies in setup.py in the case of conflicts) can restrict dependencies to a known set of working versions.
If those options are not enough, it might be helpful to contact the owner of the dependency to maybe resolve the issue by updating their package (e.g., your library may depend on an upcoming release of their package, or may need a specific new feature added), as those changes would likely benefit the entire community. The caveat is, if you submit pull requests for big changes, you may be expected to maintain those changes when further suggestions and requests come in; for this reason, both Tablib and Requests vendorize at least some dependencies. As the community moves toward complete adoption of Python 3, hopefully fewer of the most pressing issues will remain.
Testing Your Code
Testing your code is very important. People are much more likely to use a project that actually works.
Python first included doctest and unittest in Python 2.1, released in 2001, embracing test-driven development (TDD), where the developer first writes tests that define the main operation and edge cases for a function, and then writes the function to pass those tests. Since then, TDD has become accepted and widely adopted in business and in open source projects; it's a good idea to practice writing the testing code and the running code in parallel. Used wisely, this method helps you precisely define your code's intent and have a more modular architecture.
Tips for testing
A test is about the most massively useful code a hitchhiker can write. We've summarized some of our tips here.
Just one thing per test
A testing unit should focus on one tiny bit of functionality and prove it correct.
Independence is imperative
Each test unit must be fully independent: able to run
alone, and also within the test suite, regardless of the order they are
called. The implication of this rule is that each test must be loaded with
a fresh dataset and may have to do some cleanup afterward. This is
usually handled by setUp()
and tearDown()
methods.
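For example, here is a sketch of a test case that gives every test its own scratch file via setUp() and tearDown() (the tested behavior is trivial on purpose):

```python
import os
import tempfile
import unittest

class TestWithScratchFile(unittest.TestCase):
    def setUp(self):
        # a fresh fixture for every single test
        self.handle, self.path = tempfile.mkstemp()

    def tearDown(self):
        # clean up so tests stay independent
        os.close(self.handle)
        os.remove(self.path)

    def test_file_starts_empty(self):
        self.assertEqual(os.path.getsize(self.path), 0)
```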
Precision is better than parsimony
Use long and descriptive names for testing functions. This guideline
is slightly different than for running code, where short names are
often preferred. The reason is testing functions are never called explicitly.
square()
or even sqr()
is OK in running code, but in testing code, you
should have names such as test_square_of_number_2()
or
test_square_negative_number()
. These function names are displayed when
a test fails and should be as descriptive as possible.
Speed counts
Try hard to make tests that are fast. If one test needs more than a few milliseconds to run, development will be slowed down, or the tests will not be run as often as is desirable. In some cases, tests can't be fast because they need a complex data structure to work on, and this data structure must be loaded every time the test runs. Keep these heavier tests in a separate test suite that is run by some scheduled task, and run all other tests as often as needed.
RTMF (Read the manual, friend!)
Learn your tools and learn how to run a single test or a test case. Then, when developing a function inside a module, run this function's tests often, ideally automatically when you save the code.
Test everything when you start, and again when you finish
Always run the full test suite before a coding session, and run it again after. This will give you more confidence that you did not break anything in the rest of the code.
Version control automation hooks are fantastic
It is a good idea to implement a hook that runs all tests before pushing code to a shared repository. You can directly add hooks to your version control system, and some IDEs provide ways to do this more simply in their own environments; your system's documentation will step you through how to set this up.
Write a breaking test if you want to take a break
If you are in the middle of a development session and have to interrupt your work, it is a good idea to write a broken unit test about what you want to develop next. When coming back to work, you will have a pointer to where you were and get back on track faster.
In the face of ambiguity, debug using a test
The first step when you are debugging your code is to write a new test pinpointing the bug. While it is not always possible to do, those bug catching tests are among the most valuable pieces of code in your project.
If the test is hard to explain, good luck finding collaborators
When something goes wrong or has to be changed, if your code has a good set of tests, you or other maintainers will rely largely on the testing suite to fix the problem or modify a given behavior. Therefore, the testing code will be read as much as, or even more than, the running code. A unit test whose purpose is unclear is not very helpful in this case.
If the test is easy to explain, it is almost always a good idea
Another use of the testing code is as an introduction to new developers. When other people will have to work on the code base, running and reading the related testing code is often the best thing they can do. They will (or should) discover the hot spots, where most difficulties arise, and the corner cases. If they have to add some functionality, the first step should be to add a test and, by this means, ensure the new functionality is not already a working path that has not been plugged into the interface.
Above all, don't panic

It's open source! The whole world's got your back.
Testing Basics
This section lists the basics of testing (for an idea about what options are available) and gives a few examples taken from the Python projects we dive into next, in Chapter 5. There is an entire book on TDD in Python, and we don't want to rewrite it. Check out Test-Driven Development with Python (O'Reilly) (obey the testing goat!).
unittest
unittest is the batteries-included test module in the Python standard library. Its API will be familiar to anyone who has used any of the JUnit (Java)/nUnit (.NET)/CppUnit (C/C++) series of tools. Creating test cases is accomplished by subclassing unittest.TestCase. In this example code, the test function is just defined as a new method in MyTest:
# test_example.py
import unittest

def fun(x):
    return x + 1

class MyTest(unittest.TestCase):
    def test_that_fun_adds_one(self):
        self.assertEqual(fun(3), 4)

class MySecondTest(unittest.TestCase):
    def test_that_fun_fails_when_not_adding_number(self):
        self.assertRaises(TypeError, fun, "multiply six by nine")
Note
Test methods must start with the string test or they will not run. Test modules (files) are expected to match the pattern test*.py by default, but discovery can match any pattern given to the --pattern option on the command line.
To run all the tests in one test class (here, MyTest), open a terminal shell and, in the same directory as the file, invoke Python's unittest module on the command line, like this:
$ python -m unittest test_example.MyTest
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK
Or to run all tests in a file, name the file:
$ python -m unittest test_example
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK
Mock (in unittest)
As of Python 3.3, unittest.mock is available in the standard library. It allows you to replace parts of your system under test with mock objects and make assertions about how they have been used.
For example, you can monkey patch a method as in the following example (a monkey patch is code that modifies or replaces other existing code at runtime). In this code, the existing method named ProductionClass.method, for the instance we create named instance, is replaced with a new object, MagicMock, which will always return the value 3 when called, and which counts the number of method calls it receives, records the signature it was called with, and contains assertion methods for testing purposes:
from unittest.mock import MagicMock

instance = ProductionClass()
instance.method = MagicMock(return_value=3)
instance.method(3, 4, 5, key='value')
instance.method.assert_called_with(3, 4, 5, key='value')
To mock classes or objects in a module under test, use the patch decorator. In the following example, an external search system is replaced with a mock that always returns the same result (as used in this example, the patch is only for the duration of the test):
import unittest.mock as mock

def mock_search(self):
    class MockSearchQuerySet(SearchQuerySet):
        def __iter__(self):
            return iter(["foo", "bar", "baz"])
    return MockSearchQuerySet()

# SearchForm here refers to the imported class reference
# myapp.SearchForm, and modifies this instance, not the
# code where the SearchForm class itself is initially
# defined.
@mock.patch('myapp.SearchForm.search', mock_search)
def test_new_watchlist_activities(self):
    # get_search_results runs a search and iterates over the result
    self.assertEqual(len(myapp.get_search_results(q="fish")), 3)
Mock has many other ways you can configure it and control its behavior. These are detailed in the Python documentation for unittest.mock.
doctest
The doctest module searches for pieces of text that look like interactive Python sessions in docstrings, and then executes those sessions to verify that they work exactly as shown.
Doctests serve a different purpose than proper unit tests. They are usually less detailed and don't catch special cases or obscure regression bugs. Instead, they are useful as an expressive documentation of the main use cases of a module and its components (an example of a happy path). However, doctests should run automatically each time the full test suite runs.
Here's a simple doctest in a function:
def square(x):
    """Squares x.

    >>> square(2)
    4
    >>> square(-2)
    4
    """
    return x * x

if __name__ == '__main__':
    import doctest
    doctest.testmod()
When you run this module from the command line (i.e., python module.py), the doctests will run and complain if anything is not behaving as described in the docstrings.
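If you'd rather not add the __main__ block, the doctest module can also be invoked directly from the command line; the -v flag prints each example as it is tried:

$ python -m doctest module.py
$ python -m doctest -v module.py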
Examples
In this section, we'll take excerpts from our favorite packages to highlight good testing practice using real code. The test suites require additional libraries not included in the packages (e.g., Requests uses Flask to mock up an HTTP server), which are listed in each project's requirements.txt file.
For all of these examples, the expected first steps are to open a terminal shell, change directories to a place where you work on open source projects, clone the source repository, and set up a virtual environment, like this:
$ git clone https://github.com/username/projectname.git
$ cd projectname
$ virtualenv -p python3 venv
$ source venv/bin/activate
(venv)$ pip install -r requirements.txt
Example: Testing in Tablib
Tablib uses the unittest module in Python's standard library for its testing. The test suite does not come with the package; you must clone the GitHub repository for the files. Here is an excerpt, with important parts annotated:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Tests for Tablib."""

import json
import unittest
import sys
import os

import tablib
from tablib.compat import markup, unicode, is_py3
from tablib.core import Row

class TablibTestCase(unittest.TestCase):
    """Tablib test cases."""

    def setUp(self):
        """Create simple data set with headers."""
        global data, book
        data = tablib.Dataset()
        book = tablib.Databook()
        #
        # ... skip additional setup not used here ...
        #

    def tearDown(self):
        """Teardown."""
        pass

    def test_empty_append(self):
        """Verify append() correctly adds tuple with no headers."""
        new_row = (1, 2, 3)
        data.append(new_row)
        # Verify width/data
        self.assertTrue(data.width == len(new_row))
        self.assertTrue(data[0] == new_row)

    def test_empty_append_with_headers(self):
        """Verify append() correctly detects mismatch of number of
        headers and data.
        """
        data.headers = ['first', 'second']
        new_row = (1, 2, 3, 4)
        self.assertRaises(tablib.InvalidDimensions, data.append, new_row)
- To use unittest, subclass unittest.TestCase and write test methods whose names begin with test.
- The TestCase provides assert methods that check for equality, truth, data type, set membership, and whether exceptions are raised; see the documentation for more details.
- TestCase.setUp() is run before every single test method in the TestCase.
- TestCase.tearDown() is run after every single test method in the TestCase.13
- All test methods must begin with test, or they will not be run.
- There can be multiple tests within a single TestCase, but each one should test just one thing.
If you were contributing to Tablib, the first thing you'd do after cloning it is run the test suite and confirm that nothing breaks, like this:
(venv)$ ### inside the top-level directory, tablib/
(venv)$ python -m unittest test_tablib.py
..............................................................
----------------------------------------------------------------------
Ran 62 tests in 0.289s

OK
As of Python 2.7, unittest also includes its own test discovery mechanisms, using the discover option on the command line:
(venv)$ ### *above* the top-level directory, tablib/
(venv)$ python -m unittest discover tablib/
..............................................................
----------------------------------------------------------------------
Ran 62 tests in 0.234s

OK
After confirming all of the tests pass, you'd (a) find the test case related to the part you're changing and run it often while you're modifying the code, or (b) write a new test case for the feature you're adding or the bug you're tracking down and run that often while modifying the code. The following snippet is an example:
(venv)$ ### inside the top-level directory, tablib/
(venv)$ python -m unittest test_tablib.TablibTestCase.test_empty_append
.
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK
Once your code works, you'd run the entire test suite again before pushing it to the repository. Because you're running these tests so often, it makes sense that they should be as fast as possible. There are many more details about using unittest in the standard library unittest documentation.
Example: Testing in Requests
Requests uses py.test. To see it in action, open a terminal shell, change into a temporary directory, clone Requests, install the dependencies, and run py.test, as shown here:
$ git clone -q https://github.com/kennethreitz/requests.git
$ virtualenv venv -q -p python3  # dash -q for 'quiet'
$ source venv/bin/activate
(venv)$ pip install -q -r requests/requirements.txt  # 'quiet' again...
(venv)$ cd requests
(venv)$ py.test
========================= test session starts =================================
platform darwin -- Python 3.4.3, pytest-2.8.1, py-1.4.30, pluggy-0.3.1
rootdir: /tmp/requests, inifile:
plugins: cov-2.1.0, httpbin-0.0.7
collected 219 items

tests/test_requests.py ........................................................
X............................................
tests/test_utils.py ..s....................................................

========= 217 passed, 1 skipped, 1 xpassed in 25.75 seconds ===================
Other Popular Tools
The testing tools listed here are less frequently used, but still popular enough to mention.
pytest
pytest is a no-boilerplate alternative to Python's standard unittest module, meaning it doesn't require the scaffolding of test classes, and maybe not even setup and teardown methods. To install it, use pip as usual:
$ pip install pytest
Despite being a fully featured and extensible test tool, it boasts a simple syntax. Creating a test suite is as easy as writing a module with a couple of functions:
# content of test_sample.py
def func(x):
    return x + 1

def test_answer():
    assert func(3) == 5
and then running the py.test command is far less work than would be required for the equivalent functionality with the unittest module:
$ py.test
=========================== test session starts ============================
platform darwin -- Python 2.7.1 -- pytest-2.2.1
collecting ... collected 1 items

test_sample.py F

================================= FAILURES =================================
_______________________________ test_answer ________________________________

    def test_answer():
>       assert func(3) == 5
E       assert 4 == 5
E        +  where 4 = func(3)

test_sample.py:5: AssertionError
========================= 1 failed in 0.02 seconds =========================
Nose
Nose extends unittest to make testing easier:
$ pip install nose
Nose provides automatic test discovery to save you the hassle of manually creating test suites. It also provides numerous plug-ins for features such as xUnit-compatible test output, coverage reporting, and test selection.
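Once installed, running nosetests from your project's top-level directory discovers and runs everything that looks like a test; the output below is representative, not from a real project:

$ nosetests
..
----------------------------------------------------------------------
Ran 2 tests in 0.005s

OK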
tox
tox is a tool for automating test environment management and testing against multiple interpreter configurations:
$ pip install tox
tox allows you to configure complicated multiparameter test matrices via a simple ini-style configuration file.
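For instance, a minimal tox.ini that runs a pytest suite against two interpreters might look like the following sketch (adjust envlist to the versions you actually support):

[tox]
envlist = py27, py34

[testenv]
deps = pytest
commands = py.test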
Options for older versions of Python
If you aren't in control of your Python version but still want to use these testing tools, here are a few options.
unittest2
unittest2 is a backport of Python 2.7's unittest module, with an improved API and better assertions than the ones available in previous versions of Python.
If you're using Python 2.6 or below (meaning you probably work at a large bank or Fortune 500 company), you can install it with pip:
$ pip install unittest2
You may want to import the module under the name unittest to make it easier to port code to newer versions of the module in the future:
import unittest2 as unittest

class MyTest(unittest.TestCase):
    ...
This way, if you ever switch to a newer Python version and no longer need the unittest2 module, you can simply change the import in your test module without having to change any other code.
Mock
If you liked "Mock (in unittest)" but use a Python version below 3.3, you can still use unittest.mock by importing it as a separate library:
$ pip install mock
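A common pattern is to fall back between the two at import time, so the same test code runs on both older and newer interpreters:

try:
    from unittest import mock  # Python 3.3+
except ImportError:
    import mock  # the backported PyPI package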
fixture
fixture provides tools that make it easier to set up and tear down database backends for testing. It can load mock datasets for use with SQLAlchemy, SQLObject, Google Datastore, Django ORM, and Storm. There are still new releases, but it has only been tested on Python 2.4 through Python 2.6.
Lettuce and Behave
Lettuce and Behave are packages for doing behavior-driven development (BDD) in Python. BDD is a process that grew out of TDD (obey the testing goat!) in the early 2000s, seeking to substitute the word "test" in test-driven development with "behavior" to overcome newbies' initial trouble grasping TDD. The name was first coined by Dan North in 2003 and introduced to the world, along with the Java tool JBehave, in a 2006 article for Better Software magazine that is reproduced in Dan North's blog post, "Introducing BDD."
BDD grew very popular after the 2011 release of The Cucumber Book (Pragmatic Bookshelf), which documents a Behave package for Ruby. This inspired Gabriel Falcão's Lettuce, and Peter Parente's Behave in our community.
Behaviors are described in plain text using a syntax named Gherkin that is human-readable and machine-processable. The tutorials in each project's documentation are a good place to start.
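To give a taste of the syntax, here is a minimal, entirely hypothetical Gherkin scenario with matching Behave step definitions (Behave's default parser matches the {name:d} placeholders to step arguments):

# features/division.feature
Feature: Division
  Scenario: Divide two numbers
    Given I have entered 6 into the calculator
    When I divide it by 2
    Then the result should be 3

# features/steps/division_steps.py
from behave import given, when, then

@given('I have entered {number:d} into the calculator')
def step_enter(context, number):
    context.number = number

@when('I divide it by {divisor:d}')
def step_divide(context, divisor):
    context.result = context.number / divisor

@then('the result should be {expected:d}')
def step_check(context, expected):
    assert context.result == expected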
Documentation
Readability is a primary focus for Python developers, in both project and code documentation. The best practices described in this section can save both you and others a lot of time.
Project Documentation
There is API documentation for project users, and then there is additional project documentation for those who want to contribute to the project. This section is about the additional project documentation.
A README file at the root directory should give general information to both users and maintainers of a project. It should be raw text or written in some very easy-to-read markup, such as reStructured Text (recommended because right now it's the only format that can be understood by PyPI14) or Markdown. It should contain a few lines explaining the purpose of the project or library (without assuming the user knows anything about the project), the URL of the main source for the software, and some basic credit information. This file is the main entry point for readers of the code.
An INSTALL file is less necessary with Python (but may be helpful to comply with license requirements such as the GPL). The installation instructions are often reduced to one command, such as pip install module or python setup.py install, and added to the README file.
A LICENSE file should always be present and specify the license under which the software is made available to the public. (See "Choosing a License" for more information.)
A TODO file or a TODO section in README should list the planned development for the code.
A CHANGELOG file or section in README should compile a short overview of the changes in the code base for the latest versions.
Project Publication
Depending on the project, your documentation might include some or all of the following components:
- An introduction should provide a very short overview of what can be done with the product, using one or two extremely simplified use cases. This is the 30-second pitch for your project.
- A tutorial should show some primary use cases in more detail. The reader will follow a step-by-step procedure to set up a working prototype.
- An API reference is typically generated from the code (see "Docstring Versus Block Comments"). It will list all publicly available interfaces, parameters, and return values.
- Developer documentation is intended for potential contributors. This can include code conventions and the general design strategy of the project.
Sphinx
Sphinx is far and away the most popular15 Python documentation tool. Use it. It converts the reStructured Text markup language into a range of output formats, including HTML, LaTeX (for printable PDF versions), manual pages, and plain text.
There is also great, free hosting for your Sphinx documentation: Read the Docs. Use that, too. You can configure it with commit hooks to your source repository so that rebuilding your documentation will happen automatically.
Note
Sphinx is famous for its API generation, but it also works well for general project documentation. The online Hitchhiker's Guide to Python is built with Sphinx and is hosted on Read the Docs.
reStructured Text
Sphinx uses reStructured Text, and nearly all Python documentation is written using it. If the content of your long_description argument to setuptools.setup() is written in reStructured Text, it will be rendered as HTML on PyPI; other formats will just be presented as text. It's like Markdown with all the optional extensions built in.
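A minimal sketch of wiring this up in setup.py (the package name and file names here are only examples):

# setup.py
from setuptools import setup

# PyPI renders reStructured Text in long_description as HTML.
with open('README.rst') as f:
    long_description = f.read()

setup(
    name='mypackage',
    version='0.1.0',
    description='One-line summary of the package.',
    long_description=long_description,
)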
Good resources for the syntax are the reStructuredText Primer in the Sphinx documentation and the Docutils quick reference. Or just start contributing to your favorite package's documentation and learn by reading.
Docstring Versus Block Comments
Docstrings and block comments aren't interchangeable. Both can be used for a function or class. Here's an example using both:
# This function slows down program execution for some reason.
def square_and_rooter(x):
    """Return the square root of self times self."""
    ...
The leading comment block is a programmer's note. The docstring describes the operation of the function or class and will be shown in an interactive Python session when the user types help(square_and_rooter).
Docstrings placed at the beginning of a module or at the top of an __init__.py file will also appear in help().
Sphinx's autodoc feature can also automatically generate documentation using appropriately formatted docstrings. Instructions for how to do this, and how to format your docstrings for autodoc, are in the Sphinx tutorial. For further details on docstrings, see PEP 257.
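For example, a docstring using the reStructured Text field lists that autodoc understands might look like this (the function itself is hypothetical):

def area_of_square(side):
    """Compute the area of a square.

    :param side: length of one side of the square
    :type side: float
    :returns: the area, side * side
    :rtype: float
    """
    return side * side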
Logging
The logging module has been a part of Python's Standard Library since version 2.3. It is succinctly described in PEP 282. The documentation is notoriously hard to read, except for the basic logging tutorial.
Logging serves two purposes:
- Diagnostic logging: records events related to the application's operation. If a user calls in to report an error, for example, the logs can be searched for context.
- Audit logging: records events for business analysis. A user's transactions (such as a clickstream) can be extracted and combined with other user details (such as eventual purchases) for reports or to optimize a business goal.
Logging in a Library
Notes for configuring logging for a library are in the logging tutorial. The libraries we mention in the next chapter are another good source of examples of logging use. Because the user, not the library, should dictate what happens when a logging event occurs, one admonition bears repeating:
It is strongly advised that you do not add any handlers other than NullHandler to your library's loggers.
The NullHandler does what its name says: nothing. The user will otherwise have to expressly turn off your logging if they don't want it.
Best practice when instantiating loggers in a library is to create them only using the __name__ global variable: the logging module creates a hierarchy of loggers using dot notation, so using __name__ ensures no name collisions.
Here is an example of best practice from the Requests source; place this in your project's top-level __init__.py:
# Set default logging handler to avoid "No handler found" warnings.
import logging
try:  # Python 2.7+
    from logging import NullHandler
except ImportError:
    class NullHandler(logging.Handler):
        def emit(self, record):
            pass

logging.getLogger(__name__).addHandler(NullHandler())
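Each module in the library can then create its own appropriately named logger and emit events; nothing is printed unless the consuming application attaches a handler. A minimal sketch, with a hypothetical module path:

# mylib/core.py
import logging

logger = logging.getLogger(__name__)  # e.g., the logger named 'mylib.core'

def connect(host):
    # The event is recorded; the application decides whether
    # and where it is displayed or stored.
    logger.debug('connecting to %s', host)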
Logging in an Application
The Twelve-Factor App, an authoritative reference for good practice in application development, contains a section on logging best practice. It emphatically advocates for treating log events as an event stream, and for sending that event stream to standard output to be handled by the application environment.
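A minimal sketch of that advice using only the standard library: send every record to standard output and let the environment (a process supervisor, container runtime, etc.) capture and route the stream:

import logging
import sys

# Emit the event stream to stdout; routing and storage are the
# environment's job, per the Twelve-Factor recommendation.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.info('user %s logged in', 'alice')  # illustrative event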
There are at least three ways to configure a logger:
Configuration method | Pros | Cons
---|---|---
Using an INI-formatted file | It's possible to update the configuration while running, using the function logging.config.listen() to listen for changes on a socket. | You have less control (e.g., custom subclassed filters or loggers) than possible when configuring a logger in code.
Using a dictionary or a JSON-formatted file | In addition to updating while running, it is also possible to load the configuration from a file using the json module, in the standard library since Python 2.6. | You have less control than when configuring a logger in code.
Using code | You have complete control over the configuration. | Any modification requires a change to source code.
Example configuration via an INI file
More details about the INI file format are in the logging configuration section of the logging tutorial. A minimal configuration file would look like this:
[loggers]
keys=root

[handlers]
keys=stream_handler

[formatters]
keys=formatter

[logger_root]
level=DEBUG
handlers=stream_handler

[handler_stream_handler]
class=StreamHandler
level=DEBUG
formatter=formatter
args=(sys.stderr,)

[formatter_formatter]
format=%(asctime)s %(name)-12s %(levelname)-8s %(message)s
The asctime, name, levelname, and message are all optional attributes available from the logging library. The full list of options and their definitions is available in the Python documentation.
Let us say that our logging configuration file is named logging_config.ini. Then to set up the logger using this configuration in the code, we'd use logging.config.fileConfig():
import logging
from logging.config import fileConfig

fileConfig('logging_config.ini')
logger = logging.getLogger()
logger.debug('often makes a very good meal of %s', 'visiting tourists')
Example configuration via a dictionary
As of Python 2.7, you can use a dictionary with configuration details. PEP 391 contains a list of the mandatory and optional elements in the configuration dictionary. Here's a minimal implementation:
import logging
from logging.config import dictConfig

logging_config = dict(
    version=1,
    formatters={
        'f': {'format': '%(asctime)s %(name)-12s %(levelname)-8s %(message)s'}
    },
    handlers={
        'h': {'class': 'logging.StreamHandler',
              'formatter': 'f',
              'level': logging.DEBUG}
    },
    loggers={
        'root': {'handlers': ['h'],
                 'level': logging.DEBUG}
    }
)

dictConfig(logging_config)

logger = logging.getLogger()
logger.debug('often makes a very good meal of %s', 'visiting tourists')
Example configuration directly in code
And last, here is a minimal logging configuration directly in code:
import logging

logger = logging.getLogger()
handler = logging.StreamHandler()
formatter = logging.Formatter(
    '%(asctime)s %(name)-12s %(levelname)-8s %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

logger.debug('often makes a very good meal of %s', 'visiting tourists')
Choosing a License
In the United States, when no license is specified with your source publication, users have no legal right to download, modify, or distribute it. Furthermore, people can't contribute to your project unless you tell them what rules to play by. You need a license.
Upstream Licenses
If you are deriving from another project, your choice may be determined by upstream licenses. For example, the Python Software Foundation (PSF) asks all contributors to Python source code to sign a contributor agreement that formally licenses their code to the PSF (retaining their own copyright) under one of two licenses.16
Because both of those licenses allow users to sublicense under different terms, the PSF is then free to distribute Python under its own license, the Python Software Foundation License. A FAQ for the PSF License goes into detail about what users can and cannot do in plain (not legal) language. It is not intended for further use beyond licensing the PSFâs distribution of Python.
Options
There are plenty of licenses available to choose from. The PSF recommends using one of the Open Source Initiative (OSI)-approved licenses. If you wish to eventually contribute your code to the PSF, the process will be much easier if you start with one of the licenses specified on the contributions page.
Note
Remember to change the placeholder text in the template licenses to reflect your actual information. For example, the MIT license template contains Copyright (c) <year> <copyright holders> on its second line. The Apache License, Version 2.0, requires no modification.
Open source licenses tend to fall into one of two categories:17
- Permissive licenses: often also called Berkeley Software Distribution (BSD)-style licenses, these focus more on the user's freedom to do with the software as they please. Some examples:
  - The Apache licenses: version 2.0 is the current one, modified so that people can include it without modification in any project, can include the license by reference instead of listing it in every file, and can use Apache 2.0-licensed code with the GNU General Public License version 3.0 (GPLv3).
  - Both the BSD 2-clause and 3-clause licenses: the three-clause license is the two-clause license plus an additional restriction on use of the issuer's trademarks.
  - The Massachusetts Institute of Technology (MIT) licenses: both the Expat and the X11 versions are named after popular products that use the respective licenses.
  - The Internet Software Consortium (ISC) license: it's almost identical to the MIT license except for a few lines now deemed to be extraneous.
- Copyleft licenses: these, also called less permissive licenses, focus more on making sure that the source code itself, including any changes made to it, is made available. The GPL family is the most well known of these. The current version is GPLv3.
Note
The GPLv2 license is not compatible with Apache 2.0, so code licensed with GPLv2 cannot be mixed with Apache 2.0-licensed projects. But Apache 2.0-licensed projects can be used in GPLv3 projects (which must subsequently all be GPLv3).
Licenses meeting the OSI criteria all allow commercial use, modification of the software, and distribution downstream, each with different restrictions and requirements. All of the ones listed in Table 4-4 also limit the issuer's liability and require the user to retain the original copyright and license in any downstream distribution.
License family | Restrictions | Allowances | Requirements
---|---|---|---
BSD | Protects issuer's trademark (BSD 3-clause) | Allows a warranty (BSD 2-clause and 3-clause) | (none)
MIT (X11 or Expat), ISC | Protects issuer's trademark (ISC and MIT/X11) | Allows sublicensing with a different license | (none)
Apache version 2.0 | Protects issuer's trademark | Allows sublicensing, use in patents | Must state changes made to the source
GPL | Prohibits sublicensing with a different license | Allows a warranty, and (GPLv3 only) use in patents | Must state changes to the source and include source code
Licensing Resources
Van Lindberg's book Intellectual Property and Open Source (O'Reilly) is a great resource on the legal aspects of open source software. It will help you understand not only licenses, but also the legal aspects of other intellectual property topics like trademarks, patents, and copyrights as they relate to open source. If you're not that concerned about legal matters and just want to choose something quickly, these sites can help:
- GitHub offers a handy guide that summarizes and compares licenses in a few sentences.
- TLDRLegal18 lists what can, cannot, and must be done under the terms of each license in quick bullets.
- The OSI list of approved licenses contains the full text of all licenses that have passed their license review process for compliance with the Open Source Definition (allowing software to be freely used, modified, and shared).
1 Originally stated by Ralph Waldo Emerson in Self-Reliance, it is quoted in PEP 8 to affirm that the coder's best judgment should supersede the style guide. For example, conformity with surrounding code and existing convention is more important than consistency with PEP 8.
2 Tim Peters is a longtime Python user who eventually became one of its most prolific and tenacious core developers (creating Python's sorting algorithm, Timsort), and a frequent Net presence. He was at one point rumored to be a long-running Python port of the Richard Stallman AI program stallman.el. The original conspiracy theory appeared on a listserv in the late 1990s.
3 diff is a shell utility that identifies and shows lines that differ between two files.
4 A max of 80 characters according to PEP 8, 100 according to many others, and for you, whatever your boss says. Ha! But honestly, anyone who's ever had to use a terminal to debug code while standing up next to a rack will quickly come to appreciate the 80-character limit (at which code doesn't wrap on a terminal) and in fact prefer 75-77 characters to allow for line numbering in Vi.
5 See Zen 14. Guido, our BDFL, happens to be Dutch.
6 By the way, this is why only hashable objects can be stored in sets or used as dictionary keys. To make your own Python objects hashable, define an object.__hash__(self) member function that returns an integer. Objects that compare equal must have the same hash value. The Python documentation has more information.
7 In this case, the __exit__() method just calls the I/O wrapper's close() method, to close the file descriptor. On many systems, there's a maximum allowable number of open file descriptors, and it's good practice to release them when they're done.
8 If you'd like, you could name your module my_spam.py, but even our friend the underscore should not be seen often in module names (underscores give the impression of a variable name).
9 Thanks to PEP 420, which was implemented in Python 3.3, there is now an alternative to the root package, called the namespace package. Namespace packages must not have an __init__.py and can be dispersed across multiple directories in sys.path. Python will gather all of the pieces together and present them to the user as a single package.
10 Instructions to define your own types in C are provided in the Python extension documentation.
11 An example of a simple hashing algorithm is to convert the bytes of an item to an integer, and take its value modulo some number. This is how memcached distributes keys across multiple computers.
12 We should admit that even though, according to PEP 3101, the percent-style formatting (%s, %d, %f) has been deprecated now for over a decade, most old hats still use it, and PEP 460 just introduced this same method to format bytes or bytearray objects.
13 Note that unittest.TestCase.tearDown will not be run if the code errors out. This may be a surprise if you've used features in unittest.mock to alter the code's actual behavior. In Python 3.1, the method unittest.TestCase.addCleanup() was added; it pushes a cleanup function and its arguments to a stack that will be called, one by one, after unittest.TestCase.tearDown(), or called anyway even if tearDown() was not. For more information, see the documentation on unittest.TestCase.addCleanup().
14 For those interested, there's some discussion about adding Markdown support for the README files on PyPI.
15 Other tools that you might see are Pycco, Ronn, Epydoc (now discontinued), and MkDocs. Pretty much everyone uses Sphinx and we recommend you do, too.
16 As of this writing, they were the Academic Free License v. 2.1 or the Apache License, Version 2.0. The full description of how this works is on the PSF's contributions page.
17 All of the licenses described here are OSI-approved, and you can learn more about them from the main OSI license page.
18 tl;dr means "Too long; didn't read," and apparently existed as editor shorthand before popularization on the Internet.