Chapter 4. Introducing Python Object Types
This chapter begins our tour of the Python language. In an informal sense, in Python we do things with stuff.1 “Things” take the form of operations like addition and concatenation, and “stuff” refers to the objects on which we perform those operations. In this part of the book, our focus is on that stuff, and the things our programs can do with it.
Somewhat more formally, in Python, data takes the form of objects—either built-in objects that Python provides, or objects we create using Python classes or external language tools such as C extension libraries. Although we’ll firm up this definition later, objects are essentially just pieces of memory, with values and sets of associated operations. As we’ll see, everything is an object in a Python script. Even simple numbers qualify, with values (e.g., 99), and supported operations (addition, subtraction, and so on).
Because objects are also the most fundamental notion in Python programming, we’ll start this chapter with a survey of Python’s built-in object types. Later chapters provide a second pass that fills in details we’ll gloss over in this survey. Here, our goal is a brief tour to introduce the basics.
The Python Conceptual Hierarchy
Before we get to the code, let’s first establish a clear picture of how this chapter fits into the overall Python picture. From a more concrete perspective, Python programs can be decomposed into modules, statements, expressions, and objects, as follows:
Programs are composed of modules.
Modules contain statements.
Statements contain expressions.
Expressions create and process objects.
The discussion of modules in Chapter 3 introduced the highest level of this hierarchy. This part’s chapters begin at the bottom—exploring both built-in objects and the expressions you can code to use them.
We’ll move on to study statements in the next part of the book, though we will find that they largely exist to manage the objects we’ll meet here. Moreover, by the time we reach classes in the OOP part of this book, we’ll discover that they allow us to define new object types of our own, by both using and emulating the object types we will explore here. Because of all this, built-in objects are a mandatory point of embarkation for all Python journeys.
Note
Traditional introductions to programming often stress its three pillars of sequence (“Do this, then that”), selection (“Do this if that is true”), and repetition (“Do this many times”). Python has tools in all three categories, along with some for definition—of functions and classes. These themes may help you organize your thinking early on, but they are a bit artificial and simplistic. Expressions such as comprehensions, for example, are both repetition and selection; some of these terms have other meanings in Python; and many later concepts won’t seem to fit this mold at all. In Python, the more strongly unifying principle is objects, and what we can do with them. To see why, read on.
Why Use Built-in Types?
If you’ve used lower-level languages such as C or C++, you know that much of your work centers on implementing objects—also known as data structures—to represent the components in your application’s domain. You need to lay out memory structures, manage memory allocation, implement search and access routines, and so on. These chores are about as tedious (and error-prone) as they sound, and they usually distract from your program’s real goals.
In typical Python programs, most of this grunt work goes away. Because Python provides powerful object types as an intrinsic part of the language, there’s usually no need to code object implementations before you start solving problems. In fact, unless you have a need for special processing that built-in types don’t provide, you’re almost always better off using a built-in object instead of implementing your own. Here are some reasons why:
Built-in objects make programs easy to write. For simple tasks, built-in types are often all you need to represent the structure of problem domains. Because you get powerful tools such as collections (lists) and search tables (dictionaries) for free, you can use them immediately. You can get a lot of work done with Python’s built-in object types alone.
Built-in objects are components of extensions. For more complex tasks, you may need to provide your own objects using Python classes or C language interfaces. But as you’ll see in later parts of this book, objects implemented manually are often built on top of built-in types such as lists and dictionaries. For instance, a stack data structure may be implemented as a class that manages or customizes a built-in list.
Built-in objects are often more efficient than custom data structures. Python’s built-in types employ already optimized data structure algorithms that are implemented in C for speed. Although you can write similar object types on your own, you’ll usually be hard-pressed to get the level of performance built-in object types provide.
Built-in objects are a standard part of the language. In some ways, Python borrows both from languages that rely on built-in tools (e.g., LISP) and languages that rely on the programmer to provide tool implementations or frameworks of their own (e.g., C++). Although you can implement unique object types in Python, you don’t need to do so just to get started. Moreover, because Python’s built-ins are standard, they’re always the same; proprietary frameworks, on the other hand, tend to differ from site to site.
In other words, not only do built-in object types make programming easier, but they’re also more powerful and efficient than most of what can be created from scratch. Regardless of whether you implement new object types, built-in objects form the core of every Python program.
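To make the stack example mentioned above more concrete, here is a minimal sketch of a class that manages a built-in list. The class name Stack and its method names are hypothetical choices made for this illustration; they are not defined by Python or taken from later chapters.

```python
class Stack:
    """A minimal stack that manages a built-in list (illustrative only)."""

    def __init__(self):
        self._items = []              # The built-in list does the real work

    def push(self, item):
        self._items.append(item)      # Add to the top (the end of the list)

    def pop(self):
        return self._items.pop()      # Remove and return the top item

    def __len__(self):
        return len(self._items)       # Let the built-in len() work on stacks


stack = Stack()
stack.push('spam')
stack.push(99)
top = stack.pop()                     # Last in, first out: the 99 comes back
```

Nearly all of the behavior here is borrowed from the list; the class only narrows the interface.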
Python’s Core Data Types
Table 4-1 previews Python’s built-in object types and some of the syntax used to code their literals—that is, the expressions that generate these objects.2 Some of these types will probably seem familiar if you’ve used other languages; for instance, numbers and strings represent numeric and textual values, respectively, and file objects provide an interface for processing real files stored on your computer.
To some readers, though, the object types in Table 4-1 may be more general and powerful than what you are accustomed to. For instance, you’ll find that lists and dictionaries alone are powerful data representation tools that obviate most of the work you do to support collections and searching in lower-level languages. In short, lists provide ordered collections of other objects, while dictionaries store objects by key; both lists and dictionaries may be nested, can grow and shrink on demand, and may contain objects of any type.
Object type | Example literals/creation |
---|---|
| |
| |
| |
| |
| |
| |
| |
Other core types | Booleans, types, |
Implementation-related types |
Also shown in Table 4-1, program units such as functions, modules, and classes—which we’ll meet in later parts of this book—are objects in Python too; they are created with statements and expressions such as def, class, import, and lambda and may be passed around scripts freely, stored within other objects, and so on. Python also provides a set of implementation-related types such as compiled code objects, which are generally of interest to tool builders more than application developers; we’ll explore these in later parts too, though in less depth due to their specialized roles.
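As a small sketch of what it means for program units to be objects, the following stores two functions in a list and calls them through it. The names double, triple, and tools are hypothetical and exist only for this example.

```python
import math                               # import makes a module object


def double(x):                            # def makes a function object
    return x * 2


triple = lambda x: x * 3                  # lambda makes one too, as an expression

tools = [double, triple]                  # Functions stored within another object
results = [tool(5) for tool in tools]     # ...and called later through it
kind = type(math).__name__                # Modules have a type of their own
```

The list holds references to the functions themselves, not to their results, which is why they can be called long after they are stored.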
Despite its title, Table 4-1 isn’t really complete, because everything we process in Python programs is a kind of object. For instance, when we perform text pattern matching in Python, we create pattern objects, and when we perform network scripting, we use socket objects. These other kinds of objects are generally created by importing and using functions in library modules—for example, in the re and socket modules for patterns and sockets—and have behavior all their own.
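For instance, a pattern object might be created and used roughly as follows. This is only a sketch: the pattern string and the sample text are made up for illustration.

```python
import re

pattern = re.compile(r'(\w+),(\w+)')      # A compiled pattern object
match = pattern.match('spam,eggs')        # A match object (None if no match)
parts = match.groups()                    # Substrings matched by the two groups
```

Neither pattern nor match has literal syntax of its own; both come from calls into the re library module.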
We usually call the other object types in Table 4-1 core data types, though, because they are effectively built into the Python language—that is, there is specific expression syntax for generating most of them. For instance, when you run the following code with characters surrounded by quotes:
>>> 'spam'
you are, technically speaking, running a literal expression that generates and returns a new string object. There is specific Python language syntax to make this object. Similarly, an expression wrapped in square brackets makes a list, one in curly braces makes a dictionary, and so on. Even though, as we’ll see, there are no type declarations in Python, the syntax of the expressions you run determines the types of objects you create and use. In fact, object-generation expressions like those in Table 4-1 are generally where types originate in the Python language.
Just as importantly, once you create an object, you bind its operation set for all time—you can perform only string operations on a string and list operations on a list. In formal terms, this means that Python is dynamically typed, a model that keeps track of types for you automatically instead of requiring declaration code, but it is also strongly typed, a constraint that means you can perform on an object only operations that are valid for its type.
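Both halves of that statement can be seen in a few lines. The sketch below reuses one name for two types (dynamic typing) and then tries an operation the string type does not support (strong typing); the variable names are arbitrary.

```python
x = 99                       # No declaration: x references an integer
x = 'spam'                   # The same name may later reference a string

try:
    x + 1                    # But a string does not support adding a number
except TypeError as exc:
    error = type(exc).__name__
else:
    error = None
```

Python does not quietly convert the number to text here; the mismatch is reported as an error.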
We’ll study each of the object types in Table 4-1 in detail in upcoming chapters. Before digging into the details, though, let’s begin by taking a quick look at Python’s core objects in action. The rest of this chapter provides a preview of the operations we’ll explore in more depth in the chapters that follow. Don’t expect to find the full story here—the goal of this chapter is just to whet your appetite and introduce some key ideas. Still, the best way to get started is to get started, so let’s jump right into some real code.
Numbers
If you’ve done any programming or scripting in the past, some of the object types in Table 4-1 will probably seem familiar. Even if you haven’t, numbers are fairly straightforward. Python’s core objects set includes the usual suspects: integers that have no fractional part, floating-point numbers that do, and more exotic types—complex numbers with imaginary parts, decimals with fixed precision, rationals with numerator and denominator, and full-featured sets. Built-in numbers are enough to represent most numeric quantities—from your age to your bank balance—but more types are available as third-party add-ons.
Although it offers some fancier options, Python’s basic number types are, well, basic. Numbers in Python support the normal mathematical operations. For instance, the plus sign (+) performs addition, a star (*) is used for multiplication, and two stars (**) are used for exponentiation:
>>> 123 + 222                    # Integer addition
345
>>> 1.5 * 4                      # Floating-point multiplication
6.0
>>> 2 ** 100                     # 2 to the power 100, again
1267650600228229401496703205376
Notice the last result here: Python 3.X’s integer type automatically provides extra precision for large numbers like this when needed (in 2.X, a separate long integer type handles numbers too large for the normal integer type in similar ways). You can, for instance, compute 2 to the power 1,000,000 as an integer in Python, but you probably shouldn’t try to print the result—with more than 300,000 digits, you may be waiting awhile!
>>> len(str(2 ** 1000000))       # How many digits in a really BIG number?
301030
This nested-call form works from inside out—first converting the ** result’s number to a string of digits with the built-in str function, and then getting the length of the resulting string with len. The end result is the number of digits. str and len work on many object types; more on both as we move along.
On Pythons prior to 2.7 and 3.1, once you start experimenting with floating-point numbers, you’re likely to stumble across something that may look a bit odd at first glance:
>>> 3.1415 * 2                   # repr: as code (Pythons < 2.7 and 3.1)
6.2830000000000004
>>> print(3.1415 * 2)            # str: user-friendly
6.283
The first result isn’t a bug; it’s a display issue. It turns out that there are two ways to print every object in Python—with full precision (as in the first result shown here), and in a user-friendly form (as in the second). Formally, the first form is known as an object’s as-code repr, and the second is its user-friendly str. In older Pythons, the floating-point repr sometimes displays more precision than you might expect. The difference can also matter when we step up to using classes. For now, if something looks odd, try showing it with print (a built-in function call in 3.X, a statement in 2.X).
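The two display forms can also be requested directly with the built-in repr and str functions. The sketch below uses a string rather than a float, because the difference for strings is the same on every Python version; the variable names are arbitrary.

```python
text = 'spam'
as_code = repr(text)         # The as-code form keeps the quotes
friendly = str(text)         # The user-friendly form is the bare text

print(as_code)               # Displays the quotes, as you would type them in code
print(friendly)              # Displays just the characters
```

The interactive prompt echoes results using the as-code form, while print uses the user-friendly one.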
Better yet, upgrade to Python 2.7 and the latest 3.X, where floating-point numbers display themselves more intelligently, usually with fewer extraneous digits—since this book is based on Pythons 2.7 and 3.3, this is the display form I’ll be showing throughout this book for floating-point numbers:
>>> 3.1415 * 2                   # repr: as code (Pythons >= 2.7 and 3.1)
6.283
Besides expressions, there are a handful of useful numeric modules that ship with Python—modules are just packages of additional tools that we import to use:
>>> import math
>>> math.pi
3.141592653589793
>>> math.sqrt(85)
9.219544457292887
The math module contains more advanced numeric tools as functions, while the random module performs random-number generation and random selections (here, from a Python list coded in square brackets—an ordered collection of other objects to be introduced later in this chapter):
>>> import random
>>> random.random()
0.7082048489415967
>>> random.choice([1, 2, 3, 4])
1
Python also includes more exotic numeric objects—such as complex, fixed-precision, and rational numbers, as well as sets and Booleans—and the third-party open source extension domain has even more (e.g., matrixes and vectors, and extended precision numbers). We’ll defer discussion of these types until later in this chapter and book.
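As a brief preview only, the following sketch creates a few of these objects using the standard library's decimal and fractions modules and the built-in complex literal syntax. The values are chosen so that each result is exact.

```python
from decimal import Decimal
from fractions import Fraction

total = Decimal('0.1') + Decimal('0.2')     # Fixed precision: exactly 0.3
half = Fraction(1, 3) + Fraction(1, 6)      # Rational: exactly 1/2
product = (3 + 4j) * (3 - 4j)               # Complex: 25, with a zero imaginary part
```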
So far, we’ve been using Python much like a simple calculator; to do better justice to its built-in types, let’s move on to explore strings.
Strings
Strings are used to record both textual information (your name, for instance) and arbitrary collections of bytes (such as an image file’s contents). They are our first example of what in Python we call a sequence—a positionally ordered collection of other objects. Sequences maintain a left-to-right order among the items they contain: their items are stored and fetched by their relative positions. Strictly speaking, strings are sequences of one-character strings; other, more general sequence types include lists and tuples, covered later.
Sequence Operations
As sequences, strings support operations that assume a positional ordering among items. For example, if we have a four-character string coded inside quotes (usually of the single variety), we can verify its length with the built-in len function and fetch its components with indexing expressions:
>>> S = 'Spam'           # Make a 4-character string, and assign it to a name
>>> len(S)               # Length
4
>>> S[0]                 # The first item in S, indexing by zero-based position
'S'
>>> S[1]                 # The second item from the left
'p'
In Python, indexes are coded as offsets from the front, and so start from 0: the first item is at index 0, the second is at index 1, and so on.
Notice how we assign the string to a variable named S here. We’ll go into detail on how this works later (especially in Chapter 6), but Python variables never need to be declared ahead of time. A variable is created when you assign it a value, may be assigned any type of object, and is replaced with its value when it shows up in an expression. It must also have been previously assigned by the time you use its value. For the purposes of this chapter, it’s enough to know that we need to assign an object to a variable in order to save it for later use.
In Python, we can also index backward, from the end—positive indexes count from the left, and negative indexes count back from the right:
>>> S[-1]                # The last item from the end in S
'm'
>>> S[-2]                # The second-to-last item from the end
'a'
Formally, a negative index is simply added to the string’s length, so the following two operations are equivalent (though the first is easier to code and harder to get wrong):
>>> S[-1]                # The last item in S
'm'
>>> S[len(S)-1]          # Negative indexing, the hard way
'm'
Notice that we can use an arbitrary expression in the square brackets, not just a hardcoded number literal—anywhere that Python expects a value, we can use a literal, a variable, or any expression we wish. Python’s syntax is completely general this way.
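The equivalence between negative indexes and length-based offsets can be checked for every position at once. This is just a sketch; the loop variable and the name middle are arbitrary.

```python
S = 'Spam'

for i in range(1, len(S) + 1):
    # Counting i back from the right equals offset len(S) - i from the left
    assert S[-i] == S[len(S) - i]

middle = S[len(S) // 2]      # Any expression may appear in the brackets
```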
In addition to simple positional indexing, sequences also support a more general form of indexing known as slicing, which is a way to extract an entire section (slice) in a single step. For example:
>>> S                    # A 4-character string
'Spam'
>>> S[1:3]               # Slice of S from offsets 1 through 2 (not 3)
'pa'
Probably the easiest way to think of slices is that they are a way to extract an entire column from a string in a single step. Their general form, X[I:J], means “give me everything in X from offset I up to but not including offset J.” The result is returned in a new object. The second of the preceding operations, for instance, gives us all the characters in string S from offsets 1 through 2 (that is, 1 through 3 – 1) as a new string. The effect is to slice or “parse out” the two characters in the middle.
In a slice, the left bound defaults to zero, and the right bound defaults to the length of the sequence being sliced. This leads to some common usage variations:
>>> S[1:]                # Everything past the first (1:len(S))
'pam'
>>> S                    # S itself hasn't changed
'Spam'
>>> S[0:3]               # Everything but the last
'Spa'
>>> S[:3]                # Same as S[0:3]
'Spa'
>>> S[:-1]               # Everything but the last again, but simpler (0:-1)
'Spa'
>>> S[:]                 # All of S as a top-level copy (0:len(S))
'Spam'
Note in the second-to-last command how negative offsets can be used to give bounds for slices, too, and how the last operation effectively copies the entire string. As you’ll learn later, there is no reason to copy a string, but this form can be useful for sequences like lists.
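To see why the full-slice copy matters for lists, consider this small sketch, in which the copy is changed without disturbing the original. The names L and C are arbitrary.

```python
L = [1, 2, 3]
C = L[:]                     # Top-level copy: a new list with the same items
C[0] = 99                    # Changing the copy...
same_object = C is L         # ...does not touch the original list
```

After this runs, L still holds its original three items, and same_object is False because the slice produced a distinct list.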
Finally, as sequences, strings also support concatenation with the plus sign (joining two strings into a new string) and repetition (making a new string by repeating another):
>>> S
'Spam'
>>> S + 'xyz'            # Concatenation
'Spamxyz'
>>> S                    # S is unchanged
'Spam'
>>> S * 8                # Repetition
'SpamSpamSpamSpamSpamSpamSpamSpam'
Notice that the plus sign (+) means different things for different objects: addition for numbers, and concatenation for strings. This is a general property of Python that we’ll call polymorphism later in the book—in sum, the meaning of an operation depends on the objects being operated on. As you’ll see when we study dynamic typing, this polymorphism property accounts for much of the conciseness and flexibility of Python code. Because types aren’t constrained, a Python-coded operation can normally work on many different types of objects automatically, as long as they support a compatible interface (like the + operation here). This turns out to be a huge idea in Python; you’ll learn more about it later on our tour.
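A one-line function is enough to sketch this idea. The function name combine is hypothetical; the point is only that the same + expression takes its meaning from whatever objects are passed in.

```python
def combine(a, b):
    return a + b                     # Meaning depends on the objects passed in


number = combine(1, 2)               # Addition for numbers
text = combine('Spam', 'xyz')        # Concatenation for strings
items = combine([1, 2], [3])         # Concatenation for lists, too
```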
Immutability
Also notice in the prior examples that we were not changing the original string with any of the operations we ran on it. Every string operation is defined to produce a new string as its result, because strings are immutable in Python—they cannot be changed in place after they are created. In other words, you can never overwrite the values of immutable objects. For example, you can’t change a string by assigning to one of its positions, but you can always build a new one and assign it to the same name. Because Python cleans up old objects as you go (as you’ll see later), this isn’t as inefficient as it may sound:
>>> S
'Spam'
>>> S[0] = 'z'           # Immutable objects cannot be changed
...error text omitted...
TypeError: 'str' object does not support item assignment
>>> S = 'z' + S[1:]      # But we can run expressions to make new objects
>>> S
'zpam'
Every object in Python is classified as either immutable (unchangeable) or not. In terms of the core types, numbers, strings, and tuples are immutable; lists, dictionaries, and sets are not—they can be changed in place freely, as can most new objects you’ll code with classes. This distinction turns out to be crucial in Python work, in ways that we can’t yet fully explore. Among other things, immutability can be used to guarantee that an object remains constant throughout your program; mutable objects’ values can be changed at any time and place (and whether you expect it or not).
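The following sketch contrasts the two categories: a tuple rejects in-place assignment just as a string does, while a change to a list is visible through every name that references it. The variable names are arbitrary.

```python
T = (1, 2, 3)
try:
    T[0] = 99                # Tuples are immutable, like strings
except TypeError:
    tuple_changed = False
else:
    tuple_changed = True

L = [1, 2, 3]
alias = L                    # A second name for the same list object
L[0] = 99                    # An in-place change is seen through both names
```

The last line is the "whether you expect it or not" case: alias now shows the change too, even though it was never assigned to again.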
Strictly speaking, you can change text-based data in place if you either expand it into a list of individual characters and join it back together with nothing between, or use the newer bytearray type available in Pythons 2.6, 3.0, and later:
>>> S = 'shrubbery'
>>> L = list(S)                      # Expand to a list: [...]
>>> L
['s', 'h', 'r', 'u', 'b', 'b', 'e', 'r', 'y']
>>> L[1] = 'c'                       # Change it in place
>>> ''.join(L)                       # Join with empty delimiter
'scrubbery'
>>> B = bytearray(b'spam')           # A bytes/list hybrid (ahead)
>>> B.extend(b'eggs')                # 'b' needed in 3.X, not 2.X
>>> B                                # B[i] = ord(x) works here too
bytearray(b'spameggs')
>>> B.decode()                       # Translate to normal string
'spameggs'
The bytearray supports in-place changes for text, but only for text whose characters are all at most 8-bits wide (e.g., ASCII). All other strings are still immutable—bytearray is a distinct hybrid of immutable bytes strings (whose b'...' syntax is required in 3.X and optional in 2.X) and mutable lists (coded and displayed in []), and we have to learn more about both these and Unicode text to fully grasp this code.
Type-Specific Methods
Every string operation we’ve studied so far is really a sequence operation—that is, these operations will work on other sequences in Python as well, including lists and tuples. In addition to generic sequence operations, though, strings also have operations all their own, available as methods—functions that are attached to and act upon a specific object, which are triggered with a call expression.
For example, the string find method is the basic substring search operation (it returns the offset of the passed-in substring, or -1 if it is not present), and the string replace method performs global searches and replacements; both act on the subject that they are attached to and called from:
>>> S = 'Spam'
>>> S.find('pa')                     # Find the offset of a substring in S
1
>>> S
'Spam'
>>> S.replace('pa', 'XYZ')           # Replace occurrences of a string in S with another
'SXYZm'
>>> S
'Spam'
Again, despite the names of these string methods, we are not changing the original strings here, but creating new strings as the results—because strings are immutable, this is the only way this can work. String methods are the first line of text-processing tools in Python. Other methods split a string into substrings on a delimiter (handy as a simple form of parsing), perform case conversions, test the content of the string (digits, letters, and so on), and strip whitespace characters off the ends of the string:
>>> line = 'aaa,bbb,ccccc,dd'
>>> line.split(',')                  # Split on a delimiter into a list of substrings
['aaa', 'bbb', 'ccccc', 'dd']
>>> S = 'spam'
>>> S.upper()                        # Upper- and lowercase conversions
'SPAM'
>>> S.isalpha()                      # Content tests: isalpha, isdigit, etc.
True
>>> line = 'aaa,bbb,ccccc,dd\n'
>>> line.rstrip()                    # Remove whitespace characters on the right side
'aaa,bbb,ccccc,dd'
>>> line.rstrip().split(',')         # Combine two operations
['aaa', 'bbb', 'ccccc', 'dd']
Notice the last command here—it strips before it splits because Python runs from left to right, making a temporary result along the way. Strings also support an advanced substitution operation known as formatting, available as both an expression (the original) and a string method call (new as of 2.6 and 3.0); the second of these allows you to omit relative argument value numbers as of 2.7 and 3.1:
>>> '%s, eggs, and %s' % ('spam', 'SPAM!')           # Formatting expression (all)
'spam, eggs, and SPAM!'
>>> '{0}, eggs, and {1}'.format('spam', 'SPAM!')     # Formatting method (2.6+, 3.0+)
'spam, eggs, and SPAM!'
>>> '{}, eggs, and {}'.format('spam', 'SPAM!')       # Numbers optional (2.7+, 3.1+)
'spam, eggs, and SPAM!'
Formatting is rich with features, which we’ll postpone discussing until later in this book, and which tend to matter most when you must generate numeric reports:
>>> '{:,.2f}'.format(296999.2567)                    # Separators, decimal digits
'296,999.26'
>>> '%.2f | %+05d' % (3.14159, -42)                  # Digits, padding, signs
'3.14 | -0042'
One note here: although sequence operations are generic, methods are not—although some types share some method names, string method operations generally work only on strings, and nothing else. As a rule of thumb, Python’s toolset is layered: generic operations that span multiple types show up as built-in functions or expressions (e.g., len(X), X[0]), but type-specific operations are method calls (e.g., aString.upper()). Finding the tools you need among all these categories will become more natural as you use Python more, but the next section gives a few tips you can use right now.
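The layering described above can be sketched by running the same generic operations on three sequence types and then checking which of them has the string-specific upper method. The sample objects and the name has_upper are made up for this illustration.

```python
for obj in ('spam', ['s', 'p', 'a', 'm'], ('s', 'p', 'a', 'm')):
    print(len(obj), obj[0])          # Generic: works on strings, lists, and tuples

# Type-specific: only the string provides an upper method
has_upper = [hasattr(obj, 'upper') for obj in ('spam', ['spam'], ('spam',))]
```

Here has_upper comes out as True for the string and False for the list and the tuple.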
Getting Help
The methods introduced in the prior section are a representative, but small, sample of what is available for string objects. In general, this book is not exhaustive in its look at object methods. For more details, you can always call the built-in dir function. This function lists variables assigned in the caller’s scope when called with no argument; more usefully, it returns a list of all the attributes available for any object passed to it. Because methods are function attributes, they will show up in this list. Assuming S is still the string, here are its attributes on Python 3.3 (Python 2.X varies slightly):
>>> dir(S)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__',
'__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__',
'__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__',
'__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__',
'__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count',
'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index',
'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower',
'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust',
'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex',
'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith',
'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
You probably won’t care about the names with double underscores in this list until later in the book, when we study operator overloading in classes—they represent the implementation of the string object and are available to support customization. The __add__ method of strings, for example, is what really performs concatenation; Python maps the first of the following to the second internally, though you shouldn’t usually use the second form yourself (it’s less intuitive, and might even run slower):
>>> S + 'NI!'
'spamNI!'
>>> S.__add__('NI!')
'spamNI!'
In general, leading and trailing double underscores is the naming pattern Python uses for implementation details. The names without the underscores in this list are the callable methods on string objects.
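If the double-underscore names get in the way, one possible approach is to filter them out of the dir result with a list comprehension, a tool covered later in the book. The names public and special below are hypothetical.

```python
S = 'spam'

# Split the attribute names by whether they start with a double underscore
public = [name for name in dir(S) if not name.startswith('__')]
special = [name for name in dir(S) if name.startswith('__')]
```

The public list then holds just the ordinary method names, such as upper and replace.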
The dir function simply gives the methods’ names. To ask what they do, you can pass them to the help function:
>>> help(S.replace)
Help on built-in function replace:
replace(...)
    S.replace(old, new[, count]) -> str
    Return a copy of S with all occurrences of substring
    old replaced by new. If the optional argument count is
    given, only the first count occurrences are replaced.
help is one of a handful of interfaces to a system of code that ships with Python known as PyDoc—a tool for extracting documentation from objects. Later in the book, you’ll see that PyDoc can also render its reports in HTML format for display on a web browser.
You can also ask for help on an entire string (e.g., help(S)), but you may get more or less help than you want to see—information about every string method in older Pythons, and probably no help at all in newer versions because strings are treated specially. It’s generally better to ask about a specific method.
Both dir and help also accept as arguments either a real object (like our string S), or the name of a data type (like str, list, and dict). The latter form returns the same list for dir but shows full type details for help, and allows you to ask about a specific method via type name (e.g., help on str.replace).
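A quick sketch confirms the first of these claims and shows one way to get at a method's documentation text through the type name. The __doc__ attribute used here is the docstring that help draws on; the variable names are arbitrary.

```python
S = 'spam'

same_names = dir(S) == dir(str)      # Same attribute list from object or type
summary = str.replace.__doc__        # The method's documentation, as a string
```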
For more details, you can also consult Python’s standard library reference manual or commercially published reference books, but dir and help are the first level of documentation in Python.
Other Ways to Code Strings
So far, we’ve looked at the string object’s sequence operations and type-specific methods. Python also provides a variety of ways for us to code strings, which we’ll explore in greater depth later. For instance, special characters can be represented as backslash escape sequences, which Python displays in \xNN hexadecimal escape notation, unless they represent printable characters:
>>> S = 'A\nB\tC'                    # \n is end-of-line, \t is tab
>>> len(S)                           # Each stands for just one character
5
>>> ord('\n')                        # \n is one character coded as decimal value 10
10
>>> S = 'A\0B\0C'                    # \0, a binary zero byte, does not terminate string
>>> len(S)
5
>>> S                                # Non-printables are displayed as \xNN hex escapes
'A\x00B\x00C'
Python allows strings to be enclosed in single or double quote characters—they mean the same thing but allow the other type of quote to be embedded without an escape (most programmers prefer single quotes). It also allows multiline string literals enclosed in triple quotes (single or double)—when this form is used, all the lines are concatenated together, and end-of-line characters are added where line breaks appear. This is a minor syntactic convenience, but it’s useful for embedding things like multiline HTML, XML, or JSON code in a Python script, and stubbing out lines of code temporarily—just add three quotes above and below:
>>> msg = """
aaaaaaaaaaaaa
bbb'''bbbbbbbbbb""bbbbbbb'bbbb
cccccccccccccc
"""
>>> msg
'\naaaaaaaaaaaaa\nbbb\'\'\'bbbbbbbbbb""bbbbbbb\'bbbb\ncccccccccccccc\n'
Python also supports a raw string literal that turns off the backslash escape mechanism. Such literals start with the letter r and are useful for strings like directory paths on Windows (e.g., r'C:\text\new').
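The effect of the r prefix is easy to check by comparing lengths. In this sketch the same path is coded with and without the prefix; the names plain and raw are arbitrary.

```python
plain = 'C:\text\new'        # \t and \n are taken as a tab and a newline
raw = r'C:\text\new'         # Backslashes are kept as literal characters

lengths = (len(plain), len(raw))
```

The plain form is two characters shorter, because each two-character escape collapses to a single character.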
Unicode Strings
Python’s strings also come with full Unicode support required for processing text in internationalized character sets. Characters in the Japanese and Russian alphabets, for example, are outside the ASCII set. Such non-ASCII text can show up in web pages, emails, GUIs, JSON, XML, or elsewhere. When it does, handling it well requires Unicode support. Python has such support built in, but the form of its Unicode support varies per Python line, and is one of their most prominent differences.
In Python 3.X, the normal str string handles Unicode text (including ASCII, which is just a simple kind of Unicode); a distinct bytes string type represents raw byte values (including media and encoded text); and 2.X Unicode literals are supported in 3.3 and later for 2.X compatibility (they are treated the same as normal 3.X str strings):
>>> 'sp\xc4m'                        # 3.X: normal str strings are Unicode text
'spÄm'
>>> b'a\x01c'                        # bytes strings are byte-based data
b'a\x01c'
>>> u'sp\u00c4m'                     # The 2.X Unicode literal works in 3.3+: just str
'spÄm'
In Python 2.X, the normal str string handles both 8-bit character strings (including ASCII text) and raw byte values; a distinct unicode string type represents Unicode text; and 3.X bytes literals are supported in 2.6 and later for 3.X compatibility (they are treated the same as normal 2.X str strings):
>>> print u'sp\xc4m'                 # 2.X: Unicode strings are a distinct type
spÄm
>>> 'a\x01c'                         # Normal str strings contain byte-based text/data
'a\x01c'
>>> b'a\x01c'                        # The 3.X bytes literal works in 2.6+: just str
'a\x01c'
Formally, in both 2.X and 3.X, non-Unicode strings are sequences of 8-bit bytes that print with ASCII characters when possible, and Unicode strings are sequences of Unicode code points—identifying numbers for characters, which do not necessarily map to single bytes when encoded to files or stored in memory. In fact, the notion of bytes doesn’t apply to Unicode: some encodings include character code points too large for a byte, and even simple 7-bit ASCII text is not stored one byte per character under some encodings and memory storage schemes:
>>> 'spam'                     # Characters may be 1, 2, or 4 bytes in memory
'spam'
>>> 'spam'.encode('utf8')      # Encoded to 4 bytes in UTF-8 in files
b'spam'
>>> 'spam'.encode('utf16')     # But encoded to 10 bytes in UTF-16
b'\xff\xfes\x00p\x00a\x00m\x00'
Both 3.X and 2.X also support the bytearray string type we met earlier, which is essentially a bytes string (a str in 2.X) that supports most of the list object’s in-place mutable change operations.
Both 3.X and 2.X also support coding non-ASCII characters with \x hexadecimal and short \u and long \U Unicode escapes, as well as file-wide encodings declared in program source files. Here’s our non-ASCII character coded three ways in 3.X (add a leading “u” and say “print” to see the same in 2.X):
>>> 'sp\xc4\u00c4\U000000c4m'
'spÄÄÄm'
What these values mean and how they are used differs between text strings, which are the normal string in 3.X and Unicode in 2.X, and byte strings, which are bytes in 3.X and the normal string in 2.X. All these escapes can be used to embed actual Unicode code-point ordinal-value integers in text strings. By contrast, byte strings use only \x hexadecimal escapes to embed the encoded form of text, not its decoded code point values; encoded bytes are the same as code points only for some encodings and characters:
>>> '\u00A3', '\u00A3'.encode('latin1'), b'\xA3'.decode('latin1')
('£', b'\xa3', '£')
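As a rough illustration of the difference between code points and encoded bytes, the following 3.X sketch encodes the same four-character string under two encodings. The variable names are mine, not part of any API:

```python
# Python 3.X sketch: one non-ASCII character, two encoded forms
text = 'sp\xc4m'                     # Four characters; the third is 'Ä'

print(len(text))                     # Counts characters (code points), not bytes
print(hex(ord(text[2])))             # The code point of 'Ä'

latin = text.encode('latin-1')       # Latin-1: one byte per character here
utf8 = text.encode('utf-8')          # UTF-8: 'Ä' takes two bytes
print(latin, len(latin))
print(utf8, len(utf8))

print(utf8.decode('utf-8') == text)  # Decoding by the same name restores the text
```

Under Latin-1 the encoded byte happens to equal the code point; under UTF-8 it does not, which is the "only for some encodings and characters" caveat in action.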
As a notable difference, Python 2.X allows its normal and Unicode strings to be mixed in expressions as long as the normal string is all ASCII; in contrast, Python 3.X has a tighter model that never allows its normal and byte strings to mix without explicit conversion:
u'x' + b'y'            # Works in 2.X (where b is optional and ignored)
u'x' + 'y'             # Works in 2.X: u'xy'
u'x' + b'y'            # Fails in 3.3 (where u is optional and ignored)
u'x' + 'y'             # Works in 3.3: 'xy'
'x' + b'y'.decode()    # Works in 3.X if decode bytes to str: 'xy'
'x'.encode() + b'y'    # Works in 3.X if encode str to bytes: b'xy'
Apart from these string types, Unicode processing mostly reduces to transferring text data to and from files—text is encoded to bytes when stored in a file, and decoded into characters (a.k.a. code points) when read back into memory. Once it is loaded, we usually process text as strings in decoded form only.
Because of this model, though, files are also content-specific in 3.X: text files implement named encodings and accept and return str strings, but binary files instead deal in bytes strings for raw binary data. In Python 2.X, normal files’ content is str bytes, and a special codecs module handles Unicode and represents content with the unicode type.
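Here is a minimal 3.X sketch of that text/binary split. It writes a string to a scratch file, then reads it back in both modes; the file name and temporary folder are assumptions made up for the demo:

```python
import os
import tempfile

# Hypothetical scratch file, made in a temporary folder for this demo only
path = os.path.join(tempfile.mkdtemp(), 'unidata.txt')

with open(path, 'w', encoding='utf-8') as f:    # Text mode: str in, encoded on write
    f.write('sp\xc4m')

with open(path, 'r', encoding='utf-8') as f:    # Text mode: decoded back to str
    text = f.read()

with open(path, 'rb') as f:                     # Binary mode: raw bytes, no decoding
    raw = f.read()

print(repr(text), repr(raw))
```

The text-mode read returns the original four characters, while the binary-mode read exposes the five encoded bytes actually stored in the file.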
We’ll meet Unicode again in the files coverage later in this chapter, but save the rest of the Unicode story for later in this book. It crops up briefly in a Chapter 25 example in conjunction with currency symbols, but for the most part is postponed until this book’s advanced topics part. Unicode is crucial in some domains, but many programmers can get by with just a passing acquaintance. If your data is all ASCII text, the string and file stories are largely the same in 2.X and 3.X. And if you’re new to programming, you can safely defer most Unicode details until you’ve mastered string basics.
Pattern Matching
One point worth noting before we move on is that none of the string object’s own methods support pattern-based text processing. Text pattern matching is an advanced tool outside this book’s scope, but readers with backgrounds in other scripting languages may be interested to know that to do pattern matching in Python, we import a module called re. This module has analogous calls for searching, splitting, and replacement, but because we can use patterns to specify substrings, we can be much more general:
>>> import re
>>> match = re.match('Hello[ \t]*(.*)world', 'Hello Python world')
>>> match.group(1)
'Python '
This example searches for a substring that begins with the word “Hello,” followed by zero or more tabs or spaces, followed by arbitrary characters to be saved as a matched group, terminated by the word “world.” If such a substring is found, portions of the substring matched by parts of the pattern enclosed in parentheses are available as groups. The following pattern, for example, picks out three groups separated by slashes or colons, and is similar to splitting by an alternatives pattern:
>>> match = re.match('[/:](.*)[/:](.*)[/:](.*)', '/usr/home:lumberjack')
>>> match.groups()
('usr', 'home', 'lumberjack')
>>> re.split('[/:]', '/usr/home:lumberjack')
['', 'usr', 'home', 'lumberjack']
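The searching and replacement calls mentioned earlier work along the same lines. The following is only a sketch using the same sample string; the patterns and variable names are my own choices for illustration:

```python
import re

path = '/usr/home:lumberjack'

pos = re.search('[/:]home', path).start()    # Searching: offset where the match begins
replaced = re.sub('[/:]', '-', path)         # Replacement: swap every delimiter
parts = re.findall('[^/:]+', path)           # Collect the runs between delimiters

print(pos)
print(replaced)
print(parts)
```

Unlike re.match, which only looks at the front of the string, re.search scans forward for the first place the pattern fits.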
Pattern matching is an advanced text-processing tool by itself, but there is also support in Python for even more advanced text and language processing, including XML and HTML parsing and natural language analysis. We’ll see additional brief examples of patterns and XML parsing at the end of Chapter 37, but I’ve already said enough about strings for this tutorial, so let’s move on to the next type.
Lists
The Python list object is the most general sequence provided by the language. Lists are positionally ordered collections of arbitrarily typed objects, and they have no fixed size. They are also mutable—unlike strings, lists can be modified in place by assignment to offsets as well as a variety of list method calls. Accordingly, they provide a very flexible tool for representing arbitrary collections—lists of files in a folder, employees in a company, emails in your inbox, and so on.
Sequence Operations
Because they are sequences, lists support all the sequence operations we discussed for strings; the only difference is that the results are usually lists instead of strings. For instance, given a three-item list:
>>> L = [123, 'spam', 1.23]            # A list of three different-type objects
>>> len(L)                             # Number of items in the list
3
we can index, slice, and so on, just as for strings:
>>> L[0]                               # Indexing by position
123
>>> L[:-1]                             # Slicing a list returns a new list
[123, 'spam']
>>> L + [4, 5, 6]                      # Concat/repeat make new lists too
[123, 'spam', 1.23, 4, 5, 6]
>>> L * 2
[123, 'spam', 1.23, 123, 'spam', 1.23]
>>> L                                  # We're not changing the original list
[123, 'spam', 1.23]
Type-Specific Operations
Python’s lists may be reminiscent of arrays in other languages, but they tend to be more powerful. For one thing, they have no fixed type constraint—the list we just looked at, for example, contains three objects of completely different types (an integer, a string, and a floating-point number). Further, lists have no fixed size. That is, they can grow and shrink on demand, in response to list-specific operations:
>>> L.append('NI')                     # Growing: add object at end of list
>>> L
[123, 'spam', 1.23, 'NI']
>>> L.pop(2)                           # Shrinking: delete an item in the middle
1.23
>>> L                                  # "del L[2]" deletes from a list too
[123, 'spam', 'NI']
Here, the list append method expands the list’s size and inserts an item at the end; the pop method (or an equivalent del statement) then removes an item at a given offset, causing the list to shrink. Other list methods insert an item at an arbitrary position (insert), remove a given item by value (remove), add multiple items at the end (extend), and so on. Because lists are mutable, most list methods also change the list object in place, instead of creating a new one:
>>> M = ['bb', 'aa', 'cc']
>>> M.sort()
>>> M
['aa', 'bb', 'cc']
>>> M.reverse()
>>> M
['cc', 'bb', 'aa']
The list sort method here, for example, orders the list in ascending fashion by default, and reverse reverses it; in both cases, the methods modify the list directly.
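The other methods named above (insert, remove, and extend) can be sketched the same way. This is just a quick illustration; it also shows the sorted built-in, which returns a new list rather than changing its subject in place:

```python
L = [123, 'spam', 'NI']

L.insert(1, 'eggs')         # Insert at offset 1: [123, 'eggs', 'spam', 'NI']
L.remove('spam')            # Remove by value:    [123, 'eggs', 'NI']
L.extend([4, 5])            # Add items at end:   [123, 'eggs', 'NI', 4, 5]
print(L)

M = ['bb', 'aa', 'cc']
N = sorted(M)               # sorted() builds a new list instead
print(M, N)                 # M is unchanged; N is in ascending order
```

The in-place methods return nothing useful, so there is no need to assign their results; the list named L is the one that changes.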
Bounds Checking
Although lists have no fixed size, Python still doesn’t allow us to reference items that are not present. Indexing off the end of a list is always a mistake, but so is assigning off the end:
>>> L
[123, 'spam', 'NI']
>>> L[99]
...error text omitted...
IndexError: list index out of range
>>> L[99] = 1
...error text omitted...
IndexError: list assignment index out of range
This is intentional, as it’s usually an error to try to assign off the end of a list (and a particularly nasty one in the C language, which doesn’t do as much error checking as Python). Rather than silently growing the list in response, Python reports an error. To grow a list, we call list methods such as append instead.
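In script form, the same behavior looks roughly like the following. The try statement used here to catch the error is covered much later in the book, so treat this as a preview sketch rather than something you need to master now:

```python
L = [123, 'spam', 'NI']
message = None

try:
    L[99] = 1                       # Assigning off the end is refused
except IndexError as exc:
    message = str(exc)              # Save the error text for display
print('error:', message)

L.append(1)                         # The supported way to grow at the end
print(L)
```

The failed assignment leaves the list untouched, and append then adds the item at the next available offset.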
Nesting
One nice feature of Python’s core data types is that they support arbitrary nesting—we can nest them in any combination, and as deeply as we like. For example, we can have a list that contains a dictionary, which contains another list, and so on. One immediate application of this feature is to represent matrixes, or “multidimensional arrays” in Python. A list with nested lists will do the job for basic applications (you’ll get “...” continuation-line prompts on lines 2 and 3 of the following in some interfaces, but not in IDLE):
>>> M = [[1, 2, 3],               # A 3 × 3 matrix, as nested lists
         [4, 5, 6],               # Code can span lines if bracketed
         [7, 8, 9]]
>>> M
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Here, we’ve coded a list that contains three other lists. The effect is to represent a 3 × 3 matrix of numbers. Such a structure can be accessed in a variety of ways:
>>> M[1]                          # Get row 2
[4, 5, 6]
>>> M[1][2]                       # Get row 2, then get item 3 within the row
6
The first operation here fetches the entire second row, and the second grabs the third item within that row (it runs left to right, like the earlier string strip and split). Stringing together index operations takes us deeper and deeper into our nested-object structure.3
Comprehensions
In addition to sequence operations and list methods, Python includes a more advanced operation known as a list comprehension expression, which turns out to be a powerful way to process structures like our matrix. Suppose, for instance, that we need to extract the second column of our sample matrix. It’s easy to grab rows by simple indexing because the matrix is stored by rows, but it’s almost as easy to get a column with a list comprehension:
>>> col2 = [row[1] for row in M]       # Collect the items in column 2
>>> col2
[2, 5, 8]
>>> M                                  # The matrix is unchanged
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
List comprehensions derive from set notation; they are a way to build a new list by running an expression on each item in a sequence, one at a time, from left to right. List comprehensions are coded in square brackets (to tip you off to the fact that they make a list) and are composed of an expression and a looping construct that share a variable name (row, here). The preceding list comprehension means basically what it says: “Give me row[1] for each row in matrix M, in a new list.” The result is a new list containing column 2 of the matrix.
List comprehensions can be more complex in practice:
>>> [row[1] + 1 for row in M]                      # Add 1 to each item in column 2
[3, 6, 9]
>>> [row[1] for row in M if row[1] % 2 == 0]       # Filter out odd items
[2, 8]
The first operation here, for instance, adds 1 to each item as it is collected, and the second uses an if clause to filter odd numbers out of the result using the % modulus expression (remainder of division). List comprehensions make new lists of results, but they can be used to iterate over any iterable object, a term we’ll flesh out later in this preview. Here, for instance, we use list comprehensions to step over a hardcoded list of coordinates and a string:
>>> diag = [M[i][i] for i in [0, 1, 2]]            # Collect a diagonal from matrix
>>> diag
[1, 5, 9]
>>> doubles = [c * 2 for c in 'spam']              # Repeat characters in a string
>>> doubles
['ss', 'pp', 'aa', 'mm']
These expressions can also be used to collect multiple values, as long as we wrap those values in a nested collection. The following illustrates using range, a built-in that generates successive integers and requires a surrounding list to display all its values in 3.X only (2.X makes a physical list all at once):
>>> list(range(4))                                 # 0..3 (list() required in 3.X)
[0, 1, 2, 3]
>>> list(range(-6, 7, 2))                          # -6 to +6 by 2 (need list() in 3.X)
[-6, -4, -2, 0, 2, 4, 6]
>>> [[x ** 2, x ** 3] for x in range(4)]           # Multiple values, "if" filters
[[0, 0], [1, 1], [4, 8], [9, 27]]
>>> [[x, x / 2, x * 2] for x in range(-6, 7, 2) if x > 0]     # 3.X "/" keeps fractions: 2.X shows 1, 2, 3
[[2, 1.0, 4], [4, 2.0, 8], [6, 3.0, 12]]
As you can probably tell, list comprehensions, and relatives like the map and filter built-in functions, are too involved to cover more formally in this preview chapter. The main point of this brief introduction is to illustrate that Python includes both simple and advanced tools in its arsenal. List comprehensions are an optional feature, but they tend to be very useful in practice and often provide a substantial processing speed advantage. They also work on any type that is a sequence in Python, as well as some types that are not. You’ll hear much more about them later in this book.
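For a rough sense of how the map and filter relatives compare, here is a 3.X sketch that redoes two of the earlier column operations both ways. The helper functions are hypothetical names invented for this example:

```python
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

def col2_plus_one(row):               # Hypothetical helper: column 2 item, plus 1
    return row[1] + 1

def is_even(x):                       # Hypothetical helper: true for even numbers
    return x % 2 == 0

comp = [row[1] + 1 for row in M]                 # Comprehension form
mapped = list(map(col2_plus_one, M))             # map runs a function on each item

evens = [row[1] for row in M if row[1] % 2 == 0]         # Comprehension with a filter
filtered = list(filter(is_even, [row[1] for row in M]))  # filter keeps items that test true

print(comp, mapped)
print(evens, filtered)
```

Both pairs produce the same lists; the comprehension simply lets us code the expression inline instead of naming a function first.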
As a preview, though, you’ll find that in recent Pythons, comprehension syntax has been generalized for other roles: it’s not just for making lists today. For example, enclosing a comprehension in parentheses can also be used to create generators that produce results on demand. To illustrate, the sum built-in sums items in a sequence; in this example, summing all items in our matrix’s rows on request:
>>> G = (sum(row) for row in M)        # Create a generator of row sums
>>> next(G)                            # iter(G) not required here
6
>>> next(G)                            # Run the iteration protocol next()
15
>>> next(G)
24
The map built-in can do similar work, by generating the results of running items through a function, one at a time and on request. Like range, wrapping it in list forces it to return all its values in Python 3.X; this isn’t needed in 2.X where map makes a list of results all at once instead, and is not needed in other contexts that iterate automatically, unless multiple scans or list-like behavior is also required:
>>> list(map(sum, M))                  # Map sum over items in M
[6, 15, 24]
In Python 2.7 and 3.X, comprehension syntax can also be used to create sets and dictionaries:
>>> {sum(row) for row in M}                 # Create a set of row sums
{24, 6, 15}
>>> {i : sum(M[i]) for i in range(3)}       # Creates key/value table of row sums
{0: 6, 1: 15, 2: 24}
In fact, lists, sets, dictionaries, and generators can all be built with comprehensions in 3.X and 2.7:
>>> [ord(x) for x in 'spaam']               # List of character ordinals
[115, 112, 97, 97, 109]
>>> {ord(x) for x in 'spaam'}               # Sets remove duplicates
{112, 97, 115, 109}
>>> {x: ord(x) for x in 'spaam'}            # Dictionary keys are unique
{'p': 112, 'a': 97, 's': 115, 'm': 109}
>>> (ord(x) for x in 'spaam')               # Generator of values
<generator object <genexpr> at 0x000000000254DAB0>
To understand objects like generators, sets, and dictionaries, though, we must move ahead.
Dictionaries
Python dictionaries are something completely different (Monty Python reference intended)—they are not sequences at all, but are instead known as mappings. Mappings are also collections of other objects, but they store objects by key instead of by relative position. In fact, mappings don’t maintain any reliable left-to-right order; they simply map keys to associated values. Dictionaries, the only mapping type in Python’s core objects set, are also mutable: like lists, they may be changed in place and can grow and shrink on demand. Also like lists, they are a flexible tool for representing collections, but their more mnemonic keys are better suited when a collection’s items are named or labeled—fields of a database record, for example.
Mapping Operations
When written as literals, dictionaries are coded in curly braces and consist of a series of “key: value” pairs. Dictionaries are useful anytime we need to associate a set of values with keys—to describe the properties of something, for instance. As an example, consider the following three-item dictionary (with keys “food,” “quantity,” and “color,” perhaps the details of a hypothetical menu item?):
>>> D = {'food': 'Spam', 'quantity': 4, 'color': 'pink'}
We can index this dictionary by key to fetch and change the keys’ associated values. The dictionary index operation uses the same syntax as that used for sequences, but the item in the square brackets is a key, not a relative position:
>>> D['food']                     # Fetch value of key 'food'
'Spam'
>>> D['quantity'] += 1            # Add 1 to 'quantity' value
>>> D
{'color': 'pink', 'food': 'Spam', 'quantity': 5}
Although the curly-braces literal form does see use, it is perhaps more common to see dictionaries built up in different ways (it’s rare to know all your program’s data before your program runs). The following code, for example, starts with an empty dictionary and fills it out one key at a time. Unlike out-of-bounds assignments in lists, which are forbidden, assignments to new dictionary keys create those keys:
>>> D = {}
>>> D['name'] = 'Bob'             # Create keys by assignment
>>> D['job'] = 'dev'
>>> D['age'] = 40
>>> D
{'age': 40, 'job': 'dev', 'name': 'Bob'}
>>> print(D['name'])
Bob
Here, we’re effectively using dictionary keys as field names in a record that describes someone. In other applications, dictionaries can also be used to replace searching operations—indexing a dictionary by key is often the fastest way to code a search in Python.
As we’ll learn later, we can also make dictionaries by passing to the dict type name either keyword arguments (a special name=value syntax in function calls), or the result of zipping together sequences of keys and values obtained at runtime (e.g., from files). Both the following make the same dictionary as the prior example and its equivalent {} literal form, though the first tends to make for less typing:
>>> bob1 = dict(name='Bob', job='dev', age=40)                        # Keywords
>>> bob1
{'age': 40, 'name': 'Bob', 'job': 'dev'}
>>> bob2 = dict(zip(['name', 'job', 'age'], ['Bob', 'dev', 40]))      # Zipping
>>> bob2
{'job': 'dev', 'name': 'Bob', 'age': 40}
Notice how the left-to-right order of dictionary keys is scrambled. Mappings are not positionally ordered, so unless you’re lucky, they’ll come back in a different order than you typed them. The exact order may vary per Python, but you shouldn’t depend on it, and shouldn’t expect yours to match that in this book.
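Because display order is not something to rely on, comparing dictionaries by content is a safer check than comparing their printed forms. The following sketch (the name literal is just for illustration) confirms that all three construction techniques yield equal dictionaries:

```python
literal = {'name': 'Bob', 'job': 'dev', 'age': 40}              # Literal form
bob1 = dict(name='Bob', job='dev', age=40)                      # Keywords
bob2 = dict(zip(['name', 'job', 'age'], ['Bob', 'dev', 40]))    # Zipping

same = (literal == bob1 == bob2)     # Equality compares keys and values, not order
print(same)

print(sorted(bob2))                  # A sorted key list gives a repeatable display
```

Equality tests ignore key order entirely, so they give the same answer no matter how a given Python happens to arrange the keys.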
Nesting Revisited
In the prior example, we used a dictionary to describe a hypothetical person, with three keys. Suppose, though, that the information is more complex. Perhaps we need to record a first name and a last name, along with multiple job titles. This leads to another application of Python’s object nesting in action. The following dictionary, coded all at once as a literal, captures more structured information:
>>> rec = {'name': {'first': 'Bob', 'last': 'Smith'},
           'jobs': ['dev', 'mgr'],
           'age': 40.5}
Here, we again have a three-key dictionary at the top (keys “name,” “jobs,” and “age”), but the values have become more complex: a nested dictionary for the name to support multiple parts, and a nested list for the jobs to support multiple roles and future expansion. We can access the components of this structure much as we did for our list-based matrix earlier, but this time most indexes are dictionary keys, not list offsets:
>>> rec['name']                        # 'name' is a nested dictionary
{'last': 'Smith', 'first': 'Bob'}
>>> rec['name']['last']                # Index the nested dictionary
'Smith'
>>> rec['jobs']                        # 'jobs' is a nested list
['dev', 'mgr']
>>> rec['jobs'][-1]                    # Index the nested list
'mgr'
>>> rec['jobs'].append('janitor')      # Expand Bob's job description in place
>>> rec
{'age': 40.5, 'jobs': ['dev', 'mgr', 'janitor'], 'name': {'last': 'Smith', 'first': 'Bob'}}
Notice how the last operation here expands the nested jobs list—because the jobs list is a separate piece of memory from the dictionary that contains it, it can grow and shrink freely (object memory layout will be discussed further later in this book).
The real reason for showing you this example is to demonstrate the flexibility of Python’s core data types. As you can see, nesting allows us to build up complex information structures directly and easily. Building a similar structure in a low-level language like C would be tedious and require much more code: we would have to lay out and declare structures and arrays, fill out values, link everything together, and so on. In Python, this is all automatic—running the expression creates the entire nested object structure for us. In fact, this is one of the main benefits of scripting languages like Python.
Just as importantly, in a lower-level language we would have to be careful to clean up all of the object’s space when we no longer need it. In Python, when we lose the last reference to the object—by assigning its variable to something else, for example—all of the memory space occupied by that object’s structure is automatically cleaned up for us:
>>> rec = 0                            # Now the object's space is reclaimed
Technically speaking, Python has a feature known as garbage collection that cleans up unused memory as your program runs and frees you from having to manage such details in your code. In standard Python (a.k.a. CPython), the space is reclaimed immediately, as soon as the last reference to an object is removed. We’ll study how this works later in Chapter 6; for now, it’s enough to know that you can use objects freely, without worrying about creating their space or cleaning up as you go.
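If you want to watch this happen, one way is a weak reference, which can observe an object without keeping it alive. The sketch below leans on tools not yet covered (the standard library's weakref module and a trivial class named Record, which is hypothetical), and its final result assumes CPython; other Python implementations may reclaim the space later:

```python
import weakref

class Record:                 # Hypothetical stand-in for any object
    pass

rec = Record()
watcher = weakref.ref(rec)    # A weak reference does not keep the object alive

alive_before = watcher() is rec      # The object still exists at this point
rec = 0                              # Drop the last normal reference
gone_after = watcher() is None       # In CPython: reclaimed immediately

print(alive_before, gone_after)
```

Calling the weak reference returns the object while it exists and None once it has been reclaimed, which is all this check relies on.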
Also watch for a record structure similar to the one we just coded in Chapter 8, Chapter 9, and Chapter 27, where we’ll use it to compare and contrast lists, dictionaries, tuples, named tuples, and classes—an array of data structure options with tradeoffs we’ll cover in full later.4
Missing Keys: if Tests
As mappings, dictionaries support accessing items by key only, with the sorts of operations we’ve just seen. In addition, though, they also support type-specific operations with method calls that are useful in a variety of common use cases. For example, although we can assign to a new key to expand a dictionary, fetching a nonexistent key is still a mistake:
>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> D
{'a': 1, 'c': 3, 'b': 2}
>>> D['e'] = 99                        # Assigning new keys grows dictionaries
>>> D
{'a': 1, 'c': 3, 'b': 2, 'e': 99}
>>> D['f']                             # Referencing a nonexistent key is an error
...error text omitted...
KeyError: 'f'
This is what we want: it’s usually a programming error to fetch something that isn’t really there. But in some generic programs, we can’t always know what keys will be present when we write our code. How do we handle such cases and avoid errors? One solution is to test ahead of time. The dictionary in membership expression allows us to query the existence of a key and branch on the result with a Python if statement. In the following, be sure to press Enter twice to run the if interactively after typing its code (as explained in Chapter 3, an empty line means “go” at the interactive prompt), and just as for the earlier multiline dictionaries and lists, the prompt changes to “...” on some interfaces for lines two and beyond:
>>> 'f' in D
False
>>> if not 'f' in D:                   # Python's sole selection statement
        print('missing')
missing
This book has more to say about the if statement in later chapters, but the form we’re using here is straightforward: it consists of the word if, followed by an expression that is interpreted as a true or false result, followed by a block of code to run if the test is true. In its full form, the if statement can also have an else clause for a default case, and one or more elif (“else if”) clauses for other tests. It’s the main selection statement tool in Python; along with both its ternary if/else expression cousin (which we’ll meet in a moment) and the if comprehension filter lookalike we saw earlier, it’s the way we code the logic of choices and decisions in our scripts.
If you’ve used some other programming languages in the past, you might be wondering how Python knows when the if statement ends. I’ll explain Python’s syntax rules in depth in later chapters, but in short, if you have more than one action to run in a statement block, you simply indent all their statements the same way; this both promotes readable code and reduces the number of characters you have to type:
>>> if not 'f' in D:
        print('missing')
        print('no, really...')         # Statement blocks are indented
missing
no, really...
Besides the in test, there are a variety of ways to avoid accessing nonexistent keys in the dictionaries we create: the get method, a conditional index with a default; the Python 2.X has_key method, an in work-alike that is no longer available in 3.X; the try statement, a tool we’ll first meet in Chapter 10 that catches and recovers from exceptions altogether; and the if/else ternary (three-part) expression, which is essentially an if statement squeezed onto a single line. Here are a few examples:
>>> value = D.get('x', 0)                   # Index but with a default
>>> value
0
>>> value = D['x'] if 'x' in D else 0       # if/else expression form
>>> value
0
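The try statement alternative isn't shown above, so here is a minimal sketch of how it might look for the same missing key. It is only a preview of a tool covered in full later:

```python
D = {'a': 1, 'b': 2, 'c': 3}

try:
    value = D['x']            # Raises KeyError: there is no such key
except KeyError:
    value = 0                 # Recover by falling back on a default
print(value)

present = D.get('a', 0)       # get returns the stored value when the key exists
print(present)
```

The try form is more code than get for a simple default, but it generalizes to cases where recovery requires more than substituting a value.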
We’ll save the details on such alternatives until a later chapter. For now, let’s turn to another dictionary method’s role in a common use case.
Sorting Keys: for Loops
As mentioned earlier, because dictionaries are not sequences, they don’t maintain any dependable left-to-right order. If we make a dictionary and print it back, its keys may come back in a different order than that in which we typed them, and may vary per Python version and other variables:
>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> D
{'a': 1, 'c': 3, 'b': 2}
What do we do, though, if we do need to impose an ordering on a dictionary’s items? One common solution is to grab a list of keys with the dictionary keys method, sort that with the list sort method, and then step through the result with a Python for loop (as for if, be sure to press the Enter key twice after coding the following for loop, and omit the outer parentheses in the print in Python 2.X):
>>> Ks = list(D.keys())                # Unordered keys list
>>> Ks                                 # A list in 2.X, "view" in 3.X: use list()
['a', 'c', 'b']
>>> Ks.sort()                          # Sorted keys list
>>> Ks
['a', 'b', 'c']
>>> for key in Ks:                     # Iterate through sorted keys
        print(key, '=>', D[key])       # <== press Enter twice here (3.X print)
a => 1
b => 2
c => 3
This is a three-step process, although, as we’ll see in later chapters, in recent versions of Python it can be done in one step with the newer sorted built-in function. The sorted call returns the result and sorts a variety of object types, in this case sorting dictionary keys automatically:
>>> D
{'a': 1, 'c': 3, 'b': 2}
>>> for key in sorted(D):
        print(key, '=>', D[key])
a => 1
b => 2
c => 3
Besides showcasing dictionaries, this use case serves to introduce the Python for loop. The for loop is a simple and efficient way to step through all the items in a sequence and run a block of code for each item in turn. A user-defined loop variable (key, here) is used to reference the current item each time through. The net effect in our example is to print the unordered dictionary’s keys and values, in sorted-key order.
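As a small variation on the sorted-key loop, the sorted built-in also accepts a reverse argument for descending order. The sketch below collects the lines in a list so the result can be inspected after the loop; the names keys_desc and lines are illustrative only:

```python
D = {'a': 1, 'c': 3, 'b': 2}

keys_desc = sorted(D, reverse=True)      # New list of keys, highest first
lines = []
for key in keys_desc:                    # Loop variable takes each key in turn
    lines.append('%s => %s' % (key, D[key]))

print(keys_desc)
print(lines)
```

Because sorted returns a new list, the dictionary itself is left exactly as it was.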
The for loop, and its more general colleague the while loop, are the main ways we code repetitive tasks as statements in our scripts. Really, though, the for loop, like its relative the list comprehension introduced earlier, is a sequence operation. It works on any object that is a sequence and, like the list comprehension, even on some things that are not. Here, for example, it is stepping across the characters in a string, printing the uppercase version of each as it goes:
>>> for c in 'spam':
        print(c.upper())
S
P
A
M
Python’s while loop is a more general sort of looping tool; it’s not limited to stepping across sequences, but generally requires more code to do so:
>>> x = 4
>>> while x > 0:
        print('spam!' * x)
        x -= 1
spam!spam!spam!spam!
spam!spam!spam!
spam!spam!
spam!
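To see the "more code" point directly, here is a sketch that steps across the same string both ways. The result lists and index variable are names chosen for this illustration:

```python
s = 'spam'

for_result = []
for c in s:                              # for: the loop fetches each character itself
    for_result.append(c.upper())

while_result = []
i = 0                                    # while: we manage the position ourselves
while i < len(s):
    while_result.append(s[i].upper())
    i += 1                               # Forgetting this line would loop forever

print(for_result, while_result)
```

Both loops build the same list, but the while version needs an explicit counter, test, and increment that the for loop handles automatically.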
We’ll discuss looping statements, syntax, and tools in depth later in the book. First, though, I need to confess that this section has not been as forthcoming as it might have been. Really, the for loop, and all its cohorts that step through objects from left to right, are not just sequence operations, they are iterable operations, as the next section describes.
Iteration and Optimization
If the last section’s for loop looks like the list comprehension expression introduced earlier, it should: both are really general iteration tools. In fact, both will work on any iterable object that follows the iteration protocol, pervasive ideas in Python that underlie all its iteration tools.
In a nutshell, an object is iterable if it is either a physically stored sequence in memory, or an object that generates one item at a time in the context of an iteration operation (a sort of “virtual” sequence). More formally, both types of objects are considered iterable because they support the iteration protocol: they respond to the iter call with an object that advances in response to next calls and raises an exception when finished producing values.
The generator comprehension expression we saw earlier is such an object: its values aren’t stored in memory all at once, but are produced as requested, usually by iteration tools. Python file objects similarly iterate line by line when used by an iteration tool: file content isn’t in a list, it’s fetched on demand. Both are iterable objects in Python, a category that expands in 3.X to include core tools like range and map. By deferring results as needed, these tools can both save memory and minimize delays.
I’ll have more to say about the iteration protocol later in this book. For now, keep in mind that every Python tool that scans an object from left to right uses the iteration protocol. This is why the sorted call used in the prior section works on the dictionary directly: we don’t have to call the keys method to get a sequence because dictionaries are iterable objects, whose iterators respond to next by returning successive keys.
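To make the protocol less abstract, the following sketch runs it by hand on a list and a dictionary, doing what a for loop does automatically. The variable names are illustrative, and the try statement is again a preview of a later topic:

```python
L = [1, 2, 3]
I = iter(L)                      # Ask the list for an iterator object
items = [next(I), next(I), next(I)]
print(items)

finished = False
try:
    next(I)                      # No more items left
except StopIteration:            # The exception that signals the end
    finished = True
print(finished)

D = {'a': 1, 'b': 2}
K = iter(D)                      # A dictionary's iterator produces its keys
first = next(K)
print(first in D)
```

A for loop makes the same iter and next calls internally and quietly stops when it sees StopIteration.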
It may also help you to see that any list comprehension expression, such as this one, which computes the squares of a list of numbers:
>>> squares = [x ** 2 for x in [1, 2, 3, 4, 5]]
>>> squares
[1, 4, 9, 16, 25]
can always be coded as an equivalent for loop that builds the result list manually by appending as it goes:
>>> squares = []
>>> for x in [1, 2, 3, 4, 5]:          # This is what a list comprehension does
        squares.append(x ** 2)         # Both run the iteration protocol internally
>>> squares
[1, 4, 9, 16, 25]
Both tools leverage the iteration protocol internally and produce the same result. The list comprehension, though, and related functional programming tools like map and filter, will often run faster than a for loop today on some types of code (perhaps even twice as fast), a property that could matter in your programs for large data sets. Having said that, though, I should point out that performance measures are tricky business in Python because it optimizes so much, and they may vary from release to release.
A major rule of thumb in Python is to code for simplicity and readability first and worry about performance later, after your program is working, and after you’ve proved that there is a genuine performance concern. More often than not, your code will be quick enough as it is. If you do need to tweak code for performance, though, Python includes tools to help you out, including the time and timeit modules for timing the speed of alternatives, and the profile module for isolating bottlenecks.
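As a taste of what such timing looks like, here is a minimal sketch that uses the timeit module to compare the two coding styles. The function names, data size, and repeat count are arbitrary choices for illustration, and the numbers you get will depend on your machine and Python version:

```python
import timeit

def with_loop():                         # Hypothetical test function: manual loop
    squares = []
    for x in range(1000):
        squares.append(x ** 2)
    return squares

def with_comprehension():                # Hypothetical test function: comprehension
    return [x ** 2 for x in range(1000)]

# Total seconds to call each function 200 times
loop_time = timeit.timeit(with_loop, number=200)
comp_time = timeit.timeit(with_comprehension, number=200)

print('loop: %.4f  comprehension: %.4f' % (loop_time, comp_time))
```

Treat any single run as a rough indication only; timings vary from run to run, so it is wise to repeat a test before drawing conclusions.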
You’ll find more on these later in this book (see especially Chapter 21’s benchmarking case study) and in the Python manuals. For the sake of this preview, let’s move ahead to the next core data type.
Tuples
The tuple object (pronounced “toople” or “tuhple,” depending on whom you ask) is roughly like a list that cannot be changed—tuples are sequences, like lists, but they are immutable, like strings. Functionally, they’re used to represent fixed collections of items: the components of a specific calendar date, for instance. Syntactically, they are normally coded in parentheses instead of square brackets, and they support arbitrary types, arbitrary nesting, and the usual sequence operations:
>>> T = (1, 2, 3, 4)            # A 4-item tuple
>>> len(T)                      # Length
4
>>> T + (5, 6)                  # Concatenation
(1, 2, 3, 4, 5, 6)
>>> T[0]                        # Indexing, slicing, and more
1
Tuples also have type-specific callable methods as of Python 2.6 and 3.0, but not nearly as many as lists:
>>> T.index(4)                  # Tuple methods: 4 appears at offset 3
3
>>> T.count(4)                  # 4 appears once
1
The primary distinction for tuples is that they cannot be changed once created. That is, they are immutable sequences (one-item tuples like the one here require a trailing comma):
>>> T[0] = 2                    # Tuples are immutable
...error text omitted...
TypeError: 'tuple' object does not support item assignment
>>> T = (2,) + T[1:]            # Make a new tuple for a new value
>>> T
(2, 2, 3, 4)
Like lists and dictionaries, tuples support mixed types and nesting, but they don’t grow and shrink because they are immutable (the parentheses enclosing a tuple’s items can often be omitted, as done here; in contexts where commas don’t otherwise matter, the commas are what actually builds a tuple):
>>> T = 'spam', 3.0, [11, 22, 33]
>>> T[1]
3.0
>>> T[2][1]
22
>>> T.append(4)
AttributeError: 'tuple' object has no attribute 'append'
Why Tuples?
So, why have a type that is like a list, but supports fewer operations? Frankly, tuples are not generally used as often as lists in practice, but their immutability is the whole point. If you pass a collection of objects around your program as a list, it can be changed anywhere; if you use a tuple, it cannot. That is, tuples provide a sort of integrity constraint that is convenient in programs larger than those we’ll write here. We’ll talk more about tuples later in the book, including an extension that builds upon them called named tuples. For now, though, let’s jump ahead to our last major core type: the file.
Files
File objects are Python code’s main interface to external files on your computer. They can be used to read and write text memos, audio clips, Excel documents, saved email messages, and whatever else you happen to have stored on your machine. Files are a core type, but they’re something of an oddball: there is no specific literal syntax for creating them. Rather, to create a file object, you call the built-in open function, passing in an external filename and an optional processing mode as strings.
For example, to create a text output file, you would pass in its name and the 'w' processing mode string to write data:
>>> f = open('data.txt', 'w')   # Make a new file in output mode ('w' is write)
>>> f.write('Hello\n')          # Write strings of characters to it
6
>>> f.write('world\n')          # Return number of items written in Python 3.X
6
>>> f.close()                   # Close to flush output buffers to disk
This creates a file in the current directory and writes text to it (the filename can be a full directory path if you need to access a file elsewhere on your computer). To read back what you just wrote, reopen the file in 'r' processing mode, for reading text input; this is the default if you omit the mode in the call. Then read the file’s content into a string, and display it. A file’s contents are always a string in your script, regardless of the type of data the file contains:
>>> f = open('data.txt')        # 'r' (read) is the default processing mode
>>> text = f.read()             # Read entire file into a string
>>> text
'Hello\nworld\n'
>>> print(text)                 # print interprets control characters
Hello
world

>>> text.split()                # File content is always a string
['Hello', 'world']
Other file object methods support additional features we don’t have time to cover here. For instance, file objects provide more ways of reading and writing (read accepts an optional maximum byte/character size, readline reads one line at a time, and so on), as well as other tools (seek moves to a new file position). As we’ll see later, though, the best way to read a file today is to not read it at all: files provide an iterator that automatically reads line by line in for loops and other contexts:
>>> for line in open('data.txt'): print(line)
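Here is a slightly fuller sketch of the same idea. It writes a scratch file to a temporary directory (the path is generated only for this illustration) and then reads it back line by line. It also uses the with statement, which this chapter has not introduced, to close each file automatically; take that as one reasonable option rather than a requirement:

```python
import os
import tempfile

# Scratch file in a temporary directory, created only for this example
path = os.path.join(tempfile.mkdtemp(), 'data.txt')

with open(path, 'w') as f:               # The with block closes the file on exit
    f.write('Hello\n')
    f.write('world\n')

lines = []
with open(path) as f:
    for line in f:                       # The file iterator yields one line per step
        lines.append(line.rstrip('\n'))  # Lines keep their '\n' until stripped

print(lines)
```

Because each line read this way still ends with its newline character, the one-line loop above prints its output double-spaced; stripping the newline, as done here, avoids that.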
We’ll meet the full set of file methods later in this book, but if you want a quick preview now, run a dir call on any open file and a help on any of the method names that come back:
>>> dir(f)
[...many names omitted...
'buffer', 'close', 'closed', 'detach', 'encoding', 'errors', 'fileno', 'flush',
'isatty', 'line_buffering', 'mode', 'name', 'newlines', 'read', 'readable',
'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable',
'write', 'writelines']
>>> help(f.seek)
...try it and see...
Binary Bytes Files
The prior section’s examples illustrate file basics that suffice for many roles. Technically, though, they rely on either the platform’s Unicode encoding default in Python 3.X, or the 8-bit byte nature of files in Python 2.X. Text files always encode strings in 3.X, and blindly write string content in 2.X. This is irrelevant for the simple ASCII data used previously, which maps to and from file bytes unchanged. But for richer types of data, file interfaces can vary depending on both content and the Python line you use.
As hinted when we met strings earlier, Python 3.X draws a sharp distinction between text and binary data in files: text files represent content as normal str strings and perform Unicode encoding and decoding automatically when writing and reading data, while binary files represent content as a special bytes string and allow you to access file content unaltered. Python 2.X supports the same dichotomy, but doesn’t impose it as rigidly, and its tools differ.
For example, binary files are useful for processing media, accessing data created by C programs, and so on. To illustrate, Python’s struct module can both create and unpack packed binary data (raw bytes that record values that are not Python objects) to be written to a file in binary mode. We’ll study this technique in detail later in the book, but the concept is simple: the following creates a binary file in Python 3.X (binary files work the same in 2.X, but the “b” string literal prefix isn’t required and won’t be displayed):
>>> import struct
>>> packed = struct.pack('>i4sh', 7, b'spam', 8)     # Create packed binary data
>>> packed                                           # 10 bytes, not objects or text
b'\x00\x00\x00\x07spam\x00\x08'
>>>
>>> file = open('data.bin', 'wb')                    # Open binary output file
>>> file.write(packed)                               # Write packed binary data
10
>>> file.close()
Reading binary data back is essentially symmetric; not all programs need to tread so deeply into the low-level realm of bytes, but binary files make this easy in Python:
>>> data = open('data.bin', 'rb').read()             # Open/read binary data file
>>> data                                             # 10 bytes, unaltered
b'\x00\x00\x00\x07spam\x00\x08'
>>> data[4:8]                                        # Slice bytes in the middle
b'spam'
>>> list(data)                                       # A sequence of 8-bit bytes
[0, 0, 0, 7, 115, 112, 97, 109, 0, 8]
>>> struct.unpack('>i4sh', data)                     # Unpack into objects again
(7, b'spam', 8)
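If the '>i4sh' format string looks cryptic, the following sketch may help to decode it. It works purely in memory, with no file involved, and the variable names are arbitrary. The '>' requests big-endian byte order with standard sizes, 'i' is a 4-byte integer, '4s' is a 4-byte string, and 'h' is a 2-byte short integer, which is where the 10 bytes come from:

```python
import struct

fmt = '>i4sh'                            # Big-endian: 4-byte int, 4-byte string, 2-byte short

size = struct.calcsize(fmt)              # Bytes this format occupies: 4 + 4 + 2
packed = struct.pack(fmt, 7, b'spam', 8)

# The first four bytes are just the integer 7 in big-endian form
leading_int = int.from_bytes(packed[:4], 'big')

# Unpacking with the same format string recovers the original values
number, label, short = struct.unpack(fmt, packed)
```

Packing and unpacking must use the same format string; a different byte order or field layout would silently yield different values from the same bytes.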
Unicode Text Files
Text files are used to process all sorts of text-based data, from memos to email content to JSON and XML documents. In today’s broader interconnected world, though, we can’t really talk about text without also asking “what kind?”—you must also know the text’s Unicode encoding type if either it differs from your platform’s default, or you can’t rely on that default for data portability reasons.
Luckily, this is easier than it may sound. To access files containing non-ASCII Unicode text of the sort introduced earlier in this chapter, we simply pass in an encoding name if the text in the file doesn’t match the default encoding for our platform. In this mode, Python text files automatically encode on writes and decode on reads per the encoding scheme name you provide. In Python 3.X:
>>> S = 'sp\xc4m'                                          # Non-ASCII Unicode text
>>> S
'spÄm'
>>> S[2]                                                   # Sequence of characters
'Ä'
>>> file = open('unidata.txt', 'w', encoding='utf-8')      # Write/encode UTF-8 text
>>> file.write(S)                                          # 4 characters written
4
>>> file.close()
>>> text = open('unidata.txt', encoding='utf-8').read()    # Read/decode UTF-8 text
>>> text
'spÄm'
>>> len(text)                                              # 4 chars (code points)
4
This automatic encoding and decoding is what you normally want. Because files handle this on transfers, you may process text in memory as a simple string of characters without concern for its Unicode-encoded origins. If needed, though, you can also see what’s truly stored in your file by stepping into binary mode:
>>> raw = open('unidata.txt', 'rb').read()                 # Read raw encoded bytes
>>> raw
b'sp\xc3\x84m'
>>> len(raw)                                               # Really 5 bytes in UTF-8
5
You can also encode and decode manually if you get Unicode data from a source other than a file—parsed from an email message or fetched over a network connection, for example:
>>> text.encode('utf-8')                                   # Manual encode to bytes
b'sp\xc3\x84m'
>>> raw.decode('utf-8')                                    # Manual decode to str
'spÄm'
This is also useful to see how text files would automatically encode the same string differently under different encoding names, and provides a way to translate data to different encodings—it’s different bytes in files, but decodes to the same string in memory if you provide the proper encoding name:
>>> text.encode('latin-1')                                 # Bytes differ in others
b'sp\xc4m'
>>> text.encode('utf-16')
b'\xff\xfes\x00p\x00\xc4\x00m\x00'
>>> len(text.encode('latin-1')), len(text.encode('utf-16'))
(4, 10)
>>> b'\xff\xfes\x00p\x00\xc4\x00m\x00'.decode('utf-16')    # But same string decoded
'spÄm'
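The translation idea can be sketched in a few lines: decode bytes with the encoding they were written in, then encode the resulting string under a different name. The variable names here are arbitrary, and the last step shows what tends to happen if you supply the wrong encoding name when decoding:

```python
text = 'sp\xc4m'                                   # Same non-ASCII string as before

latin_bytes = text.encode('latin-1')               # 4 bytes: one per character

# Translate Latin-1 bytes to UTF-8 bytes: decode first, then re-encode
utf8_bytes = latin_bytes.decode('latin-1').encode('utf-8')

# Decoding with a mismatched encoding name fails here, because the byte
# 0xC4 followed by 'm' is not a valid UTF-8 sequence
try:
    latin_bytes.decode('utf-8')
    wrong_name_failed = False
except UnicodeDecodeError:
    wrong_name_failed = True
```

A mismatch does not always raise an exception; some byte sequences decode without error under the wrong name and simply produce the wrong characters, which is why knowing the encoding matters.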
This all works more or less the same in Python 2.X, but Unicode strings are coded and displayed with a leading “u,” byte strings don’t require or show a leading “b,” and Unicode text files must be opened with codecs.open, which accepts an encoding name just like 3.X’s open, and uses the special unicode string to represent content in memory. Binary file mode may seem optional in 2.X since normal files are just byte-based data, but it’s required to avoid changing line ends if present (more on this later in the book):
>>> import codecs
>>> codecs.open('unidata.txt', encoding='utf8').read()     # 2.X: read/decode text
u'sp\xc4m'
>>> open('unidata.txt', 'rb').read()                       # 2.X: read raw bytes
'sp\xc3\x84m'
>>> open('unidata.txt').read()                             # 2.X: raw/undecoded too
'sp\xc3\x84m'
Although you won’t generally need to care about this distinction if you deal only with ASCII text, Python’s strings and files are an asset if you deal with either binary data (which includes most types of media) or text in internationalized character sets (which includes most content on the Web and Internet at large today). Python also supports non-ASCII file names (not just content), but it’s largely automatic; tools such as walkers and listers offer more control when needed, though we’ll defer further details until Chapter 37.
Other File-Like Tools
The open function is the workhorse for most file processing you will do in Python. For more advanced tasks, though, Python comes with additional file-like tools: pipes, FIFOs, sockets, keyed-access files, persistent object shelves, descriptor-based files, relational and object-oriented database interfaces, and more. Descriptor files, for instance, support file locking and other low-level tools, and sockets provide an interface for networking and interprocess communication. We won’t cover many of these topics in this book, but you’ll find them useful once you start programming Python in earnest.
Other Core Types
Beyond the core types we’ve seen so far, there are others that may or may not qualify for membership in the category, depending on how broadly it is defined. Sets, for example, are a recent addition to the language that are neither mappings nor sequences; rather, they are unordered collections of unique and immutable objects. You create sets by calling the built-in set function or using new set literals and expressions in 3.X and 2.7, and they support the usual mathematical set operations (the choice of new {...} syntax for set literals makes sense, since sets are much like the keys of a valueless dictionary):
>>> X = set('spam')                 # Make a set out of a sequence in 2.X and 3.X
>>> Y = {'h', 'a', 'm'}             # Make a set with set literals in 3.X and 2.7
>>> X, Y                            # A tuple of two sets without parentheses
({'m', 'a', 'p', 's'}, {'m', 'a', 'h'})
>>> X & Y                           # Intersection
{'m', 'a'}
>>> X | Y                           # Union
{'m', 'h', 'a', 'p', 's'}
>>> X - Y                           # Difference
{'p', 's'}
>>> X > Y                           # Superset
False
>>> {n ** 2 for n in [1, 2, 3, 4]}  # Set comprehensions in 3.X and 2.7
{16, 1, 4, 9}
Even less mathematically inclined programmers often find sets useful for common tasks such as filtering out duplicates, isolating differences, and performing order-neutral equality tests without sorting—in lists, strings, and all other iterable objects:
>>> list(set([1, 2, 1, 3, 1]))      # Filtering out duplicates (possibly reordered)
[1, 2, 3]
>>> set('spam') - set('ham')        # Finding differences in collections
{'p', 's'}
>>> set('spam') == set('asmp')      # Order-neutral equality ('spam'=='asmp' False)
True
Sets also support in membership tests, though all other collection types in Python do too:
>>> 'p' in set('spam'), 'p' in 'spam', 'ham' in ['eggs', 'spam', 'ham']
(True, True, True)
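To tie these set tasks together, here is a small sketch on made-up data: two hypothetical lists of names, one of which contains a duplicate. Because sets are unordered, the sketch sorts the combined result before using it as a list, and it shows dict.fromkeys as one way to drop duplicates while keeping first-seen order (this relies on Python 3.7 or later):

```python
# Hypothetical data: these names are made up for illustration
engineers = ['bob', 'sue', 'ann', 'bob']
managers = ['tom', 'sue']

unique_engineers = set(engineers)                   # Duplicates collapse to 3 names
both_roles = set(engineers) & set(managers)         # Names in both lists
only_engineers = set(engineers) - set(managers)     # Engineers who are not managers
everyone = sorted(set(engineers) | set(managers))   # Sort, since set order is arbitrary

# One way to remove duplicates but keep the original order (3.7 and later)
ordered_unique = list(dict.fromkeys(engineers))
```

None of these steps requires the inputs to be lists; any iterable of immutable items would do.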
In addition, Python recently grew a few new numeric types: decimal numbers, which are fixed-precision floating-point numbers, and fraction numbers, which are rational numbers with both a numerator and a denominator. Both can be used to work around the limitations and inherent inaccuracies of floating-point math:
>>> 1 / 3                           # Floating-point (add a .0 in Python 2.X)
0.3333333333333333
>>> (2/3) + (1/2)
1.1666666666666665
>>> import decimal                  # Decimals: fixed precision
>>> d = decimal.Decimal('3.141')
>>> d + 1
Decimal('4.141')
>>> decimal.getcontext().prec = 2
>>> decimal.Decimal('1.00') / decimal.Decimal('3.00')
Decimal('0.33')
>>> from fractions import Fraction  # Fractions: numerator+denominator
>>> f = Fraction(2, 3)
>>> f + 1
Fraction(5, 3)
>>> f + Fraction(1, 2)
Fraction(7, 6)
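A classic case shows the kind of inaccuracy these types work around. In the sketch below, which uses arbitrary variable names and Python 3.X, the same sum is computed three ways; only the floating-point version fails the equality test, because values like 0.1 have no exact binary representation:

```python
from decimal import Decimal
from fractions import Fraction

# Binary floating-point cannot store 0.1, 0.2, or 0.3 exactly
float_ok = (0.1 + 0.2 == 0.3)                                          # False

# Decimals built from strings keep the digits exactly as written
decimal_ok = (Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))       # True

# Fractions store an exact numerator and denominator
fraction_ok = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))   # True
```

Note that the decimals here are built from strings; building a Decimal from a float such as 0.1 would carry the float’s inexact value along with it.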
Python also comes with Booleans (with predefined True and False objects that are essentially just the integers 1 and 0 with custom display logic), and it has long supported a special placeholder object called None commonly used to initialize names and objects:
>>> 1 > 2, 1 < 2                    # Booleans
(False, True)
>>> bool('spam')                    # Object's Boolean value
True
>>> X = None                        # None placeholder
>>> print(X)
None
>>> L = [None] * 100                # Initialize a list of 100 Nones
>>> L
[None, None, None, None, None, None, None, None, None, None, None, None, None, None,
None, None, None, None, None, None, ...a list of 100 Nones...]
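If the claim that Booleans are essentially integers seems surprising, a quick sketch may help. The names below are arbitrary, and the None test at the end shows one common (but by no means required) way to use the placeholder:

```python
# bool is a subclass of int, so True and False behave like 1 and 0
is_int = isinstance(True, int)           # True
same_values = (True == 1, False == 0)    # (True, True)
total = True + True                      # 2: arithmetic treats True as 1

# None marks "no value yet"; an identity test with `is` is the usual check
X = None
if X is None:
    X = []                               # Replace the placeholder with a real object
```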
How to Break Your Code’s Flexibility
I’ll have more to say about all of Python’s object types later, but one merits special treatment here. The type object, returned by the type built-in function, is an object that gives the type of another object; its result differs slightly in 3.X, because types have merged with classes completely (something we’ll explore in the context of “new-style” classes in Part VI). Assuming L is still the list of the prior section:
# In Python 2.X:
>>> type(L)                         # Types: type of L is list type object
<type 'list'>
>>> type(type(L))                   # Even types are objects
<type 'type'>

# In Python 3.X:
>>> type(L)                         # 3.X: types are classes, and vice versa
<class 'list'>
>>> type(type(L))                   # See Chapter 32 for more on class types
<class 'type'>
Besides allowing you to explore your objects interactively, the type object in its most practical application allows code to check the types of the objects it processes. In fact, there are at least three ways to do so in a Python script:
>>> if type(L) == type([]):         # Type testing, if you must...
...     print('yes')
...
yes
>>> if type(L) == list:             # Using the type name
...     print('yes')
...
yes
>>> if isinstance(L, list):         # Object-oriented tests
...     print('yes')
...
yes
Now that I’ve shown you all these ways to do type testing, however, I am required by law to tell you that doing so is almost always the wrong thing to do in a Python program (and often a sign of an ex-C programmer first starting to use Python!). The reason why won’t become completely clear until later in the book, when we start writing larger code units such as functions, but it’s a (perhaps the) core Python concept. By checking for specific types in your code, you effectively break its flexibility—you limit it to working on just one type. Without such tests, your code may be able to work on a whole range of types.
This is related to the idea of polymorphism mentioned earlier, and it stems from Python’s lack of type declarations. As you’ll learn, in Python, we code to object interfaces (operations supported), not to types. That is, we care what an object does, not what it is. Not caring about specific types means that code is automatically applicable to many of them—any object with a compatible interface will work, regardless of its specific type. Although type checking is supported—and even required in some rare cases—you’ll see that it’s not usually the “Pythonic” way of thinking. In fact, you’ll find that polymorphism is probably the key idea behind using Python well.
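A small sketch may make the cost of type testing more concrete. Both functions below are hypothetical helpers invented for this illustration; they run the same expression, but the second adds a type test of the sort just shown:

```python
def repeat_twice(obj):
    """Hypothetical helper: accepts any object that supports the * operator."""
    return obj * 2

def repeat_twice_lists_only(obj):
    """Same logic, but a type test limits it to lists."""
    if type(obj) != list:
        raise TypeError('lists only')
    return obj * 2

# The untested version works on numbers, strings, lists, and tuples alike
results = [repeat_twice(x) for x in (21, 'spam', [1, 2], (3, 4))]

# The type-tested version rejects everything but a list
try:
    repeat_twice_lists_only('spam')
    rejected = False
except TypeError:
    rejected = True
```

The first function never asks what its argument is; it only requires that the argument support the * operator, so it works on types its author may never have considered. The second does strictly less for more code.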
User-Defined Classes
We’ll study object-oriented programming in Python—an optional but powerful feature of the language that cuts development time by supporting programming by customization—in depth later in this book. In abstract terms, though, classes define new types of objects that extend the core set, so they merit a passing glance here. Say, for example, that you wish to have a type of object that models employees. Although there is no such specific core type in Python, the following user-defined class might fit the bill:
>>> class Worker:
...     def __init__(self, name, pay):        # Initialize when created
...         self.name = name                  # self is the new object
...         self.pay = pay
...     def lastName(self):
...         return self.name.split()[-1]      # Split string on blanks
...     def giveRaise(self, percent):
...         self.pay *= (1.0 + percent)       # Update pay in place
...
This class defines a new kind of object that will have name and pay attributes (sometimes called state information), as well as two bits of behavior coded as functions (normally called methods). Calling the class like a function generates instances of our new type, and the class’s methods automatically receive the instance being processed by a given method call (in the self argument):
>>> bob = Worker('Bob Smith', 50000)          # Make two instances
>>> sue = Worker('Sue Jones', 60000)          # Each has name and pay attrs
>>> bob.lastName()                            # Call method: bob is self
'Smith'
>>> sue.lastName()                            # sue is the self subject
'Jones'
>>> sue.giveRaise(.10)                        # Updates sue's pay
>>> sue.pay
66000.0
The implied “self” object is why we call this an object-oriented model: there is always an implied subject in functions within a class. In a sense, though, the class-based type simply builds on and uses core types: a user-defined Worker object here, for example, is just a collection of a string and a number (name and pay, respectively), plus functions for processing those two built-in objects.
The larger story of classes is that their inheritance mechanism supports software hierarchies that lend themselves to customization by extension. We extend software by writing new classes, not by changing what already works. You should also know that classes are an optional feature of Python, and simpler built-in types such as lists and dictionaries are often better tools than user-coded classes. This is all well beyond the bounds of our introductory object-type tutorial, though, so consider this just a preview; for full disclosure on user-defined types coded with classes, you’ll have to read on. Because classes build upon other tools in Python, they are one of the major goals of this book’s journey.
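As a brief and hedged preview of customization by extension, the sketch below repeats the Worker class so that it runs on its own, and then adds a hypothetical Manager subclass that is not part of the example above. The subclass, its bonus parameter, and the sample data are all invented for illustration; the point is only that the new class changes one method and inherits the rest:

```python
class Worker:
    def __init__(self, name, pay):
        self.name = name
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)

class Manager(Worker):                            # Hypothetical subclass of Worker
    def giveRaise(self, percent, bonus=0.10):     # Customize just this one method
        Worker.giveRaise(self, percent + bonus)   # Reuse the original logic

tom = Manager('Tom Jones', 50000)                 # __init__ is inherited from Worker
tom.giveRaise(.10)                                # Runs Manager's version: 10% + 10%
print(tom.lastName(), round(tom.pay))             # lastName is inherited too
```

Worker itself is never edited here, so existing code that uses it keeps working while Manager layers new behavior on top.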
And Everything Else
As mentioned earlier, everything you can process in a Python script is a type of object, so our object type tour is necessarily incomplete. However, even though everything in Python is an “object,” only those types of objects we’ve met so far are considered part of Python’s core type set. Other types in Python either are objects related to program execution (like functions, modules, classes, and compiled code), which we will study later, or are implemented by imported module functions, not language syntax. The latter of these also tend to have application-specific roles—text patterns, database interfaces, network connections, and so on.
Moreover, keep in mind that the objects we’ve met here are objects, but not necessarily object-oriented, a concept that usually requires inheritance and the Python class statement, which we’ll meet again later in this book. Still, Python’s core objects are the workhorses of almost every Python script you’re likely to meet, and they usually are the basis of larger noncore types.
Chapter Summary
And that’s a wrap for our initial data type tour. This chapter has offered a brief introduction to Python’s core object types and the sorts of operations we can apply to them. We’ve studied generic operations that work on many object types (sequence operations such as indexing and slicing, for example), as well as type-specific operations available as method calls (for instance, string splits and list appends). We’ve also defined some key terms, such as immutability, sequences, and polymorphism.
Along the way, we’ve seen that Python’s core object types are more flexible and powerful than what is available in lower-level languages such as C. For instance, Python’s lists and dictionaries obviate most of the work you do to support collections and searching in lower-level languages. Lists are ordered collections of other objects, and dictionaries are collections of other objects that are indexed by key instead of by position. Both dictionaries and lists may be nested, can grow and shrink on demand, and may contain objects of any type. Moreover, their space is automatically cleaned up as you go. We’ve also seen that strings and files work hand in hand to support a rich variety of binary and text data.
I’ve skipped most of the details here in order to provide a quick tour, so you shouldn’t expect all of this chapter to have made sense yet. In the next few chapters we’ll start to dig deeper, taking a second pass over Python’s core object types that will fill in details omitted here, and give you a deeper understanding. We’ll start off the next chapter with an in-depth look at Python numbers. First, though, here is another quiz to review.
Test Your Knowledge: Quiz
We’ll explore the concepts introduced in this chapter in more detail in upcoming chapters, so we’ll just cover the big ideas here:
Name four of Python’s core data types.
Why are they called “core” data types?
What does “immutable” mean, and which three of Python’s core types are considered immutable?
What does “sequence” mean, and which three types fall into that category?
What does “mapping” mean, and which core type is a mapping?
What is “polymorphism,” and why should you care?
Test Your Knowledge: Answers
Numbers, strings, lists, dictionaries, tuples, files, and sets are generally considered to be the core object (data) types. Types, None, and Booleans are sometimes classified this way as well. There are multiple number types (integer, floating point, complex, fraction, and decimal) and multiple string types (simple strings and Unicode strings in Python 2.X, and text strings and byte strings in Python 3.X).
They are known as “core” types because they are part of the Python language itself and are always available; to create other objects, you generally must call functions in imported modules. Most of the core types have specific syntax for generating the objects: 'spam', for example, is an expression that makes a string and determines the set of operations that can be applied to it. Because of this, core types are hardwired into Python’s syntax. In contrast, you must call the built-in open function to create a file object (even though this is usually considered a core type too).
An “immutable” object is an object that cannot be changed after it is created. Numbers, strings, and tuples in Python fall into this category. While you cannot change an immutable object in place, you can always make a new one by running an expression. Bytearrays in recent Pythons offer mutability for text, but they are not normal strings, and only apply directly to text if it’s a simple 8-bit kind (e.g., ASCII).
A “sequence” is a positionally ordered collection of objects. Strings, lists, and tuples are all sequences in Python. They share common sequence operations, such as indexing, concatenation, and slicing, but also have type-specific method calls. A related term, “iterable,” means either a physical sequence, or a virtual one that produces its items on request.
The term “mapping” denotes an object that maps keys to associated values. Python’s dictionary is the only mapping type in the core type set. Mappings do not maintain any left-to-right positional ordering; they support access to data stored by key, plus type-specific method calls.
“Polymorphism” means that the meaning of an operation (like a +) depends on the objects being operated on. This turns out to be a key idea (perhaps the key idea) behind using Python well: not constraining code to specific types makes that code automatically applicable to many types.
1 Pardon my formality. I’m a computer scientist.
2 In this book, the term literal simply means an expression whose syntax generates an object (sometimes also called a constant). Note that the term “constant” does not imply objects or variables that can never be changed (i.e., this term is unrelated to C++’s const or Python’s “immutable,” a topic explored in the section “Immutability”).
3 This matrix structure works for small-scale tasks, but for more serious number crunching you will probably want to use one of the numeric extensions to Python, such as the open source NumPy and SciPy systems. Such tools can store and process large matrixes much more efficiently than our nested list structure. NumPy has been said to turn Python into the equivalent of a free and more powerful version of the Matlab system, and organizations such as NASA, Los Alamos, JPL, and many others use this tool for scientific and financial tasks. Search the Web for more details.
4 Two application notes here. First, as a preview, the rec record we just created really could be an actual database record, when we employ Python’s object persistence system, an easy way to store native Python objects in simple files or access-by-key databases, which translates objects to and from serial byte streams automatically. We won’t go into details here, but watch for coverage of Python’s pickle and shelve persistence modules in Chapter 9, Chapter 28, Chapter 31, and Chapter 37, where we’ll explore them in the context of files, an OOP use case, classes, and 3.X changes, respectively.
Second, if you are familiar with JSON (JavaScript Object Notation), an emerging data-interchange format used for databases and network transfers, this example may also look curiously similar, though Python’s support for variables, arbitrary expressions, and changes can make its data structures more general. Python’s json library module supports creating and parsing JSON text, but the translation to Python objects is often trivial. Watch for a JSON example that uses this record in Chapter 9 when we study files. For a larger use case, see MongoDB, which stores data using a language-neutral binary-encoded serialization of JSON-like documents, and its PyMongo interface.