Credit: Luther Blissett
You have a Python data structure, which may include fundamental Python objects, and possibly classes and instances, and you want to serialize it and reconstruct it at a reasonable speed.
If you don’t want to assume that your data is composed of only fundamental Python objects, or you need portability across versions of Python, or you need to transmit the serialized form as text, the best way of serializing your data is with the cPickle module (the pickle module is a pure-Python equivalent, but it’s far slower and worth using only if cPickle is unavailable). For example:
data = {12:'twelve', 'feep':list('ciao'), 1.23:4+5j, (1,2,3):u'wer'}
You can serialize data to a text string:
import cPickle
text = cPickle.dumps(data)
or to a binary string, which is faster and takes up less space:
bytes = cPickle.dumps(data, 1)
You can now sling text or bytes around as you wish (e.g., send it across a network, put it as a BLOB in a database, etc.), as long as you keep it intact. In the case of bytes, this means keeping its arbitrary binary bytes intact. In the case of text, this means keeping its textual structure intact, including newline characters.
Then you can reconstruct the data at any time, regardless of machine architecture or Python release:
redata1 = cPickle.loads(text)
redata2 = cPickle.loads(bytes)
Either call reconstructs a data structure that compares equal to data. In other words, the order of keys in dictionaries is arbitrary in both the original and reconstructed data structures, but order in any kind of sequence is meaningful, and thus it is preserved. You don’t need to tell cPickle.loads whether the original dumps used text mode (the default) or binary (faster and more compact); loads figures it out by examining its argument’s contents.
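The round trip described above can be sketched as follows. This is a minimal illustration, not the recipe's own listing: the try/except import is an assumption added so the same code also runs on Python 3, where cPickle no longer exists and pickle uses its C implementation automatically, and the names text_form and binary_form are hypothetical. The protocol is passed explicitly because the default protocol differs between Python releases.

```python
try:
    import cPickle as pickle  # Python 2: the C module the recipe uses
except ImportError:
    import pickle  # assumption: Python 3, where cPickle was folded into pickle

data = {12: 'twelve', 'feep': list('ciao'), 1.23: 4+5j, (1, 2, 3): u'wer'}

# Protocol 0 is the text format, protocol 1 the binary format.
text_form = pickle.dumps(data, 0)
binary_form = pickle.dumps(data, 1)

# loads is not told which protocol produced its argument.
redata1 = pickle.loads(text_form)
redata2 = pickle.loads(binary_form)

# Both results are new objects that compare equal to the original.
print(redata1 == data and redata2 == data)
print(redata1 is data)
```

The first print shows that both reconstructions compare equal to data; the second shows that the reconstruction is a distinct object, not the original.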
When you specifically want to write the data to a file, you can also use the dump function of the cPickle module, which lets you dump several data structures one after the other:
ouf = open('datafile.txt', 'w')
cPickle.dump(data, ouf)
cPickle.dump('some string', ouf)
cPickle.dump(range(19), ouf)
ouf.close()
Once you have done this, you can recover from datafile.txt the same data structures you dumped into it, in the same sequence:
inf = open('datafile.txt')
a = cPickle.load(inf)
b = cPickle.load(inf)
c = cPickle.load(inf)
inf.close()
You can also pass cPickle.dump a third argument of 1 to tell it to serialize the data in binary form (faster and more compact), but the datafile must be opened for binary I/O, not in the default text mode, when you originally dump to it and when you later load from it.
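A binary variant of the file example might look like the sketch below. The temporary directory and the file name datafile.bin are hypothetical, chosen so the example cleans up nothing outside its own scratch space; the import fallback is an assumption that lets the code run on Python 3 as well.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2: the C module the recipe uses
except ImportError:
    import pickle  # assumption: Python 3, where cPickle was folded into pickle

data = {12: 'twelve', 'feep': list('ciao'), 1.23: 4+5j, (1, 2, 3): u'wer'}

# Hypothetical scratch location for the data file.
path = os.path.join(tempfile.mkdtemp(), 'datafile.bin')

# 'wb': the binary format requires a file opened for binary output.
with open(path, 'wb') as ouf:
    pickle.dump(data, ouf, 1)
    pickle.dump('some string', ouf, 1)
    pickle.dump(list(range(19)), ouf, 1)

# 'rb': the same applies when reading the pickles back.
with open(path, 'rb') as inf:
    a = pickle.load(inf)
    b = pickle.load(inf)
    c = pickle.load(inf)
```

The three load calls return the structures in the order in which they were dumped.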
Python offers several ways to serialize data (i.e., make the data into a string of bytes that you can save on disk, in a database, send across the network, and so on) and corresponding ways to reconstruct the data from such serialized forms. Typically, the best approach is to use the cPickle module. There is also a pure-Python equivalent, called pickle (the cPickle module is coded in C as a Python extension), but pickle is substantially slower, and the only reason to use it is if you don’t have cPickle (e.g., a Python port onto a handheld computer with tiny storage space, where you saved every byte you possibly could by installing only an indispensable subset of Python’s large standard library).
cPickle supports most elementary data types (e.g., dictionaries, lists, tuples, numbers, strings) and combinations thereof, as well as classes and instances. Pickling classes and instances saves only the data involved, not the code. (Code objects are not even among the types that cPickle knows how to serialize, basically because there would be no way to guarantee their portability across disparate versions of Python.) See Recipe 8.4 for more about pickling classes and instances.
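To make the "data, not code" point concrete, here is a small sketch. The Point class and its norm2 method are hypothetical, invented for illustration, and the example assumes it is run as a script so that the class can be found by name when the instance is reloaded. The import fallback is likewise an assumption for Python 3.

```python
try:
    import cPickle as pickle  # Python 2: the C module the recipe uses
except ImportError:
    import pickle  # assumption: Python 3, where cPickle was folded into pickle


class Point(object):  # hypothetical class, for illustration only
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm2(self):
        return self.x ** 2 + self.y ** 2


p = Point(3, 4)
payload = pickle.dumps(p, 1)

# The pickle names the class and stores the instance's attributes,
# but it carries nothing about the method defined on the class.
names_class = b'Point' in payload
carries_method = b'norm2' in payload

# Reloading looks the class up by name and restores the saved state.
q = pickle.loads(payload)

# Code objects themselves are rejected by the pickler.
try:
    pickle.dumps(Point.norm2.__code__, 1)
    code_picklable = True
except (TypeError, pickle.PicklingError):
    code_picklable = False
```

Because only the state travels in the pickle, the class definition must be available wherever the data is reloaded.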
cPickle guarantees compatibility from one Python release to another and independence from a specific machine’s architecture. Data serialized with cPickle will still be readable if you upgrade your Python release, and pickling is guaranteed to work if you’re sending serialized data between different machines.
The dumps function of cPickle accepts any Python data structure and returns a text string representing it. Or, if you call dumps with a second argument of 1, it returns an arbitrary byte string instead, which is faster and takes up less space. You can pass either the text or the byte string to the loads function, which will return another Python data structure that compares equal (==) to the one you originally dumped. In between the dumps and loads calls, you can subject the string to any procedure you wish, such as sending it over the network, storing it in a database and retrieving it, or encrypting it and decrypting it. As long as the string’s textual or binary structure is correctly restored, loads will work fine on it (even across platforms and releases).
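As an illustration of "any procedure you wish", the sketch below compresses a pickle and encodes it as Base64 before reversing both steps. Compression and Base64 are stand-ins chosen for this example (the text mentions networks, databases, and encryption); the names payload, wire, and restored are hypothetical, and the import fallback is an assumption for Python 3.

```python
import base64
import zlib

try:
    import cPickle as pickle  # Python 2: the C module the recipe uses
except ImportError:
    import pickle  # assumption: Python 3, where cPickle was folded into pickle

data = {12: 'twelve', 'feep': list('ciao'), 1.23: 4+5j, (1, 2, 3): u'wer'}

payload = pickle.dumps(data, 1)

# Any reversible transformation will do; here, compress then Base64-encode,
# as one might before placing the pickle in a text-only channel.
wire = base64.b64encode(zlib.compress(payload))

# Undo the transformations in reverse order to recover the exact bytes.
restored = zlib.decompress(base64.b64decode(wire))

# loads only cares that the byte string is identical to what dumps produced.
redata = pickle.loads(restored)
```

The only requirement is that restored matches payload byte for byte before it reaches loads.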
When you specifically need to save the data into a file, you can also use cPickle’s dump function, which takes two arguments: the data structure you’re dumping and the open file object. If you open the file for binary I/O, rather than the default text I/O, you can also give dump a third argument of 1 to ask for binary format, which is faster and takes up less space.
The advantage of dump over dumps is that, with dump, you can perform several calls, one after the other, with various data structures and the same open file object. Each pickle written this way ends with an explicit stop marker, so the boundary between one dumped data structure and the next is unambiguous. Consequently, when you later open the file for reading (binary reading, if you asked for binary format), and then repeatedly call cPickle.load, passing the file as the argument, each data structure previously dumped is reloaded sequentially, one after the other. The return value of load, like that of loads, is a new data structure that compares equal to the one you originally dumped.
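When the number of dumped structures is not known in advance, one common approach is to keep calling load until it raises EOFError. The helper load_all below is hypothetical, not part of cPickle, and the in-memory io.BytesIO buffer stands in for a file opened in binary mode; the import fallback is an assumption for Python 3.

```python
import io

try:
    import cPickle as pickle  # Python 2: the C module the recipe uses
except ImportError:
    import pickle  # assumption: Python 3, where cPickle was folded into pickle


def load_all(fileobj):
    """Yield each pickled object in fileobj until the stream is exhausted."""
    while True:
        try:
            yield pickle.load(fileobj)
        except EOFError:
            # load raises EOFError once no further pickle can be read.
            return


data = {12: 'twelve', 'feep': list('ciao'), 1.23: 4+5j, (1, 2, 3): u'wer'}

# An in-memory buffer stands in for a file opened with 'wb' and then 'rb'.
buf = io.BytesIO()
for item in (data, 'some string', list(range(19))):
    pickle.dump(item, buf, 1)

buf.seek(0)
items = list(load_all(buf))
```

Here items holds the three structures in the order they were dumped.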
Recipe 8.2 and Recipe 8.4; documentation for the standard library module cPickle in the Library Reference.