Credit: Luther Blissett
You have a Python data structure composed of only fundamental Python objects (e.g., lists, tuples, numbers, and strings, but no classes, instances, etc.), and you want to serialize it and reconstruct it later as fast as possible.
If you know that your data is composed entirely of fundamental Python objects, the lowest-level, fastest approach to serializing it (i.e., turning it into a string of bytes and later reconstructing it from such a string) is via the marshal module. Suppose that data is composed of only elementary Python data types. For example:
data = {12:'twelve', 'feep':list('ciao'), 1.23:4+5j, (1,2,3):u'wer'}
You can serialize data to a byte string at top speed as follows:
import marshal
bytes = marshal.dumps(data)
You can now sling bytes around as you wish (e.g., send it across a network, put it as a BLOB in a database, etc.), as long as you keep its arbitrary binary bytes intact. Then you can reconstruct the data any time you’d like:
redata = marshal.loads(bytes)
This reconstructs a data structure that compares equal (==) to data. In other words, the order of keys in dictionaries is arbitrary in both the original and reconstructed data structures, but order in any kind of sequence is meaningful, and thus it is preserved. Note that loads works independently of machine architecture, but you must guarantee that it is used by the same release of Python under which bytes was originally generated via dumps.
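As a minimal sketch of the round trip described above, the following checks that the reconstructed structure compares equal to the original while being a separate object. It assumes Python 3 syntax, and the name payload is a hypothetical choice that avoids shadowing the built-in bytes type.

```python
import marshal

# Same kind of data as in the recipe, built only from elementary types.
data = {12: 'twelve', 'feep': list('ciao'), 1.23: 4 + 5j, (1, 2, 3): 'wer'}

payload = marshal.dumps(data)     # serialize to a byte string
redata = marshal.loads(payload)   # rebuild an equal structure

assert redata == data             # equal by value
assert redata is not data         # but a distinct object
assert redata['feep'] == ['c', 'i', 'a', 'o']   # sequence order is preserved
```

The equality check is the only guarantee the recipe relies on; object identity is never preserved.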
When you specifically want to write the data to a disk file, as long as the latter is open for binary (not the default text mode) input/output, you can also use the dump function of the marshal module, which lets you dump several data structures one after the other:
ouf = open('datafile.dat', 'wb')   # binary mode is required
marshal.dump(data, ouf)
marshal.dump('some string', ouf)
marshal.dump(list(range(19)), ouf)   # an explicit list, so this also works where range is lazy
ouf.close()
When you have done this, you can recover from datafile.dat the same data structures you dumped into it, in the same sequence:
inf = open('datafile.dat', 'rb')
a = marshal.load(inf)
b = marshal.load(inf)
c = marshal.load(inf)
inf.close()
Python offers several ways to serialize data (i.e., make the data into a string of bytes that you can save on disk, in a database, send across the network, and so on) and corresponding ways to reconstruct the data from such serialized forms. The lowest-level approach is to use the marshal module, which Python uses to write its bytecode files. marshal supports only elementary data types (e.g., dictionaries, lists, tuples, numbers, and strings) and combinations thereof.
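To illustrate the restriction to elementary types, here is a small sketch that probes whether marshal accepts a given object. The Point class and the can_marshal helper are hypothetical names introduced only for this example; the sketch relies on marshal.dumps raising ValueError for unsupported objects.

```python
import marshal

class Point:
    """Hypothetical user-defined class, used only for illustration."""
    def __init__(self, x, y):
        self.x, self.y = x, y

def can_marshal(obj):
    """Return True if marshal.dumps accepts obj, False if it rejects it."""
    try:
        marshal.dumps(obj)
    except ValueError:   # raised for unsupported ("unmarshallable") objects
        return False
    return True

supported = can_marshal({'a': [1, 2.5, (3, 'x')]})   # only elementary types
unsupported = can_marshal([Point(1, 2)])             # contains an instance
```

A single unsupported object nested anywhere in the structure is enough to make the whole dump fail.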
marshal does not guarantee compatibility from one Python release to another, so data serialized with marshal may not be readable if you upgrade your Python release. However, it does guarantee independence from a specific machine’s architecture, so it works when you send serialized data between different machines, as long as they are all running the same version of Python. This is similar to how you can share compiled Python bytecode files in such a distributed setting.
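Because the same-release requirement is easy to violate silently, one possible safeguard is to prefix each payload with a marker for the interpreter that wrote it. The sketch below is an assumption-laden illustration, not part of the marshal module: the two-byte tag, the choice of major.minor granularity, and the helper names dumps_tagged and loads_tagged are all hypothetical.

```python
import marshal
import sys

# Two raw bytes identifying the running interpreter (major, minor).
# Kept outside the marshalled data so it can be checked before loads runs.
VERSION_TAG = bytes(sys.version_info[:2])

def dumps_tagged(obj):
    """Serialize obj with marshal, prefixed by the interpreter's version tag."""
    return VERSION_TAG + marshal.dumps(obj)

def loads_tagged(payload):
    """Check the version tag, then rebuild the object with marshal.loads."""
    if payload[:2] != VERSION_TAG:
        raise ValueError('payload was written by a different Python release')
    return marshal.loads(payload[2:])

roundtrip = loads_tagged(dumps_tagged([1, 'two', 3.0]))
```

Checking the tag first means a mismatch is reported clearly instead of surfacing as an obscure decoding error or, worse, as silently wrong data.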
marshal’s dumps function accepts any Python data structure built from the supported elementary types and returns a byte string representing it. You can pass that byte string to the loads function, which will return another Python data structure that compares equal (==) to the one you originally dumped. In between the dumps and loads calls, you can subject the byte string to any procedure you wish, such as sending it over the network, storing it into a database and retrieving it, or encrypting it and decrypting it. As long as the string’s binary structure is correctly restored, loads will work fine on it (again, as long as it is under the same Python release with which you originally executed dumps).
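As one example of a procedure applied between dumps and loads, the following sketch compresses the byte string and encodes it as ASCII text, then reverses both steps. The use of zlib and base64 is an illustrative choice, not something the recipe prescribes, and the names raw, wire, and restored are hypothetical.

```python
import base64
import marshal
import zlib

data = {12: 'twelve', 'feep': list('ciao'), (1, 2, 3): 'wer'}

raw = marshal.dumps(data)                      # byte string from marshal
wire = base64.b64encode(zlib.compress(raw))    # compressed, ASCII-safe form

# ... wire could now travel through a text-only channel ...

restored = zlib.decompress(base64.b64decode(wire))   # undo both steps
redata = marshal.loads(restored)
```

What matters is that restored is byte-for-byte identical to raw; any lossless transformation that satisfies this leaves loads unaffected.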
When you specifically need to save the data to a file, you can also use marshal’s dump function, which takes two arguments: the data structure you’re dumping and the open file object. Note that the file must be opened for binary I/O (not the default, which is text I/O). In the Python 2 releases this recipe was written for, it also can’t be a file-like object, as marshal is quite picky about it being a true file. The advantage of dump is that you can perform several calls to dump with various data structures and the same open file object: each data structure is then dumped together with information about how long the dumped byte string is. As a consequence, when you later open the file for binary reading and then call marshal.load, passing the file as the argument, each previously dumped data structure is reloaded sequentially. The return value of load, like that of loads, is a new data structure that compares equal to the one you originally dumped.
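When the number of dumped structures is not known in advance, one way to reload them all is to call marshal.load repeatedly until it raises EOFError at the end of the file. The sketch below assumes Python 3 and writes to a temporary directory; the load_all helper is a hypothetical name introduced for illustration.

```python
import marshal
import os
import tempfile

def load_all(path):
    """Return a list of every object that marshal.dump wrote to the file at path."""
    items = []
    with open(path, 'rb') as inf:          # binary mode is required
        while True:
            try:
                items.append(marshal.load(inf))
            except EOFError:               # no further objects in the file
                break
    return items

originals = [{'a': 1}, 'some string', list(range(19))]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'datafile.dat')
    with open(path, 'wb') as ouf:
        for item in originals:
            marshal.dump(item, ouf)        # several dumps into one file
    recovered = load_all(path)
```

The with statements close the files even if a dump or load fails, which the explicit close calls in the recipe do not guarantee.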
Recipe 4.27; Recipe 8.3 for cPickle, the big brother of marshal; documentation on the marshal standard library module in the Library Reference.