Credit: Luther Blissett
Often, you need no special precautions to use cPickle on your classes and their instances. For example, the following works fine:
    import cPickle

    class ForExample:
        def __init__(self, *stuff):
            self.stuff = stuff

    anInstance = ForExample('one', 2, 3)
    saved = cPickle.dumps(anInstance)
    reloaded = cPickle.loads(saved)
    assert anInstance.stuff == reloaded.stuff
However, sometimes there are problems:
    anotherInstance = ForExample(1, 2, open('three', 'w'))
    wontWork = cPickle.dumps(anotherInstance)
This raises a TypeError exception (“can’t pickle file objects”), because the state of anotherInstance includes a file object, and file objects cannot be pickled. You would get exactly the same exception if you tried to pickle any other container that includes a file object among its items.
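The same failure can be reproduced on a current interpreter. The following is a minimal sketch, assuming Python 3, where the standard pickle module takes the place of cPickle. The helper try_pickle and the scratch file are illustrative names that are not part of the recipe, and the exact error message differs from the Python 2 one quoted above.

```python
import os
import pickle
import tempfile

def try_pickle(obj):
    """Return None if obj pickles cleanly, else the TypeError that was raised."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return exc
    return None

# A scratch file standing in for the recipe's 'three' (assumption).
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, 'w') as handle:
    plain_error = try_pickle([1, 2, 3])            # an ordinary list is fine
    container_error = try_pickle([1, 2, handle])   # a list holding a file is not

os.remove(path)
```

Here plain_error stays None, while container_error holds the TypeError raised because one item of the list is an open file.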
However, in some cases, you may be able to do something about it:
    import types

    class PrettyClever:
        def __init__(self, *stuff):
            self.stuff = stuff

        def __getstate__(self):
            def normalize(x):
                if type(x) == types.FileType:
                    return 1, (x.name, x.mode, x.tell())
                return 0, x
            return [normalize(x) for x in self.stuff]

        def __setstate__(self, stuff):
            def reconstruct(x):
                if x[0] == 0:
                    return x[1]
                name, mode, offs = x[1]
                openfile = open(name, mode)
                openfile.seek(offs)
                return openfile
            self.stuff = tuple([reconstruct(x) for x in stuff])
By defining the __getstate__ and __setstate__ special methods in your class, you gain fine-grained control over what, exactly, your class’s instances consider to be their state. As long as you can define such “state” in picklable terms, and reconstruct your instances from the unpickled state sufficiently for your application, you can make your instances themselves picklable and unpicklable in this way.
cPickle dumps class and function objects by name (i.e., through their module’s name and their name within the module). Thus, you can dump only classes defined at module level (not inside other classes and functions). Reloading such objects requires the respective modules to be available for import. Instances can be saved and reloaded only if they belong to such classes. In addition, the instance’s state must also be picklable.
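The by-name rule can be illustrated with a short sketch, assuming Python 3 and its pickle module in place of cPickle. The classes TopLevel and Local and the function make_local_instance are hypothetical names used only for illustration. The exception raised for the nested class may be a PicklingError or an AttributeError depending on the Python version, so the sketch catches both.

```python
import pickle

class TopLevel:
    """Defined at module level, so pickle can refer to it by name."""
    def __init__(self, value):
        self.value = value

def make_local_instance():
    class Local:
        """Defined inside a function, so it cannot be looked up by name."""
        def __init__(self, value):
            self.value = value
    return Local(42)

# An instance of a module-level class makes the round trip.
restored = pickle.loads(pickle.dumps(TopLevel(42)))

# An instance of a function-local class does not.
try:
    pickle.dumps(make_local_instance())
    local_failed = False
except (pickle.PicklingError, AttributeError):
    local_failed = True
```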
By default, an instance’s state is the contents of its __dict__ plus, in Python 2.2, whatever state it may get from the built-in type it inherits from. For example, an instance of a new-style class that subclasses list includes the list items as part of the instance’s state. Also, in Python 2.2, cPickle supports __slots__, which an object and/or its bases may define to hold per-instance state instead of using __dict__, the default way. This default approach is often quite sufficient and satisfactory.
Sometimes, however, you may have nonpicklable attributes or items as part of your instance’s state (as cPickle defines it by default). In this recipe, for example, I show a class whose instances hold arbitrary stuff, which may include open file objects. To handle this case, your class can define the special method __getstate__. cPickle calls that method on your object, if your object’s class defines it or inherits it, instead of going directly for the object’s __dict__ (or possibly __slots__ and/or built-in type bases in Python 2.2).
Normally, when you define the __getstate__ method, you define the __setstate__ method as well, as shown in the solution. __getstate__ can return any picklable object, and that same object is then passed as the argument of __setstate__. In the solution, __getstate__ returns a list that’s similar to the instance’s default state self.stuff, except that each item is turned into a tuple of two items. The first item in the pair is set to 0 to indicate that the second one will be taken verbatim, or to 1 to indicate that the second item will be used to reconstruct an open file. (Of course, the reconstruction may fail or be unsatisfactory in several ways. There is no general way to save an open file’s state, which is why cPickle itself doesn’t even try. But suppose that in the context of our application we know the given approach will work.) When reloading the instance from pickled form, cPickle calls __setstate__ with the list of pairs, and __setstate__ can reconstruct self.stuff by processing each pair appropriately in its nested reconstruct function. This scheme clearly generalizes to getting and restoring state that may contain various kinds of normally unpicklable objects. Just be sure to use different numbers to tag the various kinds of nonverbatim pairs.
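As a rough illustration of the tagged-pair scheme on a current interpreter, here is a sketch that assumes Python 3: pickle replaces cPickle, and an isinstance check against io.IOBase stands in for the types.FileType test, which no longer exists there. The scratch file and its contents are made up for the demonstration, and the file is opened in binary read mode so that reopening it with the saved mode does not truncate it.

```python
import io
import os
import pickle
import tempfile

class PrettyClever:
    def __init__(self, *stuff):
        self.stuff = stuff

    def __getstate__(self):
        def normalize(x):
            if isinstance(x, io.IOBase):
                # Tag 1: save what is needed to reopen the file, not the file.
                return 1, (x.name, x.mode, x.tell())
            # Tag 0: the item is kept verbatim.
            return 0, x
        return [normalize(x) for x in self.stuff]

    def __setstate__(self, state):
        def reconstruct(pair):
            tag, payload = pair
            if tag == 0:
                return payload
            name, mode, offset = payload
            reopened = open(name, mode)
            reopened.seek(offset)
            return reopened
        self.stuff = tuple(reconstruct(pair) for pair in state)

# Scratch file with known contents (assumption, for demonstration only).
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'wb') as out:
    out.write(b'hello world')

source = open(path, 'rb')
source.read(6)  # advance so that the saved offset is not zero
original = PrettyClever('one', 2, source)
clone = pickle.loads(pickle.dumps(original))
rest_of_file = clone.stuff[2].read()  # continues from the saved offset

source.close()
clone.stuff[2].close()
os.remove(path)
```

The reloaded instance holds a freshly opened file positioned where the original one was. That only makes sense if the file still exists, unchanged, when the pickle is loaded, which is the kind of application-specific knowledge the text above assumes.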
In one particular case, you can define __getstate__ without defining __setstate__. In that case, __getstate__ must return a dictionary, and reloading the instance from pickled form uses that dictionary just as the instance’s __dict__ would normally be used. Not running your own code at reloading time is a serious limitation, but this approach may come in handy when you want to use __getstate__, not to save otherwise unpicklable state, but rather as an optimization. Typically, this happens when your instance caches results that it can recompute if they’re absent, and you decide it’s best not to store the cache as a part of the instance’s state. In this case, you should define __getstate__ to return a dictionary that’s the indispensable subset of the instance’s __dict__.
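The cache-dropping use of __getstate__ might look like the following sketch, which assumes Python 3 and the pickle module. The class Squares and its _cache attribute are hypothetical. Because no __setstate__ is defined, the returned dictionary is used directly as the reloaded instance’s __dict__, so the method that uses the cache has to cope with its absence.

```python
import pickle

class Squares:
    """Hypothetical class that caches results it can always recompute."""
    def __init__(self, label):
        self.label = label
        self._cache = {}

    def square(self, n):
        # The cache may be missing after unpickling, so create it on demand.
        cache = self.__dict__.setdefault('_cache', {})
        if n not in cache:
            cache[n] = n * n
        return cache[n]

    def __getstate__(self):
        # Return only the indispensable subset of the instance's __dict__.
        state = self.__dict__.copy()
        state.pop('_cache', None)
        return state

table = Squares('demo')
table.square(12)                       # populates the cache
restored = pickle.loads(pickle.dumps(table))
cache_was_saved = '_cache' in restored.__dict__
recomputed = restored.square(12)       # rebuilt on demand after reloading
```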
With either the default pickling/unpickling approach or your own __getstate__ and __setstate__, the instance’s special method __init__ is not called. If the most convenient way for you to reconstruct an instance is to call the __init__ method with appropriate parameters, then instead of __getstate__, you may want to define the special method __getinitargs__. In this case, cPickle calls this method without arguments. The method must return a tuple, and cPickle calls __init__ at reloading time with that tuple’s items as the arguments.
The Library Reference for the pickle and copy_reg modules details even subtler things you can do when pickling and unpickling, as well as security issues that come from unpickling data from untrusted sources. However, the techniques I’ve discussed here should suffice in almost all practical cases, as long as the security aspects of unpickling are not a problem. As a further practical advantage, if you define __getstate__ (and then, typically, __setstate__) or __getinitargs__, in addition to being used for pickling and unpickling your class’s instances, they’ll be used by the functions in the copy module that perform shallow and deep copies of your objects (the copy and deepcopy functions, respectively). The issues of extracting and restoring instance state are almost the same when copying the instance directly as when serializing (saving) it to a string (or to a file, for example) and then restoring it, which can be seen as just one way to copy it at a later time and/or on another machine.
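A small sketch can show the copy module going through the same hooks. It assumes Python 3, and the class Tagged and its restored_via_setstate flag are hypothetical names used only to make the call to __setstate__ visible.

```python
import copy

class Tagged:
    """Hypothetical class used only to show which hooks copy invokes."""
    def __init__(self, items):
        self.items = items
        self.restored_via_setstate = False

    def __getstate__(self):
        return {'items': self.items}

    def __setstate__(self, state):
        self.items = state['items']
        # __init__ is not called on copies, so only this method sets the flag.
        self.restored_via_setstate = True

original = Tagged([[1, 2], [3]])
shallow = copy.copy(original)      # shares the items list with the original
deep = copy.deepcopy(original)     # gets its own copy of the items list
```

Both copies are built from the state that __getstate__ returns. The shallow copy receives that state as is, while the deep copy receives a deep copy of it.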
Recipe 8.3; documentation for the standard library module cPickle in the Library Reference.