Using the cPickle Module on Classes and Instances

Credit: Luther Blissett

Problem

You want to save and restore class and instance objects using the cPickle module.

Solution

Often, you need no special precautions to use cPickle on your classes and their instances. For example, the following works fine:

import cPickle

class ForExample:
    def __init__(self, *stuff): self.stuff = stuff
anInstance = ForExample('one', 2, 3)
saved = cPickle.dumps(anInstance)
reloaded = cPickle.loads(saved)
assert anInstance.stuff == reloaded.stuff

However, sometimes there are problems:

anotherInstance = ForExample(1, 2, open('three', 'w'))
wontWork = cPickle.dumps(anotherInstance)

This raises a TypeError exception with the message “can’t pickle file objects”, because the state of anotherInstance includes a file object, and file objects cannot be pickled. You would get exactly the same exception if you tried to pickle any other container that includes a file object among its items.

However, in some cases, you may be able to do something about it:

import types

class PrettyClever:
    def __init__(self, *stuff): self.stuff = stuff
    def __getstate__(self):
        # Tag each item: 1 plus reopening data for a file, 0 plus the item itself otherwise
        def normalize(x):
            if isinstance(x, types.FileType):
                return 1, (x.name, x.mode, x.tell())
            return 0, x
        return [ normalize(x) for x in self.stuff ]
    def __setstate__(self, stuff):
        # Undo normalize: reopen tagged files at the saved offset, keep other items as-is
        def reconstruct(x):
            if x[0] == 0:
                return x[1]
            name, mode, offs = x[1]
            openfile = open(name, mode)
            openfile.seek(offs)
            return openfile
        self.stuff = tuple([reconstruct(x) for x in stuff])

By defining the __getstate__ and __setstate__ special methods in your class, you gain fine-grained control over what, exactly, your class’s instances consider to be their state. As long as you can define such “state” in picklable terms, and reconstruct your instances from the unpickled state well enough for your application, you can make your instances picklable and unpicklable in this way.

Discussion

cPickle dumps class and function objects by name (i.e., through their module’s name and their name within the module). Thus, you can dump only classes defined at module level (not inside other classes and functions). Reloading such objects requires the respective modules to be available for import. Instances can be saved and reloaded only if they belong to such classes. In addition, the instance’s state must also be picklable.
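As a hedged illustration of the by-name rule, the sketch below pickles an instance of a module-level class and then tries the same with an instance of a class created inside a function. The names TopLevel, make_local_instance, and can_pickle are made up for this example, and the try/except import is an assumption so that the sketch also runs where only the plain pickle module is available. The exact exception raised for the failing case varies across Python versions, so the sketch catches several types.

```python
try:
    import cPickle as pickle   # the C implementation discussed in this recipe
except ImportError:
    import pickle              # assumption: fall back where cPickle is absent

class TopLevel(object):
    """Defined at module level, so it can be found again by name."""
    def __init__(self, value):
        self.value = value

def make_local_instance():
    class Local(object):       # defined inside a function: not reachable by name
        pass
    return Local()

def can_pickle(obj):
    """Return True if pickle.dumps accepts obj, False if pickling fails."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

ok = can_pickle(TopLevel(42))                  # module-level class
local_ok = can_pickle(make_local_instance())   # class nested in a function
```

Under these assumptions, ok should be true and local_ok false, since only TopLevel can be looked up again through its module’s name and its own name.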

By default, an instance’s state is the contents of its __dict__ plus, in Python 2.2, whatever state it may get from the built-in type it inherits from. (For example, an instance of a new-style class that subclasses list includes the list items as part of the instance’s state. Also, in Python 2.2, cPickle supports __slots__, if an object and/or its bases define them, instead of using __dict__, the default way, to hold per-instance state.) This default approach is often quite sufficient and satisfactory.

Sometimes, however, you may have nonpicklable attributes or items as part of your instance’s state (as cPickle defines it by default). In this recipe, for example, I show a class whose instances hold arbitrary stuff, which may include open file objects. To handle this case, your class can define the special method __getstate__. cPickle calls that method on your object, if your object’s class defines it or inherits it, instead of going directly for the object’s __dict__ (or possibly __slots__ and/or built-in type bases in Python 2.2).

Normally, when you define the __getstate__ method, you define the __setstate__ method as well, as shown in the solution. __getstate__ can return any picklable object, and that same object is then passed as __setstate__’s argument. In the solution, __getstate__ returns a list that’s similar to the instance’s default state self.stuff, except that each item is turned into a tuple of two items. The first item in the pair is 0 to indicate that the second one will be taken verbatim, or 1 to indicate that the second item will be used to reconstruct an open file. (Of course, the reconstruction may fail or be unsatisfactory in several ways. There is no general way to save an open file’s state, which is why cPickle itself doesn’t even try. But suppose that in the context of our application we know the given approach will work.) When reloading the instance from pickled form, cPickle calls __setstate__ with the list of pairs, and __setstate__ reconstructs self.stuff by processing each pair appropriately in its nested reconstruct function. This scheme clearly generalizes to getting and restoring state that may contain various kinds of normally unpicklable objects. Just be sure to use different numbers to tag the various kinds of nonverbatim pairs.

In one particular case, you can define __getstate__ without defining __setstate__: __getstate__ must then return a dictionary, and reloading the instance from pickled form uses that dictionary just as the instance’s __dict__ would normally be used. Not running your own code at reloading time is a serious hindrance, but this approach may come in handy when you want to use __getstate__, not to save otherwise unpicklable state, but rather as an optimization. Typically, this happens when your instance caches results that it may recompute if they’re absent, and you decide it’s best not to store the cache as a part of the instance’s state. In this case, you should define __getstate__ to return a dictionary that’s the indispensable subset of the instance’s __dict__.
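The cache-dropping case might look like the following sketch. The Circle class and its _area attribute are hypothetical names chosen for illustration, and the try/except import is an assumption so the sketch also runs where cPickle is absent. Only __getstate__ is defined, so the returned dictionary is used directly as the reloaded instance’s attributes.

```python
import math
try:
    import cPickle as pickle   # the C implementation discussed in this recipe
except ImportError:
    import pickle              # assumption: fall back where cPickle is absent

class Circle(object):
    def __init__(self, radius):
        self.radius = radius
    def area(self):
        # Recompute the cached value whenever it is absent
        try:
            return self._area
        except AttributeError:
            self._area = math.pi * self.radius ** 2
            return self._area
    def __getstate__(self):
        # Save everything except the cache, which can be rebuilt on demand
        state = self.__dict__.copy()
        state.pop('_area', None)
        return state

circle = Circle(2)
circle.area()                                   # fills the cache
restored = pickle.loads(pickle.dumps(circle))   # reloaded without the cache
```

Because area tolerates a missing _area attribute, the reloaded instance works without any __setstate__ code.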

With either the default pickling/unpickling approach or your own __getstate__ and __setstate__, the instance’s special method __init__ is not called. If the most convenient way for you to reconstruct an instance is to call the __init__ method with appropriate parameters, then instead of __getstate__, you may want to define the special method __getinitargs__. In this case, cPickle calls this method without arguments: the method must return a tuple, and cPickle calls __init__ at reloading time with the arguments that are that tuple’s items.
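The first point, that __init__ is not run when an instance is reloaded, can be checked with a small sketch. The class Counted and its init_calls counter are hypothetical, introduced only to count constructor calls. The sketch assumes a new-style class pickled with the default protocol and no __getinitargs__.

```python
try:
    import cPickle as pickle   # the C implementation discussed in this recipe
except ImportError:
    import pickle              # assumption: fall back where cPickle is absent

class Counted(object):
    init_calls = 0             # class-level counter, bumped by every __init__ call
    def __init__(self, name):
        Counted.init_calls += 1
        self.name = name

original = Counted('first')                     # the only explicit construction
clone = pickle.loads(pickle.dumps(original))    # rebuilt from saved state
```

After this runs, Counted.init_calls should still be 1 even though two instances exist, because the clone’s attributes were restored from the pickled state rather than set by __init__.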

The Library Reference for the pickle and copy_reg modules details even subtler things you can do when pickling and unpickling, as well as security issues that come from unpickling data from untrusted sources. However, the techniques I’ve discussed here should suffice in almost all practical cases, as long as the security aspects of unpickling are not a problem. As a further practical advantage, if you define __getstate__ (and then, typically, __setstate__) or __getinitargs__, in addition to being used for pickling and unpickling your class’s instances, they’ll be used by the functions in the copy module that perform shallow and deep copies of your objects (the copy and deepcopy functions, respectively). The issues of extracting and restoring instance state are almost the same when copying the instance directly as when serializing (saving) it to a string or file and then restoring it, which can be seen as just one way to copy it at a later time and/or on another machine.
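The interaction with the copy module can be sketched as follows. The class Tagged and its scratch attribute are hypothetical. The sketch assumes that scratch holds transient data that should start out empty in every copy, so __getstate__ leaves it out and __setstate__ recreates it.

```python
import copy

class Tagged(object):
    def __init__(self, value):
        self.value = value
        self.scratch = []              # transient working data, not real state
    def __getstate__(self):
        return {'value': self.value}   # only the indispensable attribute
    def __setstate__(self, state):
        self.__dict__.update(state)
        self.scratch = []              # every copy starts with fresh scratch space

original = Tagged(5)
original.scratch.append('temporary')
shallow = copy.copy(original)          # goes through __getstate__/__setstate__
deep = copy.deepcopy(original)         # same hooks, with the state deep-copied
```

Both copies should carry value but an empty scratch list, which shows that the same two methods that control pickling also control copying.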

See Also

Recipe 8.3; documentation for the standard library module cPickle in the Library Reference.
