
BPL Pickle Support

Pickle is a Python module for object serialization, also known as persistence, marshalling, or flattening.

It is often necessary to save the contents of an object to a file and to restore them later. One approach to this problem is to write a pair of functions that write data to, and read data from, a file in a special format. A powerful alternative is to use Python's pickle module. Exploiting Python's ability for introspection, the pickle module recursively converts nearly arbitrary Python objects into a stream of bytes that can be written to a file.
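
As a minimal sketch of this round trip, the following uses only the standard pickle module on built-in objects. The sample data and variable names are made up for illustration, and an in-memory buffer stands in for a file opened in binary mode.

```python
import io
import pickle

# Hypothetical sample data: a nested structure of built-in objects.
record = {
    "name": "lattice",
    "cell": (5.0, 5.0, 7.5),
    "sites": [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
}

# io.BytesIO stands in for a file opened with mode "wb" / "rb".
buffer = io.BytesIO()
pickle.dump(record, buffer)      # object graph -> stream of bytes

buffer.seek(0)                   # rewind, as if reopening the file
restored = pickle.load(buffer)   # stream of bytes -> new, equal object
```

The restored object compares equal to the original but is an independent copy, including its nested containers.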

The Boost Python Library supports the pickle module by emulating the interface implemented by Jim Fulton's ExtensionClass module that is included in the ZOPE distribution (http://www.zope.org/). This interface is similar to that for regular Python classes as described in detail in the Python Library Reference for pickle:

http://www.python.org/doc/current/lib/module-pickle.html

The BPL Pickle Interface

At the user level, the BPL pickle interface involves three special methods:
__getinitargs__
When an instance of a BPL extension class is pickled, the pickler tests if the instance has a __getinitargs__ method. This method must return a Python tuple. When the instance is restored by the unpickler, the contents of this tuple are used as the arguments for the class constructor.

If __getinitargs__ is not defined, the class constructor will be called without arguments.

__getstate__
When an instance of a BPL extension class is pickled, the pickler tests if the instance has a __getstate__ method. This method should return a Python object representing the state of the instance.

If __getstate__ is not defined, the instance's __dict__ is pickled (if it is not empty).

__setstate__
When an instance of a BPL extension class is restored by the unpickler, it is first constructed using the result of __getinitargs__ as arguments (see above). Subsequently the unpickler tests if the new instance has a __setstate__ method. If so, this method is called with the result of __getstate__ (a Python object) as the argument.

If __setstate__ is not defined, the result of __getstate__ must be a Python dictionary. The items of this dictionary are added to the instance's __dict__.

If both __getstate__ and __setstate__ are defined, the Python object returned by __getstate__ need not be a dictionary. The __getstate__ and __setstate__ methods can do what they want.
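
The interplay of the three methods can be sketched in pure Python. This is only a model of the sequence described above, not the BPL implementation: the helpers defines, save and restore and the class World are hypothetical, and __slots__ is used merely to imitate C++ member data that does not live in the instance's __dict__.

```python
def defines(obj, name):
    """True if a class of obj other than `object` defines `name`."""
    return any(name in vars(klass)
               for klass in type(obj).__mro__ if klass is not object)


def save(obj):
    """Collect what would be stored for obj under the rules above."""
    initargs = obj.__getinitargs__() if defines(obj, "__getinitargs__") else ()
    if not isinstance(initargs, tuple):
        raise TypeError("__getinitargs__ must return a tuple")
    if defines(obj, "__getstate__"):
        return type(obj), initargs, True, obj.__getstate__()
    attrs = dict(vars(obj))              # default: the instance's __dict__
    return type(obj), initargs, bool(attrs), attrs


def restore(saved):
    """Rebuild an instance: constructor first, then the state."""
    cls, initargs, has_state, state = saved
    obj = cls(*initargs)                 # constructor gets the initargs
    if has_state:
        if defines(obj, "__setstate__"):
            obj.__setstate__(state)      # state may be any Python object
        else:
            vars(obj).update(state)      # state must be a dictionary
    return obj


class World:
    """Hypothetical stand-in for a wrapped C++ class."""
    __slots__ = ("_country", "_secret", "__dict__")

    def __init__(self, country):
        self._country = country
        self._secret = 0

    def __getinitargs__(self):
        return (self._country,)

    def __getstate__(self):
        return self._secret

    def __setstate__(self, state):
        self._secret = state


w = World("Hello")
w._secret = 42
clone = restore(save(w))
```

Here the constructor argument travels through __getinitargs__, while the remaining member is carried by the __getstate__ and __setstate__ pair, so the state object can be a plain integer rather than a dictionary.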

Pitfalls and Safety Guards

In BPL extension modules with many extension classes, providing complete pickle support for all classes would be a significant overhead. In general, complete pickle support should only be implemented for extension classes that will eventually be pickled. However, the author of a BPL extension module might not correctly anticipate which classes need pickle support. Unfortunately, the pickle protocol described above has two important pitfalls that the end user of a BPL extension module might not be aware of:
Pitfall 1: Neither __getinitargs__ nor __getstate__ is defined.
In this situation the unpickler calls the class constructor without arguments and then adds the __dict__ that was pickled by default to that of the new instance.

However, most C++ classes wrapped with the BPL will have member data that are not restored correctly by this procedure. To alert the user to this problem, a safety guard is provided. If neither __getinitargs__ nor __getstate__ is defined, the BPL tests if the class has an attribute __dict_defines_state__. An exception is raised if this attribute is not defined:

    RuntimeError: Incomplete pickle support (__dict_defines_state__ not set)
In the rare cases where this is not the desired behavior, the safety guard can be disabled deliberately. The corresponding C++ code is, for example:
    class_builder<your_class> py_your_class(your_module, "your_class");
    py_your_class.dict_defines_state();
It is also possible to override the safety guard at the Python level, for example:
    import your_bpl_module
    class your_class(your_bpl_module.your_class):
      __dict_defines_state__ = 1
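
The logic of this safety guard can be modelled in a few lines of pure Python. This is a sketch based on the description above, not the BPL's actual code: the function names and the classes Wrapped and OptedIn are hypothetical.

```python
def defines(obj, name):
    """True if a class of obj other than `object` defines `name`."""
    return any(name in vars(klass)
               for klass in type(obj).__mro__ if klass is not object)


def check_dict_guard(obj):
    """Raise if pickling obj would silently rely on its __dict__ alone."""
    if defines(obj, "__getinitargs__") or defines(obj, "__getstate__"):
        return  # the class provides its own pickle support
    if not hasattr(type(obj), "__dict_defines_state__"):
        raise RuntimeError(
            "Incomplete pickle support (__dict_defines_state__ not set)")


class Wrapped:
    """Stand-in for a wrapped class without any pickle methods."""


class OptedIn(Wrapped):
    # The derived class declares that its __dict__ is the full state.
    __dict_defines_state__ = 1


def guard_message(obj):
    """Return the guard's error message for obj, or None if it passes."""
    try:
        check_dict_guard(obj)
    except RuntimeError as exc:
        return str(exc)
    return None
```

In this model, guard_message(Wrapped()) returns the error text, while guard_message(OptedIn()) returns None because the derived class sets the attribute.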

Pitfall 2: __getstate__ is defined and the instance's __dict__ is not empty.
The author of a BPL extension class might provide a __getstate__ method without considering the possibilities that:

To alert the user to this highly unobvious problem, a safety guard is provided. If __getstate__ is defined and the instance's __dict__ is not empty, the BPL tests if the class has an attribute __getstate_manages_dict__. An exception is raised if this attribute is not defined:

    RuntimeError: Incomplete pickle support (__getstate_manages_dict__ not set)
To resolve this problem, it should first be established that the __getstate__ and __setstate__ methods manage the instance's __dict__ correctly. Note that this can be done at both the C++ and the Python level. Finally, the safety guard should intentionally be overridden. For example, in C++:
    class_builder<your_class> py_your_class(your_module, "your_class");
    py_your_class.getstate_manages_dict();
In Python:
    import your_bpl_module
    class your_class(your_bpl_module.your_class):
      __getstate_manages_dict__ = 1
      def __getstate__(self):
        # your code here
        pass
      def __setstate__(self, state):
        # your code here
        pass
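
One possible way to make __getstate__ and __setstate__ manage the instance's __dict__ is to carry a copy of it inside the state object. The following pure-Python sketch illustrates the idea under stated assumptions: the classes Base and Managed are hypothetical, and __slots__ only imitates C++ member data kept outside the __dict__. The last lines call the two methods directly, in the order described for the unpickler.

```python
class Base:
    """Stand-in for a wrapped class whose C++ data is not in __dict__."""
    __slots__ = ("_value", "__dict__")

    def __init__(self, value=0):
        self._value = value


class Managed(Base):
    # Declares that the methods below take care of the __dict__.
    __getstate_manages_dict__ = 1

    def __getstate__(self):
        # Bundle the hidden member with a copy of the __dict__.
        return (self._value, dict(self.__dict__))

    def __setstate__(self, state):
        value, attrs = state
        self._value = value
        self.__dict__.update(attrs)


a = Managed(3)
a.label = "extra"            # attribute added directly by the user

state = a.__getstate__()     # what the pickler would store
b = Managed()                # constructor call without arguments
b.__setstate__(state)        # restores both the member and the __dict__
```

With this arrangement, attributes that a user or a derived class adds to the instance survive the round trip along with the hidden member.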

Practical Advice


Author: Ralf W. Grosse-Kunstleve, March 2001