diff --git a/doc/PyConDC_2003/bpl.html b/doc/PyConDC_2003/bpl.html new file mode 100755 index 00000000..a9ecebd5 --- /dev/null +++ b/doc/PyConDC_2003/bpl.html @@ -0,0 +1,1127 @@ + + + + + + +Building Hybrid Systems with Boost.Python + + + + + + + + +
+

Building Hybrid Systems with Boost.Python

+ +++ + + + + + + + + + + + + + +
Author:David Abrahams
Contact:dave@boost-consulting.com
Organization:Boost Consulting
Date:2003-03-19
Author:Ralf W. Grosse-Kunstleve
Copyright:Copyright David Abrahams and Ralf W. Grosse-Kunstleve 2003. All rights reserved
+
+

Table of Contents

+ +
+
+

Abstract

+

Boost.Python is an open source C++ library which provides a concise +IDL-like interface for binding C++ classes and functions to +Python. Leveraging the full power of C++ compile-time introspection +and of recently developed metaprogramming techniques, this is achieved +entirely in pure C++, without introducing a new syntax. +Boost.Python's rich set of features and high-level interface make it +possible to engineer packages from the ground up as hybrid systems, +giving programmers easy and coherent access to both the efficient +compile-time polymorphism of C++ and the extremely convenient run-time +polymorphism of Python.

+
+
+

Introduction

+

Python and C++ are in many ways as different as two languages could +be: while C++ is usually compiled to machine-code, Python is +interpreted. Python's dynamic type system is often cited as the +foundation of its flexibility, while in C++ static typing is the +cornerstone of its efficiency. C++ has an intricate and difficult +compile-time meta-language, while in Python, practically everything +happens at runtime.

+

Yet for many programmers, these very differences mean that Python and +C++ complement one another perfectly. Performance bottlenecks in +Python programs can be rewritten in C++ for maximal speed, and +authors of powerful C++ libraries choose Python as a middleware +language for its flexible system integration capabilities. +Furthermore, the surface differences mask some strong similarities:

+ +

Given Python's rich 'C' interoperability API, it should in principle +be possible to expose C++ type and function interfaces to Python with +an analogous interface to their C++ counterparts. However, the +facilities provided by Python alone for integration with C++ are +relatively meager. Compared to C++ and Python, 'C' has only very +rudimentary abstraction facilities, and support for exception-handling +is completely missing. 'C' extension module writers are required to +manually manage Python reference counts, which is both annoyingly +tedious and extremely error-prone. Traditional extension modules also +tend to contain a great deal of boilerplate code repetition which +makes them difficult to maintain, especially when wrapping an evolving +API.

+

These limitations have lead to the development of a variety of wrapping +systems. SWIG is probably the most popular package for the +integration of C/C++ and Python. A more recent development is SIP, +which was specifically designed for interfacing Python with the Qt +graphical user interface library. Both SWIG and SIP introduce their +own specialized languages for customizing inter-language bindings. +This has certain advantages, but having to deal with three different +languages (Python, C/C++ and the interface language) also introduces +practical and mental difficulties. The CXX package demonstrates an +interesting alternative. It shows that at least some parts of +Python's 'C' API can be wrapped and presented through a much more +user-friendly C++ interface. However, unlike SWIG and SIP, CXX does +not include support for wrapping C++ classes as new Python types.

+

The features and goals of Boost.Python overlap significantly with +many of these other systems. That said, Boost.Python attempts to +maximize convenience and flexibility without introducing a separate +wrapping language. Instead, it presents the user with a high-level +C++ interface for wrapping C++ classes and functions, managing much of +the complexity behind-the-scenes with static metaprogramming. +Boost.Python also goes beyond the scope of earlier systems by +providing:

+ +

The key insight that sparked the development of Boost.Python is that +much of the boilerplate code in traditional extension modules could be +eliminated using C++ compile-time introspection. Each argument of a +wrapped C++ function must be extracted from a Python object using a +procedure that depends on the argument type. Similarly the function's +return type determines how the return value will be converted from C++ +to Python. Of course argument and return types are part of each +function's type, and this is exactly the source from which +Boost.Python deduces most of the information required.

+

This approach leads to user guided wrapping: as much information is +extracted directly from the source code to be wrapped as is possible +within the framework of pure C++, and some additional information is +supplied explicitly by the user. Mostly the guidance is mechanical +and little real intervention is required. Because the interface +specification is written in the same full-featured language as the +code being exposed, the user has unprecedented power available when +she does need to take control.

+
+
+

Boost.Python Design Goals

+

The primary goal of Boost.Python is to allow users to expose C++ +classes and functions to Python using nothing more than a C++ +compiler. In broad strokes, the user experience should be one of +directly manipulating C++ objects from Python.

+

However, it's also important not to translate all interfaces too +literally: the idioms of each language must be respected. For +example, though C++ and Python both have an iterator concept, they are +expressed very differently. Boost.Python has to be able to bridge the +interface gap.

+

It must be possible to insulate Python users from crashes resulting +from trivial misuses of C++ interfaces, such as accessing +already-deleted objects. By the same token the library should +insulate C++ users from low-level Python 'C' API, replacing +error-prone 'C' interfaces like manual reference-count management and +raw PyObject pointers with more-robust alternatives.

+

Support for component-based development is crucial, so that C++ types +exposed in one extension module can be passed to functions exposed in +another without loss of crucial information like C++ inheritance +relationships.

+

Finally, all wrapping must be non-intrusive, without modifying or +even seeing the original C++ source code. Existing C++ libraries have +to be wrappable by third parties who only have access to header files +and binaries.

+
+
+

Hello Boost.Python World

+

And now for a preview of Boost.Python, and how it improves on the raw +facilities offered by Python. Here's a function we might want to +expose:

+
+char const* greet(unsigned x)
+{
+   static char const* const msgs[] = { "hello", "Boost.Python", "world!" };
+
+   if (x > 2) 
+       throw std::range_error("greet: index out of range");
+
+   return msgs[x];
+}
+
+

To wrap this function in standard C++ using the Python 'C' API, we'd +need something like this:

+
+extern "C" // all Python interactions use 'C' linkage and calling convention
+{
+    // Wrapper to handle argument/result conversion and checking
+    PyObject* greet_wrap(PyObject* args, PyObject * keywords)
+    {
+         int x;
+         if (PyArg_ParseTuple(args, "i", &x))    // extract/check arguments
+         {
+             char const* result = greet(x);      // invoke wrapped function
+             return PyString_FromString(result); // convert result to Python
+         }
+         return 0;                               // error occurred
+    }
+
+    // Table of wrapped functions to be exposed by the module
+    static PyMethodDef methods[] = {
+        { "greet", greet_wrap, METH_VARARGS, "return one of 3 parts of a greeting" }
+        , { NULL, NULL, 0, NULL } // sentinel
+    };
+
+    // module initialization function
+    DL_EXPORT init_hello()
+    {
+        (void) Py_InitModule("hello", methods); // add the methods to the module
+    }
+}
+
+

Now here's the wrapping code we'd use to expose it with Boost.Python:

+
+#include <boost/python.hpp>
+using namespace boost::python;
+BOOST_PYTHON_MODULE(hello)
+{
+    def("greet", greet, "return one of 3 parts of a greeting");
+}
+
+

and here it is in action:

+
+>>> import hello
+>>> for x in range(3):
+...     print hello.greet(x)
+...
+hello
+Boost.Python
+world!
+
+

Aside from the fact that the 'C' API version is much more verbose, +it's worth noting a few things that it doesn't handle correctly:

+ +
+
+

Library Overview

+

This section outlines some of the library's major features. Except as +neccessary to avoid confusion, details of library implementation are +omitted.

+
+

Exposing Classes

+

C++ classes and structs are exposed with a similarly-terse interface. +Given:

+
+struct World
+{
+    void set(std::string msg) { this->msg = msg; }
+    std::string greet() { return msg; }
+    std::string msg;
+};
+
+

The following code will expose it in our extension module:

+
+#include <boost/python.hpp>
+BOOST_PYTHON_MODULE(hello)
+{
+    class_<World>("World")
+        .def("greet", &World::greet)
+        .def("set", &World::set)
+    ;
+}
+
+

Although this code has a certain pythonic familiarity, people +sometimes find the syntax bit confusing because it doesn't look like +most of the C++ code they're used to. All the same, this is just +standard C++. Because of their flexible syntax and operator +overloading, C++ and Python are great for defining domain-specific +(sub)languages +(DSLs), and that's what we've done in Boost.Python. To break it down:

+
+class_<World>("World")
+
+

constructs an unnamed object of type class_<World> and passes +"World" to its constructor. This creates a new-style Python class +called World in the extension module, and associates it with the +C++ type World in the Boost.Python type conversion registry. We +might have also written:

+
+class_<World> w("World");
+
+

but that would've been more verbose, since we'd have to name w +again to invoke its def() member function:

+
+w.def("greet", &World::greet)
+
+

There's nothing special about the location of the dot for member +access in the original example: C++ allows any amount of whitespace on +either side of a token, and placing the dot at the beginning of each +line allows us to chain as many successive calls to member functions +as we like with a uniform syntax. The other key fact that allows +chaining is that class_<> member functions all return a reference +to *this.

+

So the example is equivalent to:

+
+class_<World> w("World");
+w.def("greet", &World::greet);
+w.def("set", &World::set);
+
+

It's occasionally useful to be able to break down the components of a +Boost.Python class wrapper in this way, but the rest of this article +will stick to the terse syntax.

+

For completeness, here's the wrapped class in use:

+
+>>> import hello
+>>> planet = hello.World()
+>>> planet.set('howdy')
+>>> planet.greet()
+'howdy'
+
+
+

Constructors

+

Since our World class is just a plain struct, it has an +implicit no-argument (nullary) constructor. Boost.Python exposes the +nullary constructor by default, which is why we were able to write:

+
+>>> planet = hello.World()
+
+

However, well-designed classes in any language may require constructor +arguments in order to establish their invariants. Unlike Python, +where __init__ is just a specially-named method, In C++ +constructors cannot be handled like ordinary member functions. In +particular, we can't take their address: &World::World is an +error. The library provides a different interface for specifying +constructors. Given:

+
+struct World
+{
+    World(std::string msg); // added constructor
+    ...
+
+

we can modify our wrapping code as follows:

+
+class_<World>("World", init<std::string>())
+    ...
+
+

of course, a C++ class may have additional constructors, and we can +expose those as well by passing more instances of init<...> to +def():

+
+class_<World>("World", init<std::string>())
+    .def(init<double, double>())
+    ...
+
+

Boost.Python allows wrapped functions, member functions, and +constructors to be overloaded to mirror C++ overloading.

+
+
+

Data Members and Properties

+

Any publicly-accessible data members in a C++ class can be easily +exposed as either readonly or readwrite attributes:

+
+class_<World>("World", init<std::string>())
+    .def_readonly("msg", &World::msg)
+    ...
+
+

and can be used directly in Python:

+
+>>> planet = hello.World('howdy')
+>>> planet.msg
+'howdy'
+
+

This does not result in adding attributes to the World instance +__dict__, which can result in substantial memory savings when +wrapping large data structures. In fact, no instance __dict__ +will be created at all unless attributes are explicitly added from +Python. Boost.Python owes this capability to the new Python 2.2 type +system, in particular the descriptor interface and property type.

+

In C++, publicly-accessible data members are considered a sign of poor +design because they break encapsulation, and style guides usually +dictate the use of "getter" and "setter" functions instead. In +Python, however, __getattr__, __setattr__, and since 2.2, +property mean that attribute access is just one more +well-encapsulated syntactic tool at the programmer's disposal. +Boost.Python bridges this idiomatic gap by making Python property +creation directly available to users. If msg were private, we +could still expose it as attribute in Python as follows:

+
+class_<World>("World", init<std::string>())
+    .add_property("msg", &World::greet, &World::set)
+    ...
+
+

The example above mirrors the familiar usage of properties in Python +2.2+:

+
+>>> class World(object):
+...     __init__(self, msg):
+...         self.__msg = msg
+...     def greet(self):
+...         return self.__msg
+...     def set(self, msg):
+...         self.__msg = msg
+...     msg = property(greet, set)
+
+
+
+

Operator Overloading

+

The ability to write arithmetic operators for user-defined types has +been a major factor in the success of both languages for numerical +computation, and the success of packages like NumPy attests to the +power of exposing operators in extension modules. Boost.Python +provides a concise mechanism for wrapping operator overloads. The +example below shows a fragment from a wrapper for the Boost rational +number library:

+
+class_<rational<int> >("rational_int")
+  .def(init<int, int>()) // constructor, e.g. rational_int(3,4)
+  .def("numerator", &rational<int>::numerator)
+  .def("denominator", &rational<int>::denominator)
+  .def(-self)        // __neg__ (unary minus)
+  .def(self + self)  // __add__ (homogeneous)
+  .def(self * self)  // __mul__
+  .def(self + int()) // __add__ (heterogenous)
+  .def(int() + self) // __radd__
+  ...
+
+

The magic is performed using a simplified application of "expression +templates" [VELD1995], a technique originally developed for +optimization of high-performance matrix algebra expressions. The +essence is that instead of performing the computation immediately, +operators are overloaded to construct a type representing the +computation. In matrix algebra, dramatic optimizations are often +available when the structure of an entire expression can be taken into +account, rather than evaluating each operation "greedily". +Boost.Python uses the same technique to build an appropriate Python +method object based on expressions involving self.

+
+
+

Inheritance

+

C++ inheritance relationships can be represented to Boost.Python by adding +an optional bases<...> argument to the class_<...> template +parameter list as follows:

+
+class_<Derived, bases<Base1,Base2> >("Derived")
+     ...
+
+

This has two effects:

+
    +
  1. When the class_<...> is created, Python type objects +corresponding to Base1 and Base2 are looked up in +Boost.Python's registry, and are used as bases for the new Python +Derived type object, so methods exposed for the Python Base1 +and Base2 types are automatically members of the Derived +type. Because the registry is global, this works correctly even if +Derived is exposed in a different module from either of its +bases.
  2. +
  3. C++ conversions from Derived to its bases are added to the +Boost.Python registry. Thus wrapped C++ methods expecting (a +pointer or reference to) an object of either base type can be +called with an object wrapping a Derived instance. Wrapped +member functions of class T are treated as though they have an +implicit first argument of T&, so these conversions are +neccessary to allow the base class methods to be called for derived +objects.
  4. +
+

Of course it's possible to derive new Python classes from wrapped C++ +class instances. Because Boost.Python uses the new-style class +system, that works very much as for the Python built-in types. There +is one significant detail in which it differs: the built-in types +generally establish their invariants in their __new__ function, so +that derived classes do not need to call __init__ on the base +class before invoking its methods :

+
+>>> class L(list):
+...      def __init__(self):
+...          pass
+...
+>>> L().reverse()
+>>> 
+
+

Because C++ object construction is a one-step operation, C++ instance +data cannot be constructed until the arguments are available, in the +__init__ function:

+
+>>> class D(SomeBoostPythonClass):
+...      def __init__(self):
+...          pass
+...
+>>> D().some_boost_python_method()
+Traceback (most recent call last):
+  File "<stdin>", line 1, in ?
+TypeError: bad argument type for built-in operation
+
+

This happened because Boost.Python couldn't find instance data of type +SomeBoostPythonClass within the D instance; D's __init__ +function masked construction of the base class. It could be corrected +by either removing D's __init__ function or having it call +SomeBoostPythonClass.__init__(...) explicitly.

+
+
+

Virtual Functions

+

Deriving new types in Python from extension classes is not very +interesting unless they can be used polymorphically from C++. In +other words, Python method implementations should appear to override +the implementation of C++ virtual functions when called through base +class pointers/references from C++. Since the only way to alter the +behavior of a virtual function is to override it in a derived class, +the user must build a special derived class to dispatch a polymorphic +class' virtual functions:

+
+//
+// interface to wrap:
+//
+class Base
+{
+ public:
+    virtual int f(std::string x) { return 42; }
+    virtual ~Base();
+};
+
+int calls_f(Base const& b, std::string x) { return b.f(x); }
+
+//
+// Wrapping Code
+//
+
+// Dispatcher class
+struct BaseWrap : Base
+{
+    // Store a pointer to the Python object
+    BaseWrap(PyObject* self_) : self(self_) {}
+    PyObject* self;
+
+    // Default implementation, for when f is not overridden
+    int f_default(std::string x) { return this->Base::f(x); }
+    // Dispatch implementation
+    int f(std::string x) { return call_method<int>(self, "f", x); }
+};
+
+...
+    def("calls_f", calls_f);
+    class_<Base, BaseWrap>("Base")
+        .def("f", &Base::f, &BaseWrap::f_default)
+        ;
+
+

Now here's some Python code which demonstrates:

+
+>>> class Derived(Base):
+...     def f(self, s):
+...          return len(s)
+...
+>>> calls_f(Base(), 'foo')
+42
+>>> calls_f(Derived(), 'forty-two')
+9
+
+

Things to notice about the dispatcher class:

+
    +
  • The key element which allows overriding in Python is the +call_method invocation, which uses the same global type +conversion registry as the C++ function wrapping does to convert its +arguments from C++ to Python and its return type from Python to C++.
  • +
  • Any constructor signatures you wish to wrap must be replicated with +an initial PyObject* argument
  • +
  • The dispatcher must store this argument so that it can be used to +invoke call_method
  • +
  • The f_default member function is needed when the function being +exposed is not pure virtual; there's no other way Base::f can be +called on an object of type BaseWrap, since it overrides f.
  • +
+
+
+

Deeper Reflection on the Horizon?

+

Admittedly, this formula is tedious to repeat, especially on a project +with many polymorphic classes. That it is neccessary reflects some +limitations in C++'s compile-time introspection capabilities: there's +no way to enumerate the members of a class and find out which are +virtual functions. At least one very promising project has been +started to write a front-end which can generate these dispatchers (and +other wrapping code) automatically from C++ headers.

+

Pyste is being developed by Bruno da Silva de Oliveira. It builds on +GCC_XML, which generates an XML version of GCC's internal program +representation. Since GCC is a highly-conformant C++ compiler, this +ensures correct handling of the most-sophisticated template code and +full access to the underlying type system. In keeping with the +Boost.Python philosophy, a Pyste interface description is neither +intrusive on the code being wrapped, nor expressed in some unfamiliar +language: instead it is a 100% pure Python script. If Pyste is +successful it will mark a move away from wrapping everything directly +in C++ for many of our users. It will also allow us the choice to +shift some of the metaprogram code from C++ to Python. We expect that +soon, not only our users but the Boost.Python developers themselves +will be "thinking hybrid" about their own code.

+
+
+
+

Serialization

+

Serialization is the process of converting objects in memory to a +form that can be stored on disk or sent over a network connection. The +serialized object (most often a plain string) can be retrieved and +converted back to the original object. A good serialization system will +automatically convert entire object hierarchies. Python's standard +pickle module is just such a system. It leverages the language's strong +runtime introspection facilities for serializing practically arbitrary +user-defined objects. With a few simple and unintrusive provisions this +powerful machinery can be extended to also work for wrapped C++ objects. +Here is an example:

+
+#include <string>
+
+struct World
+{
+    World(std::string a_msg) : msg(a_msg) {}
+    std::string greet() const { return msg; }
+    std::string msg;
+};
+
+#include <boost/python.hpp>
+using namespace boost::python;
+
+struct World_picklers : pickle_suite
+{
+  static tuple
+  getinitargs(World const& w) { return make_tuple(w.greet()); }
+};
+
+BOOST_PYTHON_MODULE(hello)
+{
+    class_<World>("World", init<std::string>())
+        .def("greet", &World::greet)
+        .def_pickle(World_picklers())
+    ;
+}
+
+

Now let's create a World object and put it to rest on disk:

+
+>>> import hello
+>>> import pickle
+>>> a_world = hello.World("howdy")
+>>> pickle.dump(a_world, open("my_world", "w"))
+
+

In a potentially different script on a potentially different +computer with a potentially different operating system:

+
+>>> import pickle
+>>> resurrected_world = pickle.load(open("my_world", "r"))
+>>> resurrected_world.greet()
+'howdy'
+
+

Of course the cPickle module can also be used for faster +processing.

+

Boost.Python's pickle_suite fully supports the pickle protocol +defined in the standard Python documentation. Like a __getinitargs__ +function in Python, the pickle_suite's getinitargs() is responsible for +creating the argument tuple that will be use to reconstruct the pickled +object. The other elements of the Python pickling protocol, +__getstate__ and __setstate__ can be optionally provided via C++ +getstate and setstate functions. C++'s static type system allows the +library to ensure at compile-time that nonsensical combinations of +functions (e.g. getstate without setstate) are not used.

+

Enabling serialization of more complex C++ objects requires a little +more work than is shown in the example above. Fortunately the +object interface (see next section) greatly helps in keeping the +code manageable.

+
+
+

Object interface

+

Experienced 'C' language extension module authors will be familiar +with the ubiquitous PyObject*, manual reference-counting, and the +need to remember which API calls return "new" (owned) references or +"borrowed" (raw) references. These constraints are not just +cumbersome but also a major source of errors, especially in the +presence of exceptions.

+

Boost.Python provides a class object which automates reference +counting and provides conversion to Python from C++ objects of +arbitrary type. This significantly reduces the learning effort for +prospective extension module writers.

+

Creating an object from any other type is extremely simple:

+
+object s("hello, world");  // s manages a Python string
+
+

object has templated interactions with all other types, with +automatic to-python conversions. It happens so naturally that it's +easily overlooked:

+
+object ten_Os = 10 * s[4]; // -> "oooooooooo"
+
+

In the example above, 4 and 10 are converted to Python objects +before the indexing and multiplication operations are invoked.

+

The extract<T> class template can be used to convert Python objects +to C++ types:

+
+double x = extract<double>(o);
+
+

If a conversion in either direction cannot be performed, an +appropriate exception is thrown at runtime.

+

The object type is accompanied by a set of derived types +that mirror the Python built-in types such as list, dict, +tuple, etc. as much as possible. This enables convenient +manipulation of these high-level types from C++:

+
+dict d;
+d["some"] = "thing";
+d["lucky_number"] = 13;
+list l = d.keys();
+
+

This almost looks and works like regular Python code, but it is pure +C++. Of course we can wrap C++ functions which accept or return +object instances.

+
+
+
+

Thinking hybrid

+

Because of the practical and mental difficulties of combining +programming languages, it is common to settle a single language at the +outset of any development effort. For many applications, performance +considerations dictate the use of a compiled language for the core +algorithms. Unfortunately, due to the complexity of the static type +system, the price we pay for runtime performance is often a +significant increase in development time. Experience shows that +writing maintainable C++ code usually takes longer and requires far +more hard-earned working experience than developing comparable Python +code. Even when developers are comfortable working exclusively in +compiled languages, they often augment their systems by some type of +ad hoc scripting layer for the benefit of their users without ever +availing themselves of the same advantages.

+

Boost.Python enables us to think hybrid. Python can be used for +rapidly prototyping a new application; its ease of use and the large +pool of standard libraries give us a head start on the way to a +working system. If necessary, the working code can be used to +discover rate-limiting hotspots. To maximize performance these can +be reimplemented in C++, together with the Boost.Python bindings +needed to tie them back into the existing higher-level procedure.

+

Of course, this top-down approach is less attractive if it is clear +from the start that many algorithms will eventually have to be +implemented in C++. Fortunately Boost.Python also enables us to +pursue a bottom-up approach. We have used this approach very +successfully in the development of a toolbox for scientific +applications. The toolbox started out mainly as a library of C++ +classes with Boost.Python bindings, and for a while the growth was +mainly concentrated on the C++ parts. However, as the toolbox is +becoming more complete, more and more newly added functionality can be +implemented in Python.

+

python_cpp_mix.jpg

+

This figure shows the estimated ratio of newly added C++ and Python +code over time as new algorithms are implemented. We expect this +ratio to level out near 70% Python. Being able to solve new problems +mostly in Python rather than a more difficult statically typed +language is the return on our investment in Boost.Python. The ability +to access all of our code from Python allows a broader group of +developers to use it in the rapid development of new applications.

+
+
+

Development history

+

The first version of Boost.Python was developed in 2000 by Dave +Abrahams at Dragon Systems, where he was privileged to have Tim Peters +as a guide to "The Zen of Python". One of Dave's jobs was to develop +a Python-based natural language processing system. Since it was +eventually going to be targeting embedded hardware, it was always +assumed that the compute-intensive core would be rewritten in C++ to +optimize speed and memory footprint 1. The project also wanted to +test all of its C++ code using Python test scripts 2. The only +tool we knew of for binding C++ and Python was SWIG, and at the time +its handling of C++ was weak. It would be false to claim any deep +insight into the possible advantages of Boost.Python's approach at +this point. Dave's interest and expertise in fancy C++ template +tricks had just reached the point where he could do some real damage, +and Boost.Python emerged as it did because it filled a need and +because it seemed like a cool thing to try.

+

This early version was aimed at many of the same basic goals we've +described in this paper, differing most-noticeably by having a +slightly more cumbersome syntax and by lack of special support for +operator overloading, pickling, and component-based development. +These last three features were quickly added by Ullrich Koethe and +Ralf Grosse-Kunstleve 3, and other enthusiastic contributors arrived +on the scene to contribute enhancements like support for nested +modules and static member functions.

+

By early 2001 development had stabilized and few new features were +being added, however a disturbing new fact came to light: Ralf had +begun testing Boost.Python on pre-release versions of a compiler using +the EDG front-end, and the mechanism at the core of Boost.Python +responsible for handling conversions between Python and C++ types was +failing to compile. As it turned out, we had been exploiting a very +common bug in the implementation of all the C++ compilers we had +tested. We knew that as C++ compilers rapidly became more +standards-compliant, the library would begin failing on more +platforms. Unfortunately, because the mechanism was so central to the +functioning of the library, fixing the problem looked very difficult.

+

Fortunately, later that year Lawrence Berkeley and later Lawrence +Livermore National labs contracted with Boost Consulting for support +and development of Boost.Python, and there was a new opportunity to +address fundamental issues and ensure a future for the library. A +redesign effort began with the low level type conversion architecture, +building in standards-compliance and support for component-based +development (in contrast to version 1 where conversions had to be +explicitly imported and exported across module boundaries). A new +analysis of the relationship between the Python and C++ objects was +done, resulting in more intuitive handling for C++ lvalues and +rvalues.

+

The emergence of a powerful new type system in Python 2.2 made the +choice of whether to maintain compatibility with Python 1.5.2 easy: +the opportunity to throw away a great deal of elaborate code for +emulating classic Python classes alone was too good to pass up. In +addition, Python iterators and descriptors provided crucial and +elegant tools for representing similar C++ constructs. The +development of the generalized object interface allowed us to +further shield C++ programmers from the dangers and syntactic burdens +of the Python 'C' API. A great number of other features including C++ +exception translation, improved support for overloaded functions, and +most significantly, CallPolicies for handling pointers and +references, were added during this period.

+

In October 2002, version 2 of Boost.Python was released. Development +since then has concentrated on improved support for C++ runtime +polymorphism and smart pointers. Peter Dimov's ingenious +boost::shared_ptr design in particular has allowed us to give the +hybrid developer a consistent interface for moving objects back and +forth across the language barrier without loss of information. At +first, we were concerned that the sophistication and complexity of the +Boost.Python v2 implementation might discourage contributors, but the +emergence of Pyste and several other significant feature +contributions have laid those fears to rest. Daily questions on the +Python C++-sig and a backlog of desired improvements show that the +library is getting used. To us, the future looks bright.

+
+
+

Conclusions

+

Boost.Python achieves seamless interoperability between two rich and +complimentary language environments. Because it leverages template +metaprogramming to introspect about types and functions, the user +never has to learn a third syntax: the interface definitions are +written in concise and maintainable C++. Also, the wrapping system +doesn't have to parse C++ headers or represent the type system: the +compiler does that work for us.

+

Computationally intensive tasks play to the strengths of C++ and are +often impossible to implement efficiently in pure Python, while jobs +like serialization that are trivial in Python can be very difficult in +pure C++. Given the luxury of building a hybrid software system from +the ground up, we can approach design with new confidence and power.

+
+
+

Citations

+ + ++ + + +
[VELD1995]T. Veldhuizen, "Expression Templates," C++ Report, +Vol. 7 No. 5 June 1995, pp. 26-31. +http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html
+
+
+

Footnotes

+ + + + + +
[1]In retrospect, it seems that "thinking hybrid" from the +ground up might have been better for the NLP system: the +natural component boundaries defined by the pure python +prototype turned out to be inappropriate for getting the +desired performance and memory footprint out of the C++ core, +which eventually caused some redesign overhead on the Python +side when the core was moved to C++.
+ + + + + +
[2]We also have some reservations about driving all C++ +testing through a Python interface, unless that's the only way +it will be ultimately used. Any transition across language +boundaries with such different object models can inevitably +mask bugs.
+ + + + + +
[3]These features were expressed very differently in v1 of +Boost.Python
+
+
+ + + + diff --git a/doc/PyConDC_2003/bpl.pdf b/doc/PyConDC_2003/bpl.pdf new file mode 100755 index 00000000..ffd8903e Binary files /dev/null and b/doc/PyConDC_2003/bpl.pdf differ diff --git a/doc/PyConDC_2003/bpl.txt b/doc/PyConDC_2003/bpl.txt new file mode 100644 index 00000000..c2ee19aa --- /dev/null +++ b/doc/PyConDC_2003/bpl.txt @@ -0,0 +1,947 @@ ++++++++++++++++++++++++++++++++++++++++++++ + Building Hybrid Systems with Boost.Python ++++++++++++++++++++++++++++++++++++++++++++ + +:Author: David Abrahams +:Contact: dave@boost-consulting.com +:organization: `Boost Consulting`_ +:date: $Date$ + +:Author: Ralf W. Grosse-Kunstleve + +:copyright: Copyright David Abrahams and Ralf W. Grosse-Kunstleve 2003. All rights reserved + +.. contents:: Table of Contents + +.. _`Boost Consulting`: http://www.boost-consulting.com + +========== + Abstract +========== + +Boost.Python is an open source C++ library which provides a concise +IDL-like interface for binding C++ classes and functions to +Python. Leveraging the full power of C++ compile-time introspection +and of recently developed metaprogramming techniques, this is achieved +entirely in pure C++, without introducing a new syntax. +Boost.Python's rich set of features and high-level interface make it +possible to engineer packages from the ground up as hybrid systems, +giving programmers easy and coherent access to both the efficient +compile-time polymorphism of C++ and the extremely convenient run-time +polymorphism of Python. + +============== + Introduction +============== + +Python and C++ are in many ways as different as two languages could +be: while C++ is usually compiled to machine-code, Python is +interpreted. Python's dynamic type system is often cited as the +foundation of its flexibility, while in C++ static typing is the +cornerstone of its efficiency. C++ has an intricate and difficult +compile-time meta-language, while in Python, practically everything +happens at runtime. + +Yet for many programmers, these very differences mean that Python and +C++ complement one another perfectly. Performance bottlenecks in +Python programs can be rewritten in C++ for maximal speed, and +authors of powerful C++ libraries choose Python as a middleware +language for its flexible system integration capabilities. +Furthermore, the surface differences mask some strong similarities: + +* 'C'-family control structures (if, while, for...) + +* Support for object-orientation, functional programming, and generic + programming (these are both *multi-paradigm* programming languages.) + +* Comprehensive operator overloading facilities, recognizing the + importance of syntactic variability for readability and + expressivity. + +* High-level concepts such as collections and iterators. + +* High-level encapsulation facilities (C++: namespaces, Python: modules) + to support the design of re-usable libraries. + +* Exception-handling for effective management of error conditions. + +* C++ idioms in common use, such as handle/body classes and + reference-counted smart pointers mirror Python reference semantics. + +Given Python's rich 'C' interoperability API, it should in principle +be possible to expose C++ type and function interfaces to Python with +an analogous interface to their C++ counterparts. However, the +facilities provided by Python alone for integration with C++ are +relatively meager. Compared to C++ and Python, 'C' has only very +rudimentary abstraction facilities, and support for exception-handling +is completely missing. 'C' extension module writers are required to +manually manage Python reference counts, which is both annoyingly +tedious and extremely error-prone. Traditional extension modules also +tend to contain a great deal of boilerplate code repetition which +makes them difficult to maintain, especially when wrapping an evolving +API. + +These limitations have lead to the development of a variety of wrapping +systems. SWIG_ is probably the most popular package for the +integration of C/C++ and Python. A more recent development is SIP_, +which was specifically designed for interfacing Python with the Qt_ +graphical user interface library. Both SWIG and SIP introduce their +own specialized languages for customizing inter-language bindings. +This has certain advantages, but having to deal with three different +languages (Python, C/C++ and the interface language) also introduces +practical and mental difficulties. The CXX_ package demonstrates an +interesting alternative. It shows that at least some parts of +Python's 'C' API can be wrapped and presented through a much more +user-friendly C++ interface. However, unlike SWIG and SIP, CXX does +not include support for wrapping C++ classes as new Python types. + +The features and goals of Boost.Python_ overlap significantly with +many of these other systems. That said, Boost.Python attempts to +maximize convenience and flexibility without introducing a separate +wrapping language. Instead, it presents the user with a high-level +C++ interface for wrapping C++ classes and functions, managing much of +the complexity behind-the-scenes with static metaprogramming. +Boost.Python also goes beyond the scope of earlier systems by +providing: + +* Support for C++ virtual functions that can be overridden in Python. + +* Comprehensive lifetime management facilities for low-level C++ + pointers and references. + +* Support for organizing extensions as Python packages, + with a central registry for inter-language type conversions. + +* A safe and convenient mechanism for tying into Python's powerful + serialization engine (pickle). + +* Coherence with the rules for handling C++ lvalues and rvalues that + can only come from a deep understanding of both the Python and C++ + type systems. + +The key insight that sparked the development of Boost.Python is that +much of the boilerplate code in traditional extension modules could be +eliminated using C++ compile-time introspection. Each argument of a +wrapped C++ function must be extracted from a Python object using a +procedure that depends on the argument type. Similarly the function's +return type determines how the return value will be converted from C++ +to Python. Of course argument and return types are part of each +function's type, and this is exactly the source from which +Boost.Python deduces most of the information required. + +This approach leads to *user guided wrapping*: as much information is +extracted directly from the source code to be wrapped as is possible +within the framework of pure C++, and some additional information is +supplied explicitly by the user. Mostly the guidance is mechanical +and little real intervention is required. Because the interface +specification is written in the same full-featured language as the +code being exposed, the user has unprecedented power available when +she does need to take control. + +.. _Python: http://www.python.org/ +.. _SWIG: http://www.swig.org/ +.. _SIP: http://www.riverbankcomputing.co.uk/sip/index.php +.. _Qt: http://www.trolltech.com/ +.. _CXX: http://cxx.sourceforge.net/ +.. _Boost.Python: http://www.boost.org/libs/python/doc + +=========================== + Boost.Python Design Goals +=========================== + +The primary goal of Boost.Python is to allow users to expose C++ +classes and functions to Python using nothing more than a C++ +compiler. In broad strokes, the user experience should be one of +directly manipulating C++ objects from Python. + +However, it's also important not to translate all interfaces *too* +literally: the idioms of each language must be respected. For +example, though C++ and Python both have an iterator concept, they are +expressed very differently. Boost.Python has to be able to bridge the +interface gap. + +It must be possible to insulate Python users from crashes resulting +from trivial misuses of C++ interfaces, such as accessing +already-deleted objects. By the same token the library should +insulate C++ users from low-level Python 'C' API, replacing +error-prone 'C' interfaces like manual reference-count management and +raw ``PyObject`` pointers with more-robust alternatives. + +Support for component-based development is crucial, so that C++ types +exposed in one extension module can be passed to functions exposed in +another without loss of crucial information like C++ inheritance +relationships. + +Finally, all wrapping must be *non-intrusive*, without modifying or +even seeing the original C++ source code. Existing C++ libraries have +to be wrappable by third parties who only have access to header files +and binaries. + +========================== + Hello Boost.Python World +========================== + +And now for a preview of Boost.Python, and how it improves on the raw +facilities offered by Python. Here's a function we might want to +expose:: + + char const* greet(unsigned x) + { + static char const* const msgs[] = { "hello", "Boost.Python", "world!" }; + + if (x > 2) + throw std::range_error("greet: index out of range"); + + return msgs[x]; + } + +To wrap this function in standard C++ using the Python 'C' API, we'd +need something like this:: + + extern "C" // all Python interactions use 'C' linkage and calling convention + { + // Wrapper to handle argument/result conversion and checking + PyObject* greet_wrap(PyObject* args, PyObject * keywords) + { + int x; + if (PyArg_ParseTuple(args, "i", &x)) // extract/check arguments + { + char const* result = greet(x); // invoke wrapped function + return PyString_FromString(result); // convert result to Python + } + return 0; // error occurred + } + + // Table of wrapped functions to be exposed by the module + static PyMethodDef methods[] = { + { "greet", greet_wrap, METH_VARARGS, "return one of 3 parts of a greeting" } + , { NULL, NULL, 0, NULL } // sentinel + }; + + // module initialization function + DL_EXPORT init_hello() + { + (void) Py_InitModule("hello", methods); // add the methods to the module + } + } + +Now here's the wrapping code we'd use to expose it with Boost.Python:: + + #include + using namespace boost::python; + BOOST_PYTHON_MODULE(hello) + { + def("greet", greet, "return one of 3 parts of a greeting"); + } + +and here it is in action:: + + >>> import hello + >>> for x in range(3): + ... print hello.greet(x) + ... + hello + Boost.Python + world! + +Aside from the fact that the 'C' API version is much more verbose, +it's worth noting a few things that it doesn't handle correctly: + +* The original function accepts an unsigned integer, and the Python + 'C' API only gives us a way of extracting signed integers. The + Boost.Python version will raise a Python exception if we try to pass + a negative number to ``hello.greet``, but the other one will proceed + to do whatever the C++ implementation does when converting an + negative integer to unsigned (usually wrapping to some very large + number), and pass the incorrect translation on to the wrapped + function. + +* That brings us to the second problem: if the C++ ``greet()`` + function is called with a number greater than 2, it will throw an + exception. Typically, if a C++ exception propagates across the + boundary with code generated by a 'C' compiler, it will cause a + crash. As you can see in the first version, there's no C++ + scaffolding there to prevent this from happening. Functions wrapped + by Boost.Python automatically include an exception-handling layer + which protects Python users by translating unhandled C++ exceptions + into a corresponding Python exception. + +* A slightly more-subtle limitation is that the argument conversion + used in the Python 'C' API case can only get that integer ``x`` in + *one way*. PyArg_ParseTuple can't convert Python ``long`` objects + (arbitrary-precision integers) which happen to fit in an ``unsigned + int`` but not in a ``signed long``, nor will it ever handle a + wrapped C++ class with a user-defined implicit ``operator unsigned + int()`` conversion. Boost.Python's dynamic type conversion + registry allows users to add arbitrary conversion methods. + +================== + Library Overview +================== + +This section outlines some of the library's major features. Except as +neccessary to avoid confusion, details of library implementation are +omitted. + +------------------ + Exposing Classes +------------------ + +C++ classes and structs are exposed with a similarly-terse interface. +Given:: + + struct World + { + void set(std::string msg) { this->msg = msg; } + std::string greet() { return msg; } + std::string msg; + }; + +The following code will expose it in our extension module:: + + #include + BOOST_PYTHON_MODULE(hello) + { + class_("World") + .def("greet", &World::greet) + .def("set", &World::set) + ; + } + +Although this code has a certain pythonic familiarity, people +sometimes find the syntax bit confusing because it doesn't look like +most of the C++ code they're used to. All the same, this is just +standard C++. Because of their flexible syntax and operator +overloading, C++ and Python are great for defining domain-specific +(sub)languages +(DSLs), and that's what we've done in Boost.Python. To break it down:: + + class_("World") + +constructs an unnamed object of type ``class_`` and passes +``"World"`` to its constructor. This creates a new-style Python class +called ``World`` in the extension module, and associates it with the +C++ type ``World`` in the Boost.Python type conversion registry. We +might have also written:: + + class_ w("World"); + +but that would've been more verbose, since we'd have to name ``w`` +again to invoke its ``def()`` member function:: + + w.def("greet", &World::greet) + +There's nothing special about the location of the dot for member +access in the original example: C++ allows any amount of whitespace on +either side of a token, and placing the dot at the beginning of each +line allows us to chain as many successive calls to member functions +as we like with a uniform syntax. The other key fact that allows +chaining is that ``class_<>`` member functions all return a reference +to ``*this``. + +So the example is equivalent to:: + + class_ w("World"); + w.def("greet", &World::greet); + w.def("set", &World::set); + +It's occasionally useful to be able to break down the components of a +Boost.Python class wrapper in this way, but the rest of this article +will stick to the terse syntax. + +For completeness, here's the wrapped class in use: :: + + >>> import hello + >>> planet = hello.World() + >>> planet.set('howdy') + >>> planet.greet() + 'howdy' + +Constructors +============ + +Since our ``World`` class is just a plain ``struct``, it has an +implicit no-argument (nullary) constructor. Boost.Python exposes the +nullary constructor by default, which is why we were able to write: :: + + >>> planet = hello.World() + +However, well-designed classes in any language may require constructor +arguments in order to establish their invariants. Unlike Python, +where ``__init__`` is just a specially-named method, In C++ +constructors cannot be handled like ordinary member functions. In +particular, we can't take their address: ``&World::World`` is an +error. The library provides a different interface for specifying +constructors. Given:: + + struct World + { + World(std::string msg); // added constructor + ... + +we can modify our wrapping code as follows:: + + class_("World", init()) + ... + +of course, a C++ class may have additional constructors, and we can +expose those as well by passing more instances of ``init<...>`` to +``def()``:: + + class_("World", init()) + .def(init()) + ... + +Boost.Python allows wrapped functions, member functions, and +constructors to be overloaded to mirror C++ overloading. + +Data Members and Properties +=========================== + +Any publicly-accessible data members in a C++ class can be easily +exposed as either ``readonly`` or ``readwrite`` attributes:: + + class_("World", init()) + .def_readonly("msg", &World::msg) + ... + +and can be used directly in Python: :: + + >>> planet = hello.World('howdy') + >>> planet.msg + 'howdy' + +This does *not* result in adding attributes to the ``World`` instance +``__dict__``, which can result in substantial memory savings when +wrapping large data structures. In fact, no instance ``__dict__`` +will be created at all unless attributes are explicitly added from +Python. Boost.Python owes this capability to the new Python 2.2 type +system, in particular the descriptor interface and ``property`` type. + +In C++, publicly-accessible data members are considered a sign of poor +design because they break encapsulation, and style guides usually +dictate the use of "getter" and "setter" functions instead. In +Python, however, ``__getattr__``, ``__setattr__``, and since 2.2, +``property`` mean that attribute access is just one more +well-encapsulated syntactic tool at the programmer's disposal. +Boost.Python bridges this idiomatic gap by making Python ``property`` +creation directly available to users. If ``msg`` were private, we +could still expose it as attribute in Python as follows:: + + class_("World", init()) + .add_property("msg", &World::greet, &World::set) + ... + +The example above mirrors the familiar usage of properties in Python +2.2+: :: + + >>> class World(object): + ... __init__(self, msg): + ... self.__msg = msg + ... def greet(self): + ... return self.__msg + ... def set(self, msg): + ... self.__msg = msg + ... msg = property(greet, set) + +Operator Overloading +==================== + +The ability to write arithmetic operators for user-defined types has +been a major factor in the success of both languages for numerical +computation, and the success of packages like NumPy_ attests to the +power of exposing operators in extension modules. Boost.Python +provides a concise mechanism for wrapping operator overloads. The +example below shows a fragment from a wrapper for the Boost rational +number library:: + + class_ >("rational_int") + .def(init()) // constructor, e.g. rational_int(3,4) + .def("numerator", &rational::numerator) + .def("denominator", &rational::denominator) + .def(-self) // __neg__ (unary minus) + .def(self + self) // __add__ (homogeneous) + .def(self * self) // __mul__ + .def(self + int()) // __add__ (heterogenous) + .def(int() + self) // __radd__ + ... + +The magic is performed using a simplified application of "expression +templates" [VELD1995]_, a technique originally developed for +optimization of high-performance matrix algebra expressions. The +essence is that instead of performing the computation immediately, +operators are overloaded to construct a type *representing* the +computation. In matrix algebra, dramatic optimizations are often +available when the structure of an entire expression can be taken into +account, rather than evaluating each operation "greedily". +Boost.Python uses the same technique to build an appropriate Python +method object based on expressions involving ``self``. + +.. _NumPy: http://www.pfdubois.com/numpy/ + +Inheritance +=========== + +C++ inheritance relationships can be represented to Boost.Python by adding +an optional ``bases<...>`` argument to the ``class_<...>`` template +parameter list as follows:: + + class_ >("Derived") + ... + +This has two effects: + +1. When the ``class_<...>`` is created, Python type objects + corresponding to ``Base1`` and ``Base2`` are looked up in + Boost.Python's registry, and are used as bases for the new Python + ``Derived`` type object, so methods exposed for the Python ``Base1`` + and ``Base2`` types are automatically members of the ``Derived`` + type. Because the registry is global, this works correctly even if + ``Derived`` is exposed in a different module from either of its + bases. + +2. C++ conversions from ``Derived`` to its bases are added to the + Boost.Python registry. Thus wrapped C++ methods expecting (a + pointer or reference to) an object of either base type can be + called with an object wrapping a ``Derived`` instance. Wrapped + member functions of class ``T`` are treated as though they have an + implicit first argument of ``T&``, so these conversions are + neccessary to allow the base class methods to be called for derived + objects. + +Of course it's possible to derive new Python classes from wrapped C++ +class instances. Because Boost.Python uses the new-style class +system, that works very much as for the Python built-in types. There +is one significant detail in which it differs: the built-in types +generally establish their invariants in their ``__new__`` function, so +that derived classes do not need to call ``__init__`` on the base +class before invoking its methods : :: + + >>> class L(list): + ... def __init__(self): + ... pass + ... + >>> L().reverse() + >>> + +Because C++ object construction is a one-step operation, C++ instance +data cannot be constructed until the arguments are available, in the +``__init__`` function: :: + + >>> class D(SomeBoostPythonClass): + ... def __init__(self): + ... pass + ... + >>> D().some_boost_python_method() + Traceback (most recent call last): + File "", line 1, in ? + TypeError: bad argument type for built-in operation + +This happened because Boost.Python couldn't find instance data of type +``SomeBoostPythonClass`` within the ``D`` instance; ``D``'s ``__init__`` +function masked construction of the base class. It could be corrected +by either removing ``D``'s ``__init__`` function or having it call +``SomeBoostPythonClass.__init__(...)`` explicitly. + +Virtual Functions +================= + +Deriving new types in Python from extension classes is not very +interesting unless they can be used polymorphically from C++. In +other words, Python method implementations should appear to override +the implementation of C++ virtual functions when called *through base +class pointers/references from C++*. Since the only way to alter the +behavior of a virtual function is to override it in a derived class, +the user must build a special derived class to dispatch a polymorphic +class' virtual functions:: + + // + // interface to wrap: + // + class Base + { + public: + virtual int f(std::string x) { return 42; } + virtual ~Base(); + }; + + int calls_f(Base const& b, std::string x) { return b.f(x); } + + // + // Wrapping Code + // + + // Dispatcher class + struct BaseWrap : Base + { + // Store a pointer to the Python object + BaseWrap(PyObject* self_) : self(self_) {} + PyObject* self; + + // Default implementation, for when f is not overridden + int f_default(std::string x) { return this->Base::f(x); } + // Dispatch implementation + int f(std::string x) { return call_method(self, "f", x); } + }; + + ... + def("calls_f", calls_f); + class_("Base") + .def("f", &Base::f, &BaseWrap::f_default) + ; + +Now here's some Python code which demonstrates: :: + + >>> class Derived(Base): + ... def f(self, s): + ... return len(s) + ... + >>> calls_f(Base(), 'foo') + 42 + >>> calls_f(Derived(), 'forty-two') + 9 + +Things to notice about the dispatcher class: + +* The key element which allows overriding in Python is the + ``call_method`` invocation, which uses the same global type + conversion registry as the C++ function wrapping does to convert its + arguments from C++ to Python and its return type from Python to C++. + +* Any constructor signatures you wish to wrap must be replicated with + an initial ``PyObject*`` argument + +* The dispatcher must store this argument so that it can be used to + invoke ``call_method`` + +* The ``f_default`` member function is needed when the function being + exposed is not pure virtual; there's no other way ``Base::f`` can be + called on an object of type ``BaseWrap``, since it overrides ``f``. + +Deeper Reflection on the Horizon? +================================= + +Admittedly, this formula is tedious to repeat, especially on a project +with many polymorphic classes. That it is neccessary reflects some +limitations in C++'s compile-time introspection capabilities: there's +no way to enumerate the members of a class and find out which are +virtual functions. At least one very promising project has been +started to write a front-end which can generate these dispatchers (and +other wrapping code) automatically from C++ headers. + +Pyste_ is being developed by Bruno da Silva de Oliveira. It builds on +GCC_XML_, which generates an XML version of GCC's internal program +representation. Since GCC is a highly-conformant C++ compiler, this +ensures correct handling of the most-sophisticated template code and +full access to the underlying type system. In keeping with the +Boost.Python philosophy, a Pyste interface description is neither +intrusive on the code being wrapped, nor expressed in some unfamiliar +language: instead it is a 100% pure Python script. If Pyste is +successful it will mark a move away from wrapping everything directly +in C++ for many of our users. It will also allow us the choice to +shift some of the metaprogram code from C++ to Python. We expect that +soon, not only our users but the Boost.Python developers themselves +will be "thinking hybrid" about their own code. + +.. _`GCC_XML`: http://www.gccxml.org/HTML/Index.html +.. _`Pyste`: http://www.boost.org/libs/python/pyste + +--------------- + Serialization +--------------- + +*Serialization* is the process of converting objects in memory to a +form that can be stored on disk or sent over a network connection. The +serialized object (most often a plain string) can be retrieved and +converted back to the original object. A good serialization system will +automatically convert entire object hierarchies. Python's standard +``pickle`` module is just such a system. It leverages the language's strong +runtime introspection facilities for serializing practically arbitrary +user-defined objects. With a few simple and unintrusive provisions this +powerful machinery can be extended to also work for wrapped C++ objects. +Here is an example:: + + #include + + struct World + { + World(std::string a_msg) : msg(a_msg) {} + std::string greet() const { return msg; } + std::string msg; + }; + + #include + using namespace boost::python; + + struct World_picklers : pickle_suite + { + static tuple + getinitargs(World const& w) { return make_tuple(w.greet()); } + }; + + BOOST_PYTHON_MODULE(hello) + { + class_("World", init()) + .def("greet", &World::greet) + .def_pickle(World_picklers()) + ; + } + +Now let's create a ``World`` object and put it to rest on disk:: + + >>> import hello + >>> import pickle + >>> a_world = hello.World("howdy") + >>> pickle.dump(a_world, open("my_world", "w")) + +In a potentially *different script* on a potentially *different +computer* with a potentially *different operating system*:: + + >>> import pickle + >>> resurrected_world = pickle.load(open("my_world", "r")) + >>> resurrected_world.greet() + 'howdy' + +Of course the ``cPickle`` module can also be used for faster +processing. + +Boost.Python's ``pickle_suite`` fully supports the ``pickle`` protocol +defined in the standard Python documentation. Like a __getinitargs__ +function in Python, the pickle_suite's getinitargs() is responsible for +creating the argument tuple that will be use to reconstruct the pickled +object. The other elements of the Python pickling protocol, +__getstate__ and __setstate__ can be optionally provided via C++ +getstate and setstate functions. C++'s static type system allows the +library to ensure at compile-time that nonsensical combinations of +functions (e.g. getstate without setstate) are not used. + +Enabling serialization of more complex C++ objects requires a little +more work than is shown in the example above. Fortunately the +``object`` interface (see next section) greatly helps in keeping the +code manageable. + +------------------ + Object interface +------------------ + +Experienced 'C' language extension module authors will be familiar +with the ubiquitous ``PyObject*``, manual reference-counting, and the +need to remember which API calls return "new" (owned) references or +"borrowed" (raw) references. These constraints are not just +cumbersome but also a major source of errors, especially in the +presence of exceptions. + +Boost.Python provides a class ``object`` which automates reference +counting and provides conversion to Python from C++ objects of +arbitrary type. This significantly reduces the learning effort for +prospective extension module writers. + +Creating an ``object`` from any other type is extremely simple:: + + object s("hello, world"); // s manages a Python string + +``object`` has templated interactions with all other types, with +automatic to-python conversions. It happens so naturally that it's +easily overlooked:: + + object ten_Os = 10 * s[4]; // -> "oooooooooo" + +In the example above, ``4`` and ``10`` are converted to Python objects +before the indexing and multiplication operations are invoked. + +The ``extract`` class template can be used to convert Python objects +to C++ types:: + + double x = extract(o); + +If a conversion in either direction cannot be performed, an +appropriate exception is thrown at runtime. + +The ``object`` type is accompanied by a set of derived types +that mirror the Python built-in types such as ``list``, ``dict``, +``tuple``, etc. as much as possible. This enables convenient +manipulation of these high-level types from C++:: + + dict d; + d["some"] = "thing"; + d["lucky_number"] = 13; + list l = d.keys(); + +This almost looks and works like regular Python code, but it is pure +C++. Of course we can wrap C++ functions which accept or return +``object`` instances. + +================= + Thinking hybrid +================= + +Because of the practical and mental difficulties of combining +programming languages, it is common to settle a single language at the +outset of any development effort. For many applications, performance +considerations dictate the use of a compiled language for the core +algorithms. Unfortunately, due to the complexity of the static type +system, the price we pay for runtime performance is often a +significant increase in development time. Experience shows that +writing maintainable C++ code usually takes longer and requires *far* +more hard-earned working experience than developing comparable Python +code. Even when developers are comfortable working exclusively in +compiled languages, they often augment their systems by some type of +ad hoc scripting layer for the benefit of their users without ever +availing themselves of the same advantages. + +Boost.Python enables us to *think hybrid*. Python can be used for +rapidly prototyping a new application; its ease of use and the large +pool of standard libraries give us a head start on the way to a +working system. If necessary, the working code can be used to +discover rate-limiting hotspots. To maximize performance these can +be reimplemented in C++, together with the Boost.Python bindings +needed to tie them back into the existing higher-level procedure. + +Of course, this *top-down* approach is less attractive if it is clear +from the start that many algorithms will eventually have to be +implemented in C++. Fortunately Boost.Python also enables us to +pursue a *bottom-up* approach. We have used this approach very +successfully in the development of a toolbox for scientific +applications. The toolbox started out mainly as a library of C++ +classes with Boost.Python bindings, and for a while the growth was +mainly concentrated on the C++ parts. However, as the toolbox is +becoming more complete, more and more newly added functionality can be +implemented in Python. + +.. image:: python_cpp_mix.jpg + +This figure shows the estimated ratio of newly added C++ and Python +code over time as new algorithms are implemented. We expect this +ratio to level out near 70% Python. Being able to solve new problems +mostly in Python rather than a more difficult statically typed +language is the return on our investment in Boost.Python. The ability +to access all of our code from Python allows a broader group of +developers to use it in the rapid development of new applications. + +===================== + Development history +===================== + +The first version of Boost.Python was developed in 2000 by Dave +Abrahams at Dragon Systems, where he was privileged to have Tim Peters +as a guide to "The Zen of Python". One of Dave's jobs was to develop +a Python-based natural language processing system. Since it was +eventually going to be targeting embedded hardware, it was always +assumed that the compute-intensive core would be rewritten in C++ to +optimize speed and memory footprint [#proto]_. The project also wanted to +test all of its C++ code using Python test scripts [#test]_. The only +tool we knew of for binding C++ and Python was SWIG_, and at the time +its handling of C++ was weak. It would be false to claim any deep +insight into the possible advantages of Boost.Python's approach at +this point. Dave's interest and expertise in fancy C++ template +tricks had just reached the point where he could do some real damage, +and Boost.Python emerged as it did because it filled a need and +because it seemed like a cool thing to try. + +This early version was aimed at many of the same basic goals we've +described in this paper, differing most-noticeably by having a +slightly more cumbersome syntax and by lack of special support for +operator overloading, pickling, and component-based development. +These last three features were quickly added by Ullrich Koethe and +Ralf Grosse-Kunstleve [#feature]_, and other enthusiastic contributors arrived +on the scene to contribute enhancements like support for nested +modules and static member functions. + +By early 2001 development had stabilized and few new features were +being added, however a disturbing new fact came to light: Ralf had +begun testing Boost.Python on pre-release versions of a compiler using +the EDG_ front-end, and the mechanism at the core of Boost.Python +responsible for handling conversions between Python and C++ types was +failing to compile. As it turned out, we had been exploiting a very +common bug in the implementation of all the C++ compilers we had +tested. We knew that as C++ compilers rapidly became more +standards-compliant, the library would begin failing on more +platforms. Unfortunately, because the mechanism was so central to the +functioning of the library, fixing the problem looked very difficult. + +Fortunately, later that year Lawrence Berkeley and later Lawrence +Livermore National labs contracted with `Boost Consulting`_ for support +and development of Boost.Python, and there was a new opportunity to +address fundamental issues and ensure a future for the library. A +redesign effort began with the low level type conversion architecture, +building in standards-compliance and support for component-based +development (in contrast to version 1 where conversions had to be +explicitly imported and exported across module boundaries). A new +analysis of the relationship between the Python and C++ objects was +done, resulting in more intuitive handling for C++ lvalues and +rvalues. + +The emergence of a powerful new type system in Python 2.2 made the +choice of whether to maintain compatibility with Python 1.5.2 easy: +the opportunity to throw away a great deal of elaborate code for +emulating classic Python classes alone was too good to pass up. In +addition, Python iterators and descriptors provided crucial and +elegant tools for representing similar C++ constructs. The +development of the generalized ``object`` interface allowed us to +further shield C++ programmers from the dangers and syntactic burdens +of the Python 'C' API. A great number of other features including C++ +exception translation, improved support for overloaded functions, and +most significantly, CallPolicies for handling pointers and +references, were added during this period. + +In October 2002, version 2 of Boost.Python was released. Development +since then has concentrated on improved support for C++ runtime +polymorphism and smart pointers. Peter Dimov's ingenious +``boost::shared_ptr`` design in particular has allowed us to give the +hybrid developer a consistent interface for moving objects back and +forth across the language barrier without loss of information. At +first, we were concerned that the sophistication and complexity of the +Boost.Python v2 implementation might discourage contributors, but the +emergence of Pyste_ and several other significant feature +contributions have laid those fears to rest. Daily questions on the +Python C++-sig and a backlog of desired improvements show that the +library is getting used. To us, the future looks bright. + +.. _`EDG`: http://www.edg.com + +============= + Conclusions +============= + +Boost.Python achieves seamless interoperability between two rich and +complimentary language environments. Because it leverages template +metaprogramming to introspect about types and functions, the user +never has to learn a third syntax: the interface definitions are +written in concise and maintainable C++. Also, the wrapping system +doesn't have to parse C++ headers or represent the type system: the +compiler does that work for us. + +Computationally intensive tasks play to the strengths of C++ and are +often impossible to implement efficiently in pure Python, while jobs +like serialization that are trivial in Python can be very difficult in +pure C++. Given the luxury of building a hybrid software system from +the ground up, we can approach design with new confidence and power. + +=========== + Citations +=========== + +.. [VELD1995] T. Veldhuizen, "Expression Templates," C++ Report, + Vol. 7 No. 5 June 1995, pp. 26-31. + http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html + +=========== + Footnotes +=========== + +.. [#proto] In retrospect, it seems that "thinking hybrid" from the + ground up might have been better for the NLP system: the + natural component boundaries defined by the pure python + prototype turned out to be inappropriate for getting the + desired performance and memory footprint out of the C++ core, + which eventually caused some redesign overhead on the Python + side when the core was moved to C++. + +.. [#test] We also have some reservations about driving all C++ + testing through a Python interface, unless that's the only way + it will be ultimately used. Any transition across language + boundaries with such different object models can inevitably + mask bugs. + +.. [#feature] These features were expressed very differently in v1 of + Boost.Python