diff --git a/doc/PyConDC_2003/bpl.txt b/doc/PyConDC_2003/bpl.txt new file mode 100644 index 00000000..e4db4933 --- /dev/null +++ b/doc/PyConDC_2003/bpl.txt @@ -0,0 +1,648 @@ +.. This is a comment. Note how any initial comments are moved by + transforms to after the document title, subtitle, and docinfo. + +.. Need intro and conclusion +.. Exposing classes + .. Constructors + .. Overloading + .. Properties and data members + .. Inheritance + .. Operators and Special Functions + .. Virtual Functions +.. Call Policies + +++++++++++++++++++++++++++++++++++++++++++++++ + Introducing Boost.Python (Extended Abstract) +++++++++++++++++++++++++++++++++++++++++++++++ + + +.. bibliographic fields (which also require a transform): + +:Author: David Abrahams +:Address: 45 Walnut Street + Somerville, MA 02143 +:Contact: dave@boost-consulting.com +:organization: `Boost Consulting`_ +:date: $Date$ +:status: This is a "work in progress" +:version: 1 +:copyright: Copyright David Abrahams 2002. All rights reserved + +:Dedication: + + For my girlfriend, wife, and partner Luann + +:abstract: + + This paper describes the Boost.Python library, a system for + C++/Python interoperability. + +.. meta:: + :keywords: Boost,python,Boost.Python,C++ + :description lang=en: C++/Python interoperability with Boost.Python + +.. contents:: Table of Contents +.. section-numbering:: + + +.. _`Boost Consulting`: http://www.boost-consulting.com + +============== + Introduction +============== + +Python and C++ are in many ways as different as two languages could +be: while C++ is usually compiled to machine-code, Python is +interpreted. Python's dynamic type system is often cited as the +foundation of its flexibility, while in C++ static typing is the +cornerstone of its efficiency. C++ has an intricate and difficult +compile-time meta-language, while in Python, practically everything +happens at runtime. 
+ +Yet for many programmers, these very differences mean that Python and +C++ complement one another perfectly. Performance bottlenecks in +Python programs can be rewritten in C++ for maximal speed, and +authors of powerful C++ libraries choose Python as a middleware +language for its flexible system integration capabilities. +Furthermore, the surface differences mask some strong similarities: + +* 'C'-family control structures (if, while, for...) + +* Support for object-orientation, functional programming, and generic + programming (these are both *multi-paradigm* programming languages.) + +* Comprehensive operator overloading facilities, recognizing the + importance of syntactic variability for readability and + expressivity. + +* High-level concepts such as collections and iterators. + +* Strong support for writers of re-usable libraries. + +* C++ idioms in common use, such as handle/body classes and + reference-counted smart pointers, mirror Python reference semantics. + +* Exception-handling for effective management of error conditions. + +Given Python's rich 'C' interoperability API, it should in principle +be possible to expose C++ type and function interfaces to Python with +an analogous interface to their C++ counterparts. However, the +facilities provided by Python alone for integration with C++ are +relatively meager. Some of this, such as the need to manage +reference-counting manually and lack of C++ exception-handling +support, comes from the limitations of the 'C' language in which the +API is implemented. Those issues aside, most of the hard problems and +much of the tedium of exposing C++ in Python extension modules can be +handled if the code understands the C++ type system. For example: + +* Every argument of every wrapped function requires some kind of + extraction code to convert it from Python to C++. Likewise, the + function return value has to be converted from C++ to Python. 
+ Appropriate Python exceptions must be raised if the conversion + fails. Argument and return types are part of the function's type, + and much of this tedium can be relieved if the wrapping system can + extract that information through introspection. + +* Passing a wrapped C++ derived class instance to a C++ function + accepting a pointer or reference to a base class requires knowledge + of the inheritance relationship and how to translate the address of + a derived class object into that of its base class subobject. + +The Boost.Python Library (BPL) leverages the power of C++ +meta-programming techniques to introspect about the C++ type system, +and presents a simple, IDL-like C++ interface for exposing C++ code in +extension modules. + +=========================== + Boost.Python Design Goals +=========================== + +The primary goal of Boost.Python is to allow users to expose C++ +classes and functions to Python using nothing more than a C++ +compiler. In broad strokes, the user experience should be one of +directly manipulating C++ objects from Python. + +However, it's also important not to translate all interfaces *too* +literally: the idioms of each language must be respected. For +example, though C++ and Python both have an iterator concept, they are +expressed very differently. Boost.Python has to be able to bridge the +interface gap. + +It must be possible to insulate Python users from crashes resulting +from trivial misuses of C++ interfaces, such as accessing +already-deleted objects. By the same token the library should +insulate C++ users from the low-level Python 'C' API, replacing +error-prone 'C' interfaces like manual reference-count management and +raw ``PyObject`` pointers with more-robust alternatives. + +Support for component-based development is crucial, so that C++ types +exposed in one extension module can be passed to functions exposed in +another without loss of crucial information like C++ inheritance +relationships. 
+ +Finally, all wrapping must be *non-intrusive*, without modifying or +even seeing the original C++ source code. Existing C++ libraries have +to be wrappable by third parties who only have access to header files +and binaries. + +========================== + Hello Boost.Python World +========================== + +And now for a preview of Boost.Python, and how it improves on the raw +facilities offered by Python. Here's a function we might want to +expose:: + + char const* greet(unsigned x) + { + static char const* const msgs[] = { "hello", "Boost.Python", "world!" }; + + if (x > 2) + throw std::range_error("greet: index out of range"); + + return msgs[x]; + } + +To wrap this function in standard C++ using the Python 'C' API, we'd +need something like this:: + + extern "C" // all Python interactions use 'C' linkage and calling convention + { + // Wrapper to handle argument/result conversion and checking + PyObject* greet_wrap(PyObject* self, PyObject* args) + { + int x; + if (PyArg_ParseTuple(args, "i", &x)) // extract/check arguments + { + char const* result = greet(x); // invoke wrapped function + return PyString_FromString(result); // convert result to Python + } + return 0; // error occurred + } + + // Table of wrapped functions to be exposed by the module + static PyMethodDef methods[] = { + { "greet", greet_wrap, METH_VARARGS, "return one of 3 parts of a greeting" } + , { NULL, NULL, 0, NULL } // sentinel + }; + + // module initialization function + DL_EXPORT(void) inithello() + { + (void) Py_InitModule("hello", methods); // add the methods to the module + } + } + +Now here's the wrapping code we'd use to expose it with Boost.Python:: + + #include <boost/python.hpp> + using namespace boost::python; + BOOST_PYTHON_MODULE(hello) + { + def("greet", greet, "return one of 3 parts of a greeting"); + } + +and here it is in action:: + + >>> import hello + >>> for x in range(3): + ... print hello.greet(x) + ... + hello + Boost.Python + world! 
+ +Aside from the fact that the 'C' API version is much more verbose than +the BPL one, it's worth noting that it doesn't handle a few things +correctly: + +* The original function accepts an unsigned integer, and the Python + 'C' API only gives us a way of extracting signed integers. The + Boost.Python version will raise a Python exception if we try to pass + a negative number to ``hello.greet``, but the other one will proceed + to do whatever the C++ implementation does when converting a + negative integer to unsigned (usually wrapping to some very large + number), and pass the incorrect translation on to the wrapped + function. + +* That brings us to the second problem: if the C++ ``greet()`` + function is called with a number greater than 2, it will throw an + exception. Typically, if a C++ exception propagates across the + boundary with code generated by a 'C' compiler, it will cause a + crash. As you can see in the first version, there's no C++ + scaffolding there to prevent this from happening. Functions wrapped + by Boost.Python automatically include an exception-handling layer + which protects Python users by translating unhandled C++ exceptions + into a corresponding Python exception. + +* A slightly more-subtle limitation is that the argument conversion + used in the Python 'C' API case can only get that integer ``x`` in + *one way*. ``PyArg_ParseTuple`` can't convert Python ``long`` objects + (arbitrary-precision integers) which happen to fit in an ``unsigned + int`` but not in a ``signed long``, nor will it ever handle a + wrapped C++ class with a user-defined implicit ``operator unsigned + int()`` conversion. The BPL's dynamic type conversion registry + allows users to add arbitrary conversion methods. + +================== + Library Overview +================== + +This section outlines some of the library's major features. Except as +necessary to avoid confusion, details of library implementation are +omitted. 
+ +------------------ + Exposing Classes +------------------ + +C++ classes and structs are exposed with a similarly-terse interface. +Given:: + + struct World + { + void set(std::string msg) { this->msg = msg; } + std::string greet() { return msg; } + std::string msg; + }; + +The following code will expose it in our extension module:: + + #include <boost/python.hpp> + using namespace boost::python; + BOOST_PYTHON_MODULE(hello) + { + class_<World>("World") + .def("greet", &World::greet) + .def("set", &World::set) + ; + } + +Although this code has a certain pythonic familiarity, people +sometimes find the syntax a bit confusing because it doesn't look like +most of the C++ code they're used to. All the same, this is just +standard C++. Because of their flexible syntax and operator +overloading, C++ and Python are great for defining domain-specific +(sub)languages +(DSLs), and that's what we've done in BPL. To break it down:: + + class_<World>("World") + +constructs an unnamed object of type ``class_<World>`` and passes +``"World"`` to its constructor. This creates a new-style Python class +called ``World`` in the extension module, and associates it with the +C++ type ``World`` in the BPL type conversion registry. We might have +also written:: + + class_<World> w("World"); + +but that would've been more verbose, since we'd have to name ``w`` +again to invoke its ``def()`` member function:: + + w.def("greet", &World::greet) + +There's nothing special about the location of the dot for member +access in the original example: C++ allows any amount of whitespace on +either side of a token, and placing the dot at the beginning of each +line allows us to chain as many successive calls to member functions +as we like with a uniform syntax. The other key fact that allows +chaining is that ``class_<>`` member functions all return a reference +to ``*this``. 
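The chaining idiom described above does not depend on anything specific to Boost.Python. A minimal sketch, assuming a hypothetical ``Registrar`` class that merely records method names (a stand-in for illustration, not part of the library), shows how returning a reference to ``*this`` from ``def()`` lets calls be chained on an unnamed temporary:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for class_<>: it only records the names passed
// to def(), and performs no Python registration at all.
class Registrar
{
public:
    explicit Registrar(std::string class_name)
        : class_name_(std::move(class_name)) {}

    // Returning a reference to *this is what makes chaining possible.
    Registrar& def(std::string const& method_name)
    {
        methods_.push_back(method_name);
        return *this;
    }

    std::string const& class_name() const { return class_name_; }
    std::vector<std::string> const& methods() const { return methods_; }

private:
    std::string class_name_;
    std::vector<std::string> methods_;
};

// Chain calls on an unnamed temporary, with the dot at the start of
// each line as in the wrapping code. The result is copied out before
// the temporary is destroyed at the end of the full expression.
inline std::vector<std::string> register_world()
{
    return Registrar("World")
        .def("greet")
        .def("set")
        .methods();
}
```

Calling the hypothetical ``register_world()`` yields the names ``greet`` and ``set`` in registration order. The real ``class_<>`` does considerably more in each ``def()``, but the chaining mechanism is the same.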
+ +So the example is equivalent to:: + + class_<World> w("World"); + w.def("greet", &World::greet); + w.def("set", &World::set); + +It's occasionally useful to be able to break down the components of a +Boost.Python class wrapper in this way, but the rest of this paper +will tend to stick to the terse syntax. + +For completeness, here's the wrapped class in use: + +>>> import hello +>>> planet = hello.World() +>>> planet.set('howdy') +>>> planet.greet() +'howdy' + +Constructors +============ + +Since our ``World`` class is just a plain ``struct``, it has an +implicit no-argument (nullary) constructor. Boost.Python exposes the +nullary constructor by default, which is why we were able to write: + +>>> planet = hello.World() + +However, well-designed classes in any language may require constructor +arguments in order to establish their invariants. Unlike in Python, +where ``__init__`` is just a specially-named method, in C++ +constructors cannot be handled like ordinary member functions. In +particular, we can't take their address: ``&World::World`` is an +error. The library provides a different interface for specifying +constructors. Given:: + + struct World + { + World(std::string msg); // added constructor + ... + +we can modify our wrapping code as follows:: + + class_<World>("World", init<std::string>()) + ... + +Of course, a C++ class may have additional constructors, and we can +expose those as well by passing more instances of ``init<...>`` to +``def()``:: + + class_<World>("World", init<std::string>()) + .def(init<double, double>()) + ... + +Boost.Python allows wrapped functions, member functions, and +constructors to be overloaded to mirror C++ overloading. + +Data Members and Properties +=========================== + +Any publicly-accessible data members in a C++ class can be easily +exposed as either ``readonly`` or ``readwrite`` attributes:: + + class_<World>("World", init<std::string>()) + .def_readonly("msg", &World::msg) + ... 
+ +and can be used directly in Python: + +>>> planet = hello.World('howdy') +>>> planet.msg +'howdy' + +This does *not* add attributes to the ``World`` instance +``__dict__``, which can yield substantial memory savings when +wrapping large data structures. In fact, no instance ``__dict__`` +will be created at all unless attributes are explicitly added from +Python. BPL owes this capability to the new Python 2.2 type system, +in particular the descriptor interface and ``property`` type. + +In C++, publicly-accessible data members are considered a sign of poor +design because they break encapsulation, and style guides usually +dictate the use of "getter" and "setter" functions instead. In +Python, however, ``__getattr__``, ``__setattr__``, and since 2.2, +``property`` mean that attribute access is just one more +well-encapsulated syntactic tool at the programmer's disposal. BPL +bridges this idiomatic gap by making Python ``property`` creation +directly available to users. So if ``msg`` were private, we could +still expose it as an attribute in Python as follows:: + + class_<World>("World", init<std::string>()) + .add_property("msg", &World::greet, &World::set) + ... + +The example above mirrors the familiar usage of properties in Python +2.2+: + +>>> class World(object): +... def __init__(self, msg): +... self.__msg = msg +... def greet(self): +... return self.__msg +... def set(self, msg): +... self.__msg = msg +... msg = property(greet, set) + +Operators and Special Functions +=============================== + +Both C++ and Python allow arithmetic operators to be defined for +user-defined types, which has been a major factor in +the popularity of both languages for scientific computing. The +success of packages like NumPy attests to the power of exposing +operators in extension modules. 
In this example we'll wrap a class +representing a position in a large file:: + + class FilePos { /*...*/ }; + + // Linear offset + FilePos operator+(FilePos, int); + FilePos operator+(int, FilePos); + FilePos operator-(FilePos, int); + + // Distance between two FilePos objects + int operator-(FilePos, FilePos); + + // Offset with assignment + FilePos& operator+=(FilePos&, int); + FilePos& operator-=(FilePos&, int); + + // Comparison + bool operator<(FilePos, FilePos); + +The wrapping code looks like this:: + + class_<FilePos>("FilePos") + .def(self + int()) // __add__ + .def(int() + self) // __radd__ + .def(self - int()) // __sub__ + + .def(self - self) // __sub__ + + .def(self += int()) // __iadd__ + .def(self -= int()) // __isub__ + + .def(self < self) // __lt__ + ; + +The magic is performed using a simplified application of "expression +templates" [VELD1995]_, a technique originally developed for +optimization of high-performance matrix algebra expressions. The +essence is that instead of performing the computation immediately, +operators are overloaded to construct a type *representing* the +computation. In matrix algebra, dramatic optimizations are often +available when the structure of an entire expression can be taken into +account, rather than processing each operation "greedily". +Boost.Python uses the same technique to build an appropriate Python +callable object based on an expression involving ``self``, which is +then added to the class. + +Inheritance +=========== + +C++ inheritance relationships can be represented to Boost.Python by adding +an optional ``bases<...>`` argument to the ``class_<...>`` template +parameter list as follows:: + + class_<Derived, bases<Base1,Base2> >("Derived") + ... + +This has two effects: + +1. 
When the ``class_<...>`` is created, Python type objects + corresponding to ``Base1`` and ``Base2`` are looked up in the BPL + registry, and are used as bases for the new Python ``Derived`` type + object [#mi]_, so methods exposed for the Python ``Base1`` and + ``Base2`` types are automatically members of the ``Derived`` type. + Because the registry is global, this works correctly even if + ``Derived`` is exposed in a different module from either of its + bases. + +2. C++ conversions from ``Derived`` to its bases are added to the + Boost.Python registry. Thus wrapped C++ methods expecting (a + pointer or reference to) an object of either base type can be + called with an object wrapping a ``Derived`` instance. Wrapped + member functions of class ``T`` are treated as though they have an + implicit first argument of ``T&``, so these conversions are + necessary to allow the base class methods to be called for derived + objects. + +Of course it's possible to derive new Python classes from wrapped C++ +classes. Because Boost.Python uses the new-style class +system, that works very much as for the Python built-in types. There +is one significant detail in which it differs: the built-in types +generally establish their invariants in their ``__new__`` function, so +that derived classes do not need to call ``__init__`` on the base +class before invoking its methods: + +>>> class L(list): +... def __init__(self): +... pass +... +>>> L().reverse() +>>> + +Because C++ object construction is a one-step operation, C++ instance +data cannot be constructed until the arguments are available, in the +``__init__`` function: + +>>> class D(SomeBPLClass): +... def __init__(self): +... pass +... +>>> D().some_bpl_method() +Traceback (most recent call last): + File "<stdin>", line 1, in ? 
+TypeError: bad argument type for built-in operation + +This happened because Boost.Python couldn't find instance data of type +``SomeBPLClass`` within the ``D`` instance; ``D``'s ``__init__`` +function masked construction of the base class. It could be corrected +by either removing ``D``'s ``__init__`` function or having it call +``SomeBPLClass.__init__(...)`` explicitly. + +Virtual Functions +================= + +Deriving new types in Python from extension classes is not very +interesting unless they can be used polymorphically from C++. In +other words, Python method implementations should appear to override +the implementation of C++ virtual functions when called *through base +class pointers/references from C++*. Since the only way to alter the +behavior of a virtual function is to override it in a derived class, +the user must build a special derived class to dispatch a polymorphic +class' virtual functions:: + + // + // interface to wrap: + // + class Base + { + public: + virtual int f(std::string x) { return 42; } + virtual ~Base(); + }; + + // f is non-const, so take b by non-const reference + int calls_f(Base& b, std::string x) { return b.f(x); } + + // + // Wrapping Code + // + + // Dispatcher class + struct BaseWrap : Base + { + // Store a pointer to the Python object + BaseWrap(PyObject* self_) : self(self_) {} + PyObject* self; + + // Default implementation, for when f is not overridden + int f_default(std::string x) { return this->Base::f(x); } + // Dispatch implementation + int f(std::string x) { return call_method<int>(self, "f", x); } + }; + + ... + def("calls_f", calls_f); + class_<Base, BaseWrap>("Base") + .def("f", &Base::f, &BaseWrap::f_default) + ; + +Now here's some Python code which demonstrates this: + +>>> class Derived(Base): +... def f(self, s): +... return len(s) +... 
+>>> calls_f(Base(), 'foo') +42 +>>> calls_f(Derived(), 'forty-two') +9 + +Things to notice about the dispatcher class: + +* The key element which allows overriding in Python is the + ``call_method`` invocation, which uses the same global type + conversion registry as the C++ function wrapping does to convert its + arguments from C++ to Python and its return type from Python to C++. + +* Any constructor signatures you wish to wrap must be replicated with + an initial ``PyObject*`` argument. + +* The dispatcher must store this argument so that it can be used to + invoke ``call_method``. + +* The ``f_default`` member function is needed when the function being + exposed is not pure virtual; there's no other way ``Base::f`` can be + called on an object of type ``BaseWrap``, since it overrides ``f``. + +Admittedly, this formula is tedious to repeat, especially on a project +with many polymorphic classes; that it is necessary reflects +limitations in C++'s compile-time reflection capabilities. Several +efforts are underway to write front-ends for Boost.Python which can +generate these dispatchers (and other wrapping code) automatically. +If these are successful it will mark a move away from wrapping +everything directly in pure C++ for many of our users. + +============= + Conclusions +============= + +Perhaps one day we'll have a language with the simplicity and +expressive power of Python and the compile-time muscle of C++. Being +able to take advantage of all of these facilities without paying the +mental and development-time penalties of crossing a language barrier +would bring enormous benefits. Until then, interoperability tools +like Boost.Python can help lower the barrier and make the benefits of +both languages more accessible to both communities. + +=========== + Footnotes +=========== + +.. 
[#mi] For hard-core new-style class/extension module writers it is + worth noting that the normal requirement that all extension classes + with data form a layout-compatible single-inheritance chain is + lifted for Boost.Python extension classes. Clearly, either + ``Base1`` or ``Base2`` has to occupy a different offset in the + ``Derived`` class instance. This is possible because the wrapped + part of BPL extension class instances is never assumed to have a + fixed offset within the wrapper. + +=========== + Citations +=========== + +.. [VELD1995] T. Veldhuizen, "Expression Templates," C++ Report, + Vol. 7 No. 5 June 1995, pp. 26-31. + http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html diff --git a/doc/PyConDC_2003/default.css b/doc/PyConDC_2003/default.css new file mode 100644 index 00000000..2e1fddb9 --- /dev/null +++ b/doc/PyConDC_2003/default.css @@ -0,0 +1,188 @@ +/* +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:date: $Date$ +:version: $Revision$ +:copyright: This stylesheet has been placed in the public domain. + +Default cascading style sheet for the HTML output of Docutils. 
+*/ + +.first { + margin-top: 0 } + +.last { + margin-bottom: 0 } + +a.toc-backref { + text-decoration: none ; + color: black } + +dd { + margin-bottom: 0.5em } + +div.abstract { + margin: 2em 5em } + +div.abstract p.topic-title { + font-weight: bold ; + text-align: center } + +div.attention, div.caution, div.danger, div.error, div.hint, +div.important, div.note, div.tip, div.warning { + margin: 2em ; + border: medium outset ; + padding: 1em } + +div.attention p.admonition-title, div.caution p.admonition-title, +div.danger p.admonition-title, div.error p.admonition-title, +div.warning p.admonition-title { + color: red ; + font-weight: bold ; + font-family: sans-serif } + +div.hint p.admonition-title, div.important p.admonition-title, +div.note p.admonition-title, div.tip p.admonition-title { + font-weight: bold ; + font-family: sans-serif } + +div.dedication { + margin: 2em 5em ; + text-align: center ; + font-style: italic } + +div.dedication p.topic-title { + font-weight: bold ; + font-style: normal } + +div.figure { + margin-left: 2em } + +div.footer, div.header { + font-size: smaller } + +div.system-messages { + margin: 5em } + +div.system-messages h1 { + color: red } + +div.system-message { + border: medium outset ; + padding: 1em } + +div.system-message p.system-message-title { + color: red ; + font-weight: bold } + +div.topic { + margin: 2em } + +h1.title { + text-align: center } + +h2.subtitle { + text-align: center } + +hr { + width: 75% } + +ol.simple, ul.simple { + margin-bottom: 1em } + +ol.arabic { + list-style: decimal } + +ol.loweralpha { + list-style: lower-alpha } + +ol.upperalpha { + list-style: upper-alpha } + +ol.lowerroman { + list-style: lower-roman } + +ol.upperroman { + list-style: upper-roman } + +p.caption { + font-style: italic } + +p.credits { + font-style: italic ; + font-size: smaller } + +p.label { + white-space: nowrap } + +p.topic-title { + font-weight: bold } + +pre.address { + margin-bottom: 0 ; + margin-top: 0 ; + font-family: 
serif ; + font-size: 100% } + +pre.line-block { + font-family: serif ; + font-size: 100% } + +pre.literal-block, pre.doctest-block { + margin-left: 2em ; + margin-right: 2em ; + background-color: #eeeeee } + +span.classifier { + font-family: sans-serif ; + font-style: oblique } + +span.classifier-delimiter { + font-family: sans-serif ; + font-weight: bold } + +span.interpreted { + font-family: sans-serif } + +span.option-argument { + font-style: italic } + +span.pre { + white-space: pre } + +span.problematic { + color: red } + +table { + margin-top: 0.5em ; + margin-bottom: 0.5em } + +table.citation { + border-left: solid thin gray ; + padding-left: 0.5ex } + +table.docinfo { + margin: 2em 4em } + +table.footnote { + border-left: solid thin black ; + padding-left: 0.5ex } + +td, th { + padding-left: 0.5em ; + padding-right: 0.5em ; + vertical-align: top } + +th.docinfo-name, th.field-name { + font-weight: bold ; + text-align: left ; + white-space: nowrap } + +h1 tt, h2 tt, h3 tt, h4 tt, h5 tt, h6 tt { + font-size: 100% } + +tt { + background-color: #eeeeee } + +ul.auto-toc { + list-style-type: none }