2
0
mirror of https://github.com/boostorg/python.git synced 2026-01-22 05:22:45 +00:00

Initial Checkin

[SVN r16855]
This commit is contained in:
Dave Abrahams
2003-01-10 15:11:05 +00:00
parent 50bcf8db34
commit 5895047e23
2 changed files with 836 additions and 0 deletions

648
doc/PyConDC_2003/bpl.txt Normal file
View File

@@ -0,0 +1,648 @@
.. This is a comment. Note how any initial comments are moved by
transforms to after the document title, subtitle, and docinfo.
.. Need intro and conclusion
.. Exposing classes
.. Constructors
.. Overloading
.. Properties and data members
.. Inheritance
.. Operators and Special Functions
.. Virtual Functions
.. Call Policies
++++++++++++++++++++++++++++++++++++++++++++++
Introducing Boost.Python (Extended Abstract)
++++++++++++++++++++++++++++++++++++++++++++++
.. bibliographic fields (which also require a transform):
:Author: David Abrahams
:Address: 45 Walnut Street
Somerville, MA 02143
:Contact: dave@boost-consulting.com
:organization: `Boost Consulting`_
:date: $Date$
:status: This is a "work in progress"
:version: 1
:copyright: Copyright David Abrahams 2002. All rights reserved
:Dedication:
For my girlfriend, wife, and partner Luann
:abstract:
This paper describes the Boost.Python library, a system for
C++/Python interoperability.
.. meta::
:keywords: Boost,python,Boost.Python,C++
:description lang=en: C++/Python interoperability with Boost.Python
.. contents:: Table of Contents
.. section-numbering::
.. _`Boost Consulting`: http://www.boost-consulting.com
==============
Introduction
==============
Python and C++ are in many ways as different as two languages could
be: while C++ is usually compiled to machine-code, Python is
interpreted. Python's dynamic type system is often cited as the
foundation of its flexibility, while in C++ static typing is the
cornerstone of its efficiency. C++ has an intricate and difficult
compile-time meta-language, while in Python, practically everything
happens at runtime.
Yet for many programmers, these very differences mean that Python and
C++ complement one another perfectly. Performance bottlenecks in
Python programs can be rewritten in C++ for maximal speed, and
authors of powerful C++ libraries choose Python as a middleware
language for its flexible system integration capabilities.
Furthermore, the surface differences mask some strong similarities:
* 'C'-family control structures (if, while, for...)
* Support for object-orientation, functional programming, and generic
programming (these are both *multi-paradigm* programming languages.)
* Comprehensive operator overloading facilities, recognizing the
importance of syntactic variability for readability and
expressivity.
* High-level concepts such as collections and iterators.
* Strong support for writers of re-usable libraries.
* C++ idioms in common use, such as handle/body classes and
reference-counted smart pointers mirror Python reference semantics.
* Exception-handling for effective management of error conditions.
Given Python's rich 'C' interoperability API, it should in principle
be possible to expose C++ type and function interfaces to Python with
an analogous interface to their C++ counterparts. However, the
facilities provided by Python alone for integration with C++ are
relatively meager. Some of this, such as the need to manage
reference-counting manually and lack of C++ exception-handling
support, comes from the limitations of the 'C' language in which the
API is implemented. Those issues aside,most of the hard problems and
much of the tedium of exposing C++ in Python extension modules can be
handled if the code understands the C++ type system. For example:
* Every argument of every wrapped function requires some kind of
extraction code to convert it from Python to C++. Likewise, the
function return value has to be converted from C++ to Python.
Appropriate Python exceptions must be raised if the conversion
fails. Argument and return types are part of the function's type,
and much of this tedium can be relieved if the wrapping system can
extract that information through introspection.
* Passing a wrapped C++ derived class instance to a C++ function
accepting a pointer or reference to a base class requires knowledge
of the inheritance relationship and how to translate the address of
a base class into that of a derived class.
The Boost.Python Library (BPL) leverages the power of C++
meta-programming techniques to introspect about the C++ type system,
and presents a simple, IDL-like C++ interface for exposing C++ code in
extension modules.
===========================
Boost.Python Design Goals
===========================
The primary goal of Boost.Python is to allow users to expose C++
classes and functions to Python using nothing more than a C++
compiler. In broad strokes, the user experience should be one of
directly manipulating C++ objects from Python.
However, it's also important not to translate all interfaces *too*
literally: the idioms of each language must be respected. For
example, though C++ and Python both have an iterator concept, they are
expressed very differently. Boost.Python has to be able to bridge the
interface gap.
It must be possible to insulate Python users from crashes resulting
from trivial misuses of C++ interfaces, such as accessing
already-deleted objects. By the same token the library should
insulate C++ users from low-level Python 'C' API, replacing
error-prone 'C' interfaces like manual reference-count management and
raw ``PyObject`` pointers with more-robust alternatives.
Support for component-based development is crucial, so that C++ types
exposed in one extension module can be passed to functions exposed in
another without loss of crucial information like C++ inheritance
relationships.
Finally, all wrapping must be *non-intrusive*, without modifying or
even seeing the original C++ source code. Existing C++ libraries have
to be wrappable by third parties who only have access to header files
and binaries.
==========================
Hello Boost.Python World
==========================
And now for a preview of Boost.Python, and how it improves on the raw
facilities offered by Python. Here's a function we might want to
expose::
char const* greet(unsigned x)
{
static char const* const msgs[] = { "hello", "Boost.Python", "world!" };
if (x > 2)
throw std::range_error("greet: index out of range");
return msgs[x];
}
To wrap this function in standard C++ using the Python 'C' API, we'd
need something like this::
extern "C" // all Python interactions use 'C' linkage and calling convention
{
// Wrapper to handle argument/result conversion and checking
PyObject* greet_wrap(PyObject* args, PyObject * keywords)
{
int x;
if (PyArg_ParseTuple(args, "i", &x)) // extract/check arguments
{
char const* result = greet(x); // invoke wrapped function
return PyString_FromString(result); // convert result to Python
}
return 0; // error occurred
}
// Table of wrapped functions to be exposed by the module
static PyMethodDef methods[] = {
{ "greet", greet_wrap, METH_VARARGS, "return one of 3 parts of a greeting" }
, { NULL, NULL, 0, NULL } // sentinel
};
// module initialization function
DL_EXPORT init_hello()
{
(void) Py_InitModule("hello", methods); // add the methods to the module
}
}
Now here's the wrapping code we'd use to expose it with Boost.Python::
#include <boost/python.hpp>
using namespace boost::python;
BOOST_PYTHON_MODULE(hello)
{
def("greet", greet, "return one of 3 parts of a greeting");
}
and here it is in action::
>>> import hello
>>> for x in range(3):
... print hello.greet(x)
...
hello
Boost.Python
world!
Aside from the fact that the 'C' API version is much more verbose than
the BPL one, it's worth noting that it doesn't handle a few things
correctly:
* The original function accepts an unsigned integer, and the Python
'C' API only gives us a way of extracting signed integers. The
Boost.Python version will raise a Python exception if we try to pass
a negative number to ``hello.greet``, but the other one will proceed
to do whatever the C++ implementation does when converting an
negative integer to unsigned (usually wrapping to some very large
number), and pass the incorrect translation on to the wrapped
function.
* That brings us to the second problem: if the C++ ``greet()``
function is called with a number greater than 2, it will throw an
exception. Typically, if a C++ exception propagates across the
boundary with code generated by a 'C' compiler, it will cause a
crash. As you can see in the first version, there's no C++
scaffolding there to prevent this from happening. Functions wrapped
by Boost.Python automatically include an exception-handling layer
which protects Python users by translating unhandled C++ exceptions
into a corresponding Python exception.
* A slightly more-subtle limitation is that the argument conversion
used in the Python 'C' API case can only get that integer ``x`` in
*one way*. PyArg_ParseTuple can't convert Python ``long`` objects
(arbitrary-precision integers) which happen to fit in an ``unsigned
int`` but not in a ``signed long``, nor will it ever handle a
wrapped C++ class with a user-defined implicit ``operator unsigned
int()`` conversion. The BPL's dynamic type conversion registry
allows users to add arbitrary conversion methods.
==================
Library Overview
==================
This section outlines some of the library's major features. Except as
neccessary to avoid confusion, details of library implementation are
omitted.
------------------
Exposing Classes
------------------
C++ classes and structs are exposed with a similarly-terse interface.
Given::
struct World
{
void set(std::string msg) { this->msg = msg; }
std::string greet() { return msg; }
std::string msg;
};
The following code will expose it in our extension module::
#include <boost/python.hpp>
BOOST_PYTHON_MODULE(hello)
{
class_<World>("World")
.def("greet", &World::greet)
.def("set", &World::set)
;
}
Although this code has a certain pythonic familiarity, people
sometimes find the syntax bit confusing because it doesn't look like
most of the C++ code they're used to. All the same, this is just
standard C++. Because of their flexible syntax and operator
overloading, C++ and Python are great for defining domain-specific
(sub)languages
(DSLs), and that's what we've done in BPL. To break it down::
class_<World>("World")
constructs an unnamed object of type ``class_<World>`` and passes
``"World"`` to its constructor. This creates a new-style Python class
called ``World`` in the extension module, and associates it with the
C++ type ``World`` in the BPL type conversion registry. We might have
also written::
class_<World> w("World");
but that would've been more verbose, since we'd have to name ``w``
again to invoke its ``def()`` member function::
w.def("greet", &World::greet)
There's nothing special about the location of the dot for member
access in the original example: C++ allows any amount of whitespace on
either side of a token, and placing the dot at the beginning of each
line allows us to chain as many successive calls to member functions
as we like with a uniform syntax. The other key fact that allows
chaining is that ``class_<>`` member functions all return a reference
to ``*this``.
So the example is equivalent to::
class_<World> w("World");
w.def("greet", &World::greet);
w.def("set", &World::set);
It's occasionally useful to be able to break down the components of a
Boost.Python class wrapper in this way, but the rest of this paper
will tend to stick to the terse syntax.
For completeness, here's the wrapped class in use:
>>> import hello
>>> planet = hello.World()
>>> planet.set('howdy')
>>> planet.greet()
'howdy'
Constructors
============
Since our ``World`` class is just a plain ``struct``, it has an
implicit no-argument (nullary) constructor. Boost.Python exposes the
nullary constructor by default, which is why we were able to write:
>>> planet = hello.World()
However, well-designed classes in any language may require constructor
arguments in order to establish their invariants. Unlike Python,
where ``__init__`` is just a specially-named method, In C++
constructors cannot be handled like ordinary member functions. In
particular, we can't take their address: ``&World::World`` is an
error. The library provides a different interface for specifying
constructors. Given::
struct World
{
World(std::string msg); // added constructor
...
we can modify our wrapping code as follows::
class_<World>("World", init<std::string>())
...
of course, a C++ class may have additional constructors, and we can
expose those as well by passing more instances of ``init<...>`` to
``def()``::
class_<World>("World", init<std::string>())
.def(init<double, double>())
...
Boost.Python allows wrapped functions, member functions, and
constructors to be overloaded to mirror C++ overloading.
Data Members and Properties
===========================
Any publicly-accessible data members in a C++ class can be easily
exposed as either ``readonly`` or ``readwrite`` attributes::
class_<World>("World", init<std::string>())
.def_readonly("msg", &World::msg)
...
and can be used directly in Python:
>>> planet = hello.World('howdy')
>>> planet.msg
'howdy'
This does *not* result in adding attributes to the ``World`` instance
``__dict__``, which can result in substantial memory savings when
wrapping large data structures. In fact, no instance ``__dict__``
will be created at all unless attributes are explicitly added from
Python. BPL owes this capability to the new Python 2.2 type system,
in particular the descriptor interface and ``property`` type.
In C++, publicly-accessible data members are considered a sign of poor
design because they break encapsulation, and style guides usually
dictate the use of "getter" and "setter" functions instead. In
Python, however, ``__getattr__``, ``__setattr__``, and since 2.2,
``property`` mean that attribute access is just one more
well-encapsulated syntactic tool at the programmer's disposal. BPL
bridges this idiomatic gap by making Python ``property`` creation
directly available to users. So if ``msg`` were private, we could
still expose it as attribute in Python as follows::
class_<World>("World", init<std::string>())
.add_property("msg", &World::greet, &World::set)
...
The example above mirrors the familiar usage of properties in Python
2.2+:
>>> class World(object):
... __init__(self, msg):
... self.__msg = msg
... def greet(self):
... return self.__msg
... def set(self, msg):
... self.__msg = msg
... msg = property(greet, set)
Operators and Special Functions
===============================
The ability to write arithmetic operators for user-defined types that
C++ and Python both allow the definition of has been a major factor in
the popularity of both languages for scientific computing. The
success of packages like NumPy attests to the power of exposing
operators in extension modules. In this example we'll wrap a class
representing a position in a large file::
class FilePos { /*...*/ };
// Linear offset
FilePos operator+(FilePos, int);
FilePos operator+(int, FilePos);
FilePos operator-(FilePos, int);
// Distance between two FilePos objects
int operator-(FilePos, FilePos);
// Offset with assignment
FilePos& operator+=(FilePos&, int);
FilePos& operator-=(FilePos&, int);
// Comparison
bool operator<(FilePos, FilePos);
The wrapping code looks like this::
class_<FilePos>("FilePos")
.def(self + int()) // __add__
.def(int() + self) // __radd__
.def(self - int()) // __sub__
.def(self - self) // __sub__
.def(self += int()) // __iadd__
.def(self -= int()) // __isub__
.def(self < self); // __lt__
;
The magic is performed using a simplified application of "expression
templates" [VELD1995]_, a technique originally developed by for
optimization of high-performance matrix algebra expressions. The
essence is that instead of performing the computation immediately,
operators are overloaded to construct a type *representing* the
computation. In matrix algebra, dramatic optimizations are often
available when the structure of an entire expression can be taken into
account, rather than processing each operation "greedily".
Boost.Python uses the same technique to build an appropriate Python
callable object based on an expression involving ``self``, which is
then added to the class.
Inheritance
===========
C++ inheritance relationships can be represented to Boost.Python by adding
an optional ``bases<...>`` argument to the ``class_<...>`` template
parameter list as follows::
class_<Derived, bases<Base1,Base2> >("Derived")
...
This has two effects:
1. When the ``class_<...>`` is created, Python type objects
corresponding to ``Base1`` and ``Base2`` are looked up in the BPL
registry, and are used as bases for the new Python ``Derived`` type
object [#mi]_, so methods exposed for the Python ``Base1`` and
``Base2`` types are automatically members of the ``Derived`` type.
Because the registry is global, this works correctly even if
``Derived`` is exposed in a different module from either of its
bases.
2. C++ conversions from ``Derived`` to its bases are added to the
Boost.Python registry. Thus wrapped C++ methods expecting (a
pointer or reference to) an object of either base type can be
called with an object wrapping a ``Derived`` instance. Wrapped
member functions of class ``T`` are treated as though they have an
implicit first argument of ``T&``, so these conversions are
neccessary to allow the base class methods to be called for derived
objects.
Of course it's possible to derive new Python classes from wrapped C++
class instances. Because Boost.Python uses the new-style class
system, that works very much as for the Python built-in types. There
is one significant detail in which it differs: the built-in types
generally establish their invariants in their ``__new__`` function, so
that derived classes do not need to call ``__init__`` on the base
class before invoking its methods :
>>> class L(list):
... def __init__(self):
... pass
...
>>> L().reverse()
>>>
Because C++ object construction is a one-step operation, C++ instance
data cannot be constructed until the arguments are available, in the
``__init__`` function:
>>> class D(SomeBPLClass):
... def __init__(self):
... pass
...
>>> D().some_bpl_method()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: bad argument type for built-in operation
This happened because Boost.Python couldn't find instance data of type
``SomeBPLClass`` within the ``D`` instance; ``D``'s ``__init__``
function masked construction of the base class. It could be corrected
by either removing ``D``'s ``__init__`` function or having it call
``SomeBPLClass.__init__(...)`` explicitly.
Virtual Functions
=================
Deriving new types in Python from extension classes is not very
interesting unless they can be used polymorphically from C++. In
other words, Python method implementations should appear to override
the implementation of C++ virtual functions when called *through base
class pointers/references from C++*. Since the only way to alter the
behavior of a virtual function is to override it in a derived class,
the user must build a special derived class to dispatch a polymorphic
class' virtual functions::
//
// interface to wrap:
//
class Base
{
public:
virtual int f(std::string x) { return 42; }
virtual ~Base();
};
int calls_f(Base const& b, std::string x) { return b.f(x); }
//
// Wrapping Code
//
// Dispatcher class
struct BaseWrap : Base
{
// Store a pointer to the Python object
BaseWrap(PyObject* self_) : self(self_) {}
PyObject* self;
// Default implementation, for when f is not overridden
int f_default(std::string x) { return this->Base::f(x); }
// Dispatch implementation
int f(std::string x) { return call_method<int>(self, "f", x); }
};
...
def("calls_f", calls_f);
class_<Base, BaseWrap>("Base")
.def("f", &Base::f, &BaseWrap::f_default)
;
Now here's some Python code which demonstrates:
>>> class Derived(Base):
... def f(self, s):
... return len(s)
...
>>> calls_f(Base(), 'foo')
42
>>> calls_f(Derived(), 'forty-two')
9
Things to notice about the dispatcher class:
* The key element which allows overriding in Python is the
``call_method`` invocation, which uses the same global type
conversion registry as the C++ function wrapping does to convert its
arguments from C++ to Python and its return type from Python to C++.
* Any constructor signatures you wish to wrap must be replicated with
an initial ``PyObject*`` argument
* The dispatcher must store this argument so that it can be used to
invoke ``call_method``
* The ``f_default`` member function is needed when the function being
exposed is not pure virtual; there's no other way ``Base::f`` can be
called on an object of type ``BaseWrap``, since it overrides ``f``.
Admittedly, this formula is tedious to repeat, especially on a project
with many polymorphic classes; that it is neccessary reflects
limitations in C++'s compile-time reflection capabilities. Several
efforts are underway to write front-ends for Boost.Python which can
generate these dispatchers (and other wrapping code) automatically.
If these are successful it will mark a move away from wrapping
everything directly in pure C++ for many of our users.
=============
Conclusions
=============
Perhaps one day we'll have a language with the simplicity and
expressive power of Python and the compile-time muscle of C++. Being
able to take advantage of all of these facilities without paying the
mental and development-time penalties of crossing a language barrier
would bring enormous benefits. Until then, interoperability tools
like Boost.Python can help lower the barrier and make the benefits of
both languages more accessible to both communities.
===========
Footnotes
===========
.. [#mi] For hard-core new-style class/extension module writers it is
worth noting that the normal requirement that all extension classes
with data form a layout-compatible single-inheritance chain is
lifted for Boost.Python extension classes. Clearly, either
``Base1`` or ``Base2`` has to occupy a different offset in the
``Derived`` class instance. This is possible because the wrapped
part of BPL extension class instances is never assumed to have a
fixed offset within the wrapper.
===========
Citations
===========
.. [VELD1995] T. Veldhuizen, "Expression Templates," C++ Report,
Vol. 7 No. 5 June 1995, pp. 26-31.
http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html

View File

@@ -0,0 +1,188 @@
/*
:Author: David Goodger
:Contact: goodger@users.sourceforge.net
:date: $Date$
:version: $Revision$
:copyright: This stylesheet has been placed in the public domain.
Default cascading style sheet for the HTML output of Docutils.
*/
.first {
margin-top: 0 }
.last {
margin-bottom: 0 }
a.toc-backref {
text-decoration: none ;
color: black }
dd {
margin-bottom: 0.5em }
div.abstract {
margin: 2em 5em }
div.abstract p.topic-title {
font-weight: bold ;
text-align: center }
div.attention, div.caution, div.danger, div.error, div.hint,
div.important, div.note, div.tip, div.warning {
margin: 2em ;
border: medium outset ;
padding: 1em }
div.attention p.admonition-title, div.caution p.admonition-title,
div.danger p.admonition-title, div.error p.admonition-title,
div.warning p.admonition-title {
color: red ;
font-weight: bold ;
font-family: sans-serif }
div.hint p.admonition-title, div.important p.admonition-title,
div.note p.admonition-title, div.tip p.admonition-title {
font-weight: bold ;
font-family: sans-serif }
div.dedication {
margin: 2em 5em ;
text-align: center ;
font-style: italic }
div.dedication p.topic-title {
font-weight: bold ;
font-style: normal }
div.figure {
margin-left: 2em }
div.footer, div.header {
font-size: smaller }
div.system-messages {
margin: 5em }
div.system-messages h1 {
color: red }
div.system-message {
border: medium outset ;
padding: 1em }
div.system-message p.system-message-title {
color: red ;
font-weight: bold }
div.topic {
margin: 2em }
h1.title {
text-align: center }
h2.subtitle {
text-align: center }
hr {
width: 75% }
ol.simple, ul.simple {
margin-bottom: 1em }
ol.arabic {
list-style: decimal }
ol.loweralpha {
list-style: lower-alpha }
ol.upperalpha {
list-style: upper-alpha }
ol.lowerroman {
list-style: lower-roman }
ol.upperroman {
list-style: upper-roman }
p.caption {
font-style: italic }
p.credits {
font-style: italic ;
font-size: smaller }
p.label {
white-space: nowrap }
p.topic-title {
font-weight: bold }
pre.address {
margin-bottom: 0 ;
margin-top: 0 ;
font-family: serif ;
font-size: 100% }
pre.line-block {
font-family: serif ;
font-size: 100% }
pre.literal-block, pre.doctest-block {
margin-left: 2em ;
margin-right: 2em ;
background-color: #eeeeee }
span.classifier {
font-family: sans-serif ;
font-style: oblique }
span.classifier-delimiter {
font-family: sans-serif ;
font-weight: bold }
span.interpreted {
font-family: sans-serif }
span.option-argument {
font-style: italic }
span.pre {
white-space: pre }
span.problematic {
color: red }
table {
margin-top: 0.5em ;
margin-bottom: 0.5em }
table.citation {
border-left: solid thin gray ;
padding-left: 0.5ex }
table.docinfo {
margin: 2em 4em }
table.footnote {
border-left: solid thin black ;
padding-left: 0.5ex }
td, th {
padding-left: 0.5em ;
padding-right: 0.5em ;
vertical-align: top }
th.docinfo-name, th.field-name {
font-weight: bold ;
text-align: left ;
white-space: nowrap }
h1 tt, h2 tt, h3 tt, h4 tt, h5 tt, h6 tt {
font-size: 100% }
tt {
background-color: #eeeeee }
ul.auto-toc {
list-style-type: none }