mirror of
https://github.com/boostorg/statechart.git
synced 2026-01-26 07:02:11 +00:00
<html>
|
||
|
||
<head>
|
||
<meta http-equiv="Content-Language" content="en-us">
|
||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||
<meta name="GENERATOR" content="Microsoft FrontPage 6.0">
|
||
<meta name="ProgId" content="FrontPage.Editor.Document">
|
||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||
<title>The boost::fsm library - Rationale</title>
|
||
</head>
|
||
|
||
<body link="#0000ff" vlink="#800080">
|
||
|
||
<table border="0" cellpadding="7" cellspacing="0" width="100%" summary="header">
|
||
<tr>
|
||
<td valign="top" width="300">
|
||
<h3><a href="../../../index.htm">
|
||
<img alt="C++ Boost" src="../../../boost.png" border="0" width="277" height="86"></a></h3>
|
||
</td>
|
||
<td valign="top">
|
||
<h1 align="center">The boost::fsm library</h1>
|
||
<h2 align="center">Rationale</h2>
|
||
</td>
|
||
</tr>
|
||
</table>
|
||
<hr>
|
||
<dl class="index">
|
||
<dt><a href="#Introduction">Introduction</a></dt>
|
||
<dt><a href="#Why yet another state machine framework">Why yet another state
|
||
machine framework</a></dt>
|
||
<dt><a href="#State-local storage">State-local storage</a></dt>
|
||
<dt><a href="#Dynamic configurability">Dynamic configurability</a></dt>
|
||
<dt><a href="#Error handling">Error handling</a></dt>
|
||
<dt><a href="#Asynchronous state machines">Asynchronous state machines</a></dt>
|
||
<dt><a href="#User actions: Member functions vs. function objects">User
|
||
actions: Member functions vs. function objects</a></dt>
|
||
<dt><a href="#Speed versus scalability tradeoffs">Speed versus scalability
|
||
tradeoffs</a></dt>
|
||
<dt><a href="#Memory management customization">Memory management
|
||
customization</a></dt>
|
||
<dt><a href="#RTTI customization">RTTI customization</a></dt>
|
||
<dt><a href="#Double dispatch">Double dispatch</a></dt>
|
||
<dt><a href="#Resource usage">Resource usage</a></dt>
|
||
<dt><a href="#Limitations">Limitations</a></dt>
|
||
</dl>
|
||
<h2><a name="Introduction">Introduction</a></h2>
|
||
<p>Most of the design decisions made during the development of this library
|
||
are the result of the following requirements.</p>
|
||
<p>boost::fsm should ...</p>
|
||
<ol>
|
||
<li>be fully type-safe. Whenever possible, type mismatches should be flagged
|
||
with an error at compile-time</li>
|
||
<li>not require the use of a code generator. A lot of the existing FSM
|
||
solutions force the developer to design the state machine either graphically
|
||
or in a specialized language. All or part of the code is then generated</li>
|
||
<li>allow for easy transformation of a UML statechart (defined in
|
||
<a href="http://www.omg.org/cgi-bin/doc?formal/03-03-01">
|
||
http://www.omg.org/cgi-bin/doc?formal/03-03-01</a>) into a working state
|
||
machine. Vice versa, an existing C++ implementation of a state machine
|
||
should be fairly trivial to transform into a UML statechart. Specifically,
|
||
the following state machine features should be supported:
|
||
<ul>
|
||
<li>Hierarchical (composite, nested) states</li>
|
||
<li>Orthogonal (concurrent) states</li>
|
||
<li>Entry-, exit- and transition-actions</li>
|
||
<li>Guards</li>
|
||
<li>Shallow/deep history</li>
|
||
</ul>
|
||
</li>
|
||
<li>produce a customizable reaction when a C++ exception is propagated from
|
||
user code</li>
|
||
<li>support synchronous and asynchronous state machines and leave it to the
|
||
user which thread an asynchronous state machine will run in. Users should
|
||
also be able to use the threading library of their choice</li>
|
||
<li>support the development of arbitrarily large and complex state machines.
|
||
Multiple developers should be able to work on the same state machine
|
||
simultaneously</li>
|
||
<li>allow the user to customize all resource management so that the library
|
||
could be used for applications with hard real-time requirements</li>
|
||
<li>enforce as much as possible at compile time. Specifically, invalid state
|
||
machines should not compile</li>
|
||
<li>offer reasonable performance for a wide range of applications</li>
|
||
</ol>
|
||
<h2><a name="Why yet another state machine framework">Why yet another state
|
||
machine framework?</a></h2>
|
||
<p>Before I started to develop this library I had a look at the following
|
||
frameworks:</p>
|
||
<ul>
|
||
<li>The framework accompanying the book "Practical Statecharts in C/C++" by
|
||
Miro Samek, CMP Books, ISBN: 1-57820-110-1<br>
|
||
<a href="http://www.quantum-leaps.com">http://www.quantum-leaps.com<br>
|
||
</a>Fails to satisfy at least the requirements 1, 3, 4, 6, 8.</li>
|
||
<li>The framework accompanying "Rhapsody in C++" by ILogix (a code generator
|
||
solution)<br>
|
||
<a href="http://www.ilogix.com/products/rhapsody/rhap_incplus.cfm">
|
||
http://www.ilogix.com/products/rhapsody/rhap_incplus.cfm<br>
|
||
</a>This might look like comparing apples with oranges. However, there is no
|
||
inherent reason why a code generator couldn't produce code that can easily
|
||
be understood and modified by humans. Fails to satisfy at least the
|
||
requirements 2, 4, 5, 6, 8 (there is quite a bit of error checking before
|
||
code generation, though).</li>
|
||
<li>The framework accompanying the article "State Machine Design in C++"<br>
|
||
<a href="http://www.cuj.com/articles/2000/0005/0005f/0005f.htm?topic=articles">
|
||
http://www.cuj.com/articles/2000/0005/0005f/0005f.htm?topic=articles<br>
|
||
</a>Fails to satisfy at least the requirements 1, 3, 4, 5 (there is no
|
||
direct threading support), 6, 8.</li>
|
||
</ul>
|
||
<p>I believe boost::fsm satisfies all requirements.</p>
|
||
<h2><a name="State-local storage">State-local storage</a></h2>
|
||
<p>This not yet widely known state machine feature is enabled by the fact that
|
||
every state is represented by a class. Upon state-entry, an object of the
|
||
class is constructed and the object is later destructed when the state machine
|
||
exits the state. Any data that is useful only as long as the machine resides
|
||
in the state can (and should) thus be a member of the state. This feature
|
||
paired with the ability to spread a state machine over several translation
|
||
units makes possible virtually unlimited scalability. </p>
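<p>The mechanism can be illustrated without the library. The following is a
minimal sketch, assuming a hypothetical <code>Machine</code> class that owns
exactly one active state object; none of the names are taken from boost::fsm.
It shows that data such as <code>receive_buffer</code> exists only while the
machine resides in the state that declares it.</p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not the boost::fsm API.
std::vector<std::string> trace;  // records entry and exit for the demo

struct State {
    virtual ~State() = default;
};

// Data needed only while the machine is connected lives in the state itself.
struct Connected : State {
    std::vector<char> receive_buffer;  // state-local storage
    Connected() : receive_buffer(1024) { trace.push_back("enter Connected"); }
    ~Connected() override { trace.push_back("exit Connected"); }
};

struct Idle : State {
    Idle() { trace.push_back("enter Idle"); }
    ~Idle() override { trace.push_back("exit Idle"); }
};

class Machine {
    std::unique_ptr<State> current_;
public:
    // Exits the old state (destructor) before entering the new one (constructor).
    template <class Next>
    void transit() {
        current_.reset();
        current_.reset(new Next());
    }
};

std::vector<std::string> run_demo() {
    trace.clear();
    {
        Machine m;
        m.transit<Idle>();
        m.transit<Connected>();
    }  // destroying the machine exits the active state
    return trace;
}
```

<p>Because each state class can live in its own translation unit, adding a
member to <code>Connected</code> in this sketch would not force a
recompilation of <code>Idle</code>.</p>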
|
||
<p>In most existing FSM frameworks the whole state machine runs in one
|
||
environment (context). That is, all resource handles and variables local to
|
||
the state machine are stored in one place (normally as members of the class
|
||
that also derives from some state machine base class). For large state
|
||
machines this often leads to the class having a huge number of data members
|
||
most of which are needed only briefly in a tiny part of the machine. The state
|
||
machine class therefore often becomes a change hotspot, which leads to frequent
|
||
recompilations of the whole state machine. </p>
|
||
<h2><a name="Dynamic configurability">Dynamic configurability</a></h2>
|
||
<h3>Two types of state machine frameworks</h3>
|
||
<ul>
|
||
<li>A state machine framework supports dynamic configurability if the whole
|
||
layout of a state machine can be defined at runtime ("layout" refers to
|
||
states and transitions, actions are still specified with normal C++ code).
|
||
That is, data only available at runtime can be used to build arbitrarily
|
||
large machines. See "A Multiple Substring Search Algorithm" by Moishe
|
||
Halibard and Moshe Rubin in June 2002 issue of CUJ for a good example
|
||
(unfortunately not available online).</li>
|
||
<li>On the other side are state machine frameworks which require the layout
|
||
to be specified at compile time.</li>
|
||
</ul>
|
||
<p>State machines that are built at runtime almost always get away with a
|
||
simple state model (no hierarchical states, no orthogonal states, no entry and
|
||
exit actions, no history) because the layout is very often <b>computed by an
|
||
algorithm</b>. On the other hand, machine layouts that are fixed at compile
|
||
time are almost always designed by humans, who frequently need/want a
|
||
sophisticated state model in order to keep the complexity at acceptable
|
||
levels. Dynamically configurable FSM frameworks are therefore often optimized
|
||
for simple flat machines while incarnations of the static variant tend to
|
||
offer more features for abstraction.</p>
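<p>To make the first category concrete, here is a small sketch of a flat,
dynamically configured machine whose layout is computed from a string that is
only known at runtime. The class <code>DynamicMachine</code> and the helper
functions are hypothetical and serve only as an illustration; the sketch is
far simpler than the substring search algorithm cited above.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch of a dynamically configured flat machine; not boost::fsm.
class DynamicMachine {
    std::map<std::pair<int, char>, int> table_;  // (state, event) -> next state
    int state_ = 0;
public:
    void add_transition(int from, char event, int to) {
        table_[std::make_pair(from, event)] = to;
    }
    // Events without a matching table entry are silently discarded.
    void process(char event) {
        auto it = table_.find(std::make_pair(state_, event));
        if (it != table_.end()) state_ = it->second;
    }
    int state() const { return state_; }
};

// The layout is computed by an algorithm from data known only at runtime:
// state i means "the first i characters of word have been seen in order".
DynamicMachine build_matcher(const std::string& word) {
    DynamicMachine m;
    for (std::size_t i = 0; i < word.size(); ++i)
        m.add_transition(static_cast<int>(i), word[i], static_cast<int>(i) + 1);
    return m;
}

// True if the characters of word occur in input in the same order.
bool matches(const std::string& word, const std::string& input) {
    DynamicMachine m = build_matcher(word);
    for (char c : input) m.process(c);
    return m.state() == static_cast<int>(word.size());
}
```

<p>Note that states and events are plain values here, so nothing about the
layout can be checked by the compiler.</p>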
|
||
<p>However, fully-featured dynamic FSM libraries do exist. So, the question
|
||
is:</p>
|
||
<h3>Why not use a dynamically configurable FSM library for all state machines?</h3>
|
||
<p>One might argue that a dynamically configurable FSM framework is all one
|
||
ever needs because <b>any</b> state machine can be implemented with it.
|
||
However, due to its nature such a framework has a number of disadvantages when
|
||
used to implement static machines:</p>
|
||
<ul>
|
||
<li>No compile-time optimizations and validations can be made. For example,
|
||
boost::fsm determines the
|
||
<a href="definitions.html#Innermost common context">innermost common
|
||
context</a> of the transition-source and destination state at compile
|
||
time. Moreover, compile time checks ensure that the state machine is valid
|
||
(e.g. that there are no transitions between orthogonal states).</li>
|
||
<li>Double dispatch must inevitably be implemented with some kind of a
|
||
table. As argued under <a href="#Double dispatch">Double dispatch</a>, this
|
||
scales badly.</li>
|
||
<li>To warrant fast table lookup, states and events must be represented with
|
||
an integer. To keep the table as small as possible, the numbering should be
|
||
continuous, e.g. if there are ten states, it's best to use the ids 0-9. To
|
||
ensure continuity of ids, all states are best defined in the same header
|
||
file. The same applies to events. Again, this does not scale.</li>
|
||
<li>Because events carrying parameters are not represented by a type, some
|
||
sort of a generic event with a property map must be used and type-safety is
|
||
enforced at runtime rather than at compile time.</li>
|
||
</ul>
|
||
<p>It is for these reasons that boost::fsm was built from the ground up to <b>not</b>
|
||
support dynamic configurability. However, this does not mean that it's
|
||
impossible to dynamically shape a machine implemented with this library. For
|
||
example, guards can be used to make different transitions depending on input
|
||
only available at runtime. However, such layout changes will always be limited
|
||
to what can be foreseen before compilation. A somewhat related library, the
|
||
boost::spirit parser framework, allows for roughly the same runtime
|
||
configurability. </p>
|
||
<h2><a name="Error handling">Error handling</a></h2>
|
||
<p>There is not a single word about error handling in the UML state machine
|
||
semantics specifications. Moreover, most existing FSM solutions also seem to
|
||
ignore the issue. </p>
|
||
<h3>Why an FSM library should support error handling</h3>
|
||
<p>Consider the following state configuration:</p>
|
||
<p><img border="0" src="A.gif" width="230" height="170"></p>
|
||
<p>Both states define entry actions (x() and y()). Whenever state A becomes
|
||
active, a call to x() will immediately be followed by a call to y(). y() could
|
||
depend on the side-effects of x(). Therefore, executing y() does not make
|
||
sense if x() fails. This is not an esoteric corner case but happens in
|
||
every-day state machines all the time. For example, x() could acquire memory
|
||
the contents of which are later modified by y(). A different but, in
terms of error handling, equally critical situation arises in the Tutorial under
|
||
<a href="tutorial.html#Getting state information out of the machine">Getting
|
||
state information out of the machine</a> when <code>Running::~Running()</code>
|
||
accesses its outer state <code>Active</code>. Had the entry action of <code>
|
||
Active</code> failed and had <code>Running</code> been entered anyway then
|
||
<code>Running</code>'s exit action would have invoked undefined behavior.
|
||
The error handling situation with outer and inner states resembles the one
|
||
with base and derived classes: If a base class constructor fails (by throwing
|
||
an exception) the construction is aborted, the derived class constructor is
|
||
not called and the object never comes to life.<br>
|
||
In most traditional FSM frameworks such an error situation is relatively easy to
|
||
tackle <b>as
|
||
long as the error can be propagated to the state machine client</b>. In this case
|
||
a failed action simply propagates a C++ exception into the framework. The framework
|
||
usually does not catch the exception so that the state machine client can handle
|
||
it. Note that, after doing so, the client can no longer use
|
||
the state machine object because it is either in an unknown state or the
|
||
framework has already reset the state because of the exception (e.g. with a
|
||
scope guard). That is, by their nature, state machines typically only offer
|
||
basic exception safety.<br>
|
||
However, error handling with traditional FSM frameworks becomes surprisingly cumbersome as soon
|
||
as a lot of actions can fail and the state machine <b>itself</b> needs to gracefully handle
|
||
these errors. Usually, a failing action (e.g. x()) then posts an appropriate error event and sets a global error variable to
|
||
true. Every following action (e.g. y()) first has to check the error variable
|
||
before doing anything. After all actions have completed (by doing nothing!),
|
||
the previously posted error event has to be processed, which leads
|
||
to the execution of the remedy action. Please note that it is not sufficient to
|
||
simply queue the error event as other events could still be pending. Instead,
|
||
the error event has absolute priority and has to be dealt with
|
||
immediately. There are slightly less cumbersome approaches to FSM error handling
|
||
but these usually necessitate a change of the state chart layout and thus
|
||
obscure the normal behavior. No matter what approach is used, programmers are
|
||
normally forced to write a lot of code that deals with errors and most of that
|
||
code is <b>not</b> devoted to error handling but to error propagation.</p>
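<p>The difference between the two styles can be sketched in a few lines. In
the following hypothetical example, <code>x()</code> and <code>y()</code>
stand for the two entry actions from the diagram; the machine classes are
assumptions made for illustration and do not come from any existing
framework.</p>

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical sketch; x() and y() stand for the entry actions in the diagram.

// Variant 1: error flag. Every action after x() has to check the flag itself.
struct FlagMachine {
    bool error = false;
    std::string log;
    void x(bool fail) { if (fail) { error = true; return; } log += "x"; }
    void y() { if (error) return; log += "y"; }  // pure propagation code
    void enter(bool fail) { x(fail); y(); if (error) log += "!"; }
};

// Variant 2: exceptions. A failing x() aborts the entry sequence, so y() is
// never reached, much like a derived constructor after a throwing base.
struct ThrowingMachine {
    std::string log;
    void x(bool fail) { if (fail) throw std::runtime_error("x failed"); log += "x"; }
    void y() { log += "y"; }
    void enter(bool fail) {
        try { x(fail); y(); }
        catch (const std::runtime_error&) { log += "!"; }  // remedy action
    }
};

std::string run_flag(bool fail) { FlagMachine m; m.enter(fail); return m.log; }
std::string run_throwing(bool fail) { ThrowingMachine m; m.enter(fail); return m.log; }
```

<p>Both variants produce the same log, but only the first one needs a check
in every action that follows a possibly failing one.</p>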
|
||
<h3>Error handling support in boost::fsm</h3>
|
||
<p>C++ exceptions may be propagated from any action to signal a failure.
|
||
Depending on how the state machine is configured, such an exception is either
|
||
immediately propagated to the state machine client or caught and converted into
|
||
a special event that is dispatched immediately. For more information see the
|
||
<a href="tutorial.html#Exception handling">Exception handling</a>
|
||
chapter in the Tutorial.</p>
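<p>The two configurations might be pictured as follows. This is only a
sketch under stated assumptions: the <code>Machine</code> class, the
<code>translate</code> switch and the event names are hypothetical and are not
the interface documented in the Tutorial.</p>

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical sketch; the event names and the Machine class are assumptions,
// not the boost::fsm interface.
enum class Event { Start, ExceptionThrown };

class Machine {
    bool translate_;  // true: convert exceptions into an event
public:
    std::string state = "Idle";
    std::function<void()> start_action;  // user action run on Event::Start

    explicit Machine(bool translate) : translate_(translate) {}

    void process(Event e) {
        if (e == Event::ExceptionThrown) { state = "Failed"; return; }
        if (!translate_) {            // propagate straight to the client
            start_action();
            state = "Running";
            return;
        }
        try {
            start_action();
            state = "Running";
        } catch (...) {
            // Dispatched immediately, ahead of any other pending events.
            process(Event::ExceptionThrown);
        }
    }
};

std::string run(bool translate, bool fail) {
    Machine m(translate);
    m.start_action = [fail] { if (fail) throw std::runtime_error("boom"); };
    try { m.process(Event::Start); }
    catch (const std::runtime_error&) { return "client caught"; }
    return m.state;
}
```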
|
||
<h3>Two stage exit</h3>
|
||
<p>In boost::fsm, an exit action can be implemented by adding a
|
||
destructor to a state. Due to the nature of destructors, there are two
|
||
disadvantages to this approach:</p>
|
||
<ul>
|
||
<li>Since C++
|
||
destructors should virtually never throw, one cannot simply propagate an
|
||
exception from an exit action as one does when any of the other actions fails</li>
|
||
<li>When a <code>state_machine<></code> object is destructed then all currently active states are
|
||
inevitably also destructed. That is, state machine termination is tied to
|
||
the destruction of the state machine object</li>
|
||
</ul>
|
||
<p>In my experience, neither of the above points is usually a problem in practice
|
||
since ...</p>
|
||
<ul>
|
||
<li>exit actions can rarely fail. If they can, such a failure is usually
|
||
either<ul>
|
||
<li>not
|
||
of interest to the outside world, i.e. the failure can simply be
|
||
ignored</li>
|
||
<li>so severe that the application needs to be terminated anyway. In such a
situation stack unwinding is almost never desirable and the failure is better
|
||
signaled through other mechanisms (e.g. abort())</li>
|
||
</ul>
|
||
</li>
|
||
<li>to clean up properly, exit actions often <b>must</b> be executed
|
||
when a state machine object is destructed, even if it is destructed as a
|
||
result of a stack unwind</li>
|
||
</ul>
|
||
<p>However, several people have put forward theoretical arguments and real-world
|
||
scenarios, which show that the exit action to destructor mapping <b>can</b> be a
|
||
problem and that workarounds are overly cumbersome. That's why
|
||
<a href="tutorial.html#Two_stage_exit">two stage exit</a> is now supported.</p>
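<p>The idea behind splitting the exit into two stages can be sketched as
follows, assuming a hypothetical <code>exit()</code> member function that runs
before the destructor. The names and the exact semantics here are assumptions;
the Tutorial describes what the library actually does.</p>

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical sketch of the idea; names do not come from boost::fsm.
struct State {
    virtual ~State() = default;      // stage 2: must not throw
    virtual void exit() {}           // stage 1: may throw
};

struct Writing : State {
    bool flush_fails;
    explicit Writing(bool f) : flush_fails(f) {}
    void exit() override {
        if (flush_fails) throw std::runtime_error("flush failed");
    }
};

class Machine {
    std::unique_ptr<State> current_;
public:
    explicit Machine(std::unique_ptr<State> s) : current_(std::move(s)) {}
    // Regular termination: run the exit action, then destroy the state.
    void terminate() {
        if (!current_) return;
        std::unique_ptr<State> leaving = std::move(current_);
        leaving->exit();             // a failure is reported to the caller
    }
    bool active() const { return static_cast<bool>(current_); }
    // Destroying the machine only destructs the state; exit() is not called.
};

std::string run(bool flush_fails) {
    Machine m(std::unique_ptr<State>(new Writing(flush_fails)));
    try { m.terminate(); }
    catch (const std::runtime_error& e) { return e.what(); }
    return m.active() ? "active" : "terminated";
}
```

<p>In this sketch a failing exit action can throw like any other action,
while destruction of the machine object remains a separate, non-throwing
step.</p>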
|
||
<h2><a name="Asynchronous state machines">Asynchronous state machines</a></h2>
|
||
<h3>Requirements</h3>
|
||
<p>For asynchronous state machines different applications have rather varied
|
||
requirements:</p>
|
||
<ol>
|
||
<li>In some applications each state machine needs to run in its own thread,
|
||
other applications are single-threaded and run all machines in the same
|
||
thread</li>
|
||
<li>For some applications a FIFO scheduler is perfect, others need priority-
|
||
or EDF-schedulers</li>
|
||
<li>For some applications the boost::thread library is just fine, others
|
||
might want to use another threading library, yet other applications run on
|
||
OS-less platforms where ISRs are the only mode of (apparently) concurrent
|
||
execution</li>
|
||
</ol>
|
||
<h3>Out of the box behavior</h3>
|
||
<p>By default, <code>asynchronous_state_machine<></code> subclass objects are
|
||
serviced by a <code>fifo_scheduler<></code> object. <code>fifo_scheduler<></code>
|
||
does not lock or wait in single-threaded applications and uses boost::thread
|
||
primitives to do so in multi-threaded programs. Moreover, a <code>
|
||
fifo_scheduler<></code> object can service an arbitrary number of <code>
|
||
asynchronous_state_machine<></code> subclass objects. Under the hood, <code>
|
||
fifo_scheduler<></code> is just a thin wrapper around an object of its <code>
|
||
FifoWorker</code> template parameter (which manages the queue and ensures
|
||
thread safety) and a <code>processor_container<></code> (which manages the
|
||
lifetime of the state machines).</p>
|
||
<p>The UML standard mandates that an event not triggering a reaction in a
|
||
state machine should be silently discarded. Since a <code>fifo_scheduler<></code>
|
||
object is itself also a state machine, events destined to no longer existing
|
||
<code>asynchronous_state_machine<></code> subclass objects are also silently
|
||
discarded. This is enabled by the fact that <code>asynchronous_state_machine<></code>
|
||
subclass objects cannot be constructed or destructed directly. Instead, this
|
||
must be done through <code>fifo_scheduler<>::create_processor<>()</code> and
|
||
<code>fifo_scheduler<>::destroy_processor()</code> (<code>processor</code>
|
||
refers to the fact that <code>fifo_scheduler<></code> can only host <code>
|
||
event_processor<></code> subclass objects; <code>asynchronous_state_machine<></code>
|
||
is just one way to implement such a processor). Moreover, <code>
|
||
create_processor<>()</code> only returns a <code>processor_handle</code>
|
||
object. This must henceforth be used to initiate, queue events for, terminate
|
||
and destroy the state machine through the scheduler.</p>
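<p>Why handles make silent discarding possible can be seen in a
single-threaded sketch. The member names below mimic the ones mentioned above,
but the signatures and the <code>Processor</code> type are assumptions made
for illustration, not the library interface.</p>

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <utility>

// Hypothetical single-threaded sketch; the names mimic the text but the
// signatures are assumptions, not the boost::fsm interface.
struct Processor {
    int handled = 0;
    void process(int /*event*/) { ++handled; }
};

class Scheduler {
public:
    using processor_handle = int;
    // Clients never get the processor itself, only a handle.
    processor_handle create_processor() {
        processors_[next_] = Processor();
        return next_++;
    }
    void destroy_processor(processor_handle h) { processors_.erase(h); }
    void queue_event(processor_handle h, int event) {
        queue_.push_back(std::make_pair(h, event));
    }
    // Returns how many events were delivered; the rest were discarded.
    int run() {
        int delivered = 0;
        while (!queue_.empty()) {
            std::pair<processor_handle, int> item = queue_.front();
            queue_.pop_front();
            auto it = processors_.find(item.first);
            if (it == processors_.end()) continue;  // target is gone
            it->second.process(item.second);
            ++delivered;
        }
        return delivered;
    }
private:
    std::map<processor_handle, Processor> processors_;
    std::deque<std::pair<processor_handle, int>> queue_;  // FIFO of (target, event)
    processor_handle next_ = 0;
};

int run_demo() {
    Scheduler s;
    Scheduler::processor_handle a = s.create_processor();
    Scheduler::processor_handle b = s.create_processor();
    s.queue_event(a, 1);
    s.queue_event(b, 2);
    s.queue_event(b, 3);
    s.destroy_processor(b);  // events for b are still queued
    return s.run();
}
```

<p>Since the scheduler owns every processor, a stale handle can never lead to
a dangling pointer; the lookup simply fails and the event is dropped.</p>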
|
||
<h3>Customization</h3>
|
||
<p>If a user needs to customize the scheduler behavior she can do so by
|
||
instantiating <code>fifo_scheduler<></code> with her own class modeling the
|
||
<code>FifoWorker</code> concept. I considered a much more generic design where
|
||
locking and waiting is implemented in a policy but I have so far failed to
|
||
come up with a clean and simple interface for it. Especially the waiting is a
|
||
bit difficult to model as some platforms have condition variables, others have
|
||
events and yet others don't have any notion of waiting whatsoever (they
|
||
instead loop until a new event arrives, presumably via an ISR). Given the
|
||
relatively few lines of code required to implement a custom <code>FifoWorker</code>
|
||
type and the fact that almost all applications will implement at most one such
|
||
class, it does not seem to be worthwhile anyway. Applications requiring a less
|
||
or more sophisticated event processor lifetime management can customize the
|
||
behavior at a more coarse level, by using a custom <code>Scheduler</code>
|
||
type. This is currently also true for applications requiring non-FIFO queuing
|
||
schemes. However, boost::fsm will probably provide a <code>priority_scheduler</code>
|
||
in the future so that custom schedulers need to be implemented only in rare
|
||
cases.</p>
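<p>As an illustration of how little code such a worker needs, here is a
sketch for a platform without any waiting primitive. The member names are
assumptions and may not match the actual <code>FifoWorker</code>
requirements, and <code>Scheduler</code> is a hypothetical stand-in for a
scheduler template parameterized on the worker type.</p>

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical worker for a platform without any waiting primitive. The member
// names are assumptions and may not match the real FifoWorker requirements.
class PollingWorker {
    std::deque<std::function<void()>> items_;
    bool terminated_ = false;
public:
    void queue_work_item(std::function<void()> item) { items_.push_back(std::move(item)); }
    void terminate() { terminated_ = true; }
    // Runs queued items in FIFO order and returns the number executed.
    // Instead of blocking on an empty queue it simply returns, so a caller
    // can invoke it repeatedly from its main loop.
    unsigned long operator()() {
        unsigned long executed = 0;
        while (!terminated_ && !items_.empty()) {
            std::function<void()> item = std::move(items_.front());
            items_.pop_front();
            item();
            ++executed;
        }
        return executed;
    }
};

// A scheduler-like wrapper parameterized on the worker type.
template <class Worker>
class Scheduler {
    Worker worker_;
public:
    void post(std::function<void()> f) { worker_.queue_work_item(std::move(f)); }
    unsigned long run() { return worker_(); }
};

int run_demo() {
    Scheduler<PollingWorker> s;
    int sum = 0;
    s.post([&sum] { sum += 1; });
    s.post([&sum] { sum += 10; });
    s.run();
    return sum;
}
```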
|
||
<h2><a name="User actions: Member functions vs. function objects">User
|
||
actions: Member functions vs. function objects</a></h2>
|
||
<p>All user-supplied functions (<code>react</code> member functions, entry-,
|
||
exit- and transition-actions) must be class members. The reasons for this are
|
||
as follows: </p>
|
||
<ul>
|
||
<li>The concept of state-local storage mandates that state-entry and
|
||
state-exit actions are implemented
|
||
as members.</li>
|
||
<li><code>react</code> member functions and transition actions often access
|
||
state-local data. So, it is most natural to implement these functions as
members of the class whose data they operate on anyway.</li>
|
||
</ul>
|
||
<h2><a name="Speed versus scalability tradeoffs">Speed versus scalability
|
||
tradeoffs</a></h2>
|
||
<p>Quite a bit of effort has gone into making the library fast for small
|
||
simple machines <b>and</b> scalable at the same time (this applies only to
|
||
<code>state_machine<></code>, there still is some room for optimizing <code>
|
||
fifo_scheduler<></code>, especially for multi-threaded builds). While I
|
||
believe it should perform reasonably in most applications, the scalability
|
||
does not come for free. Small, carefully handcrafted state machines will thus
|
||
easily outperform equivalent boost::fsm machines. To get a picture of how big
|
||
the gap is, I implemented a simple benchmark in the BitMachine example. The
|
||
Handcrafted example is a handcrafted variant of the 1-bit-BitMachine
|
||
implementing the same benchmark.</p>
|
||
<p>I tried to create a fair but somewhat unrealistic <b>worst-case</b>
|
||
scenario:</p>
|
||
<ul>
|
||
<li>For both machines exactly one object of the only event is allocated
|
||
before starting the test. This same object is then sent to the machines
|
||
over and over</li>
|
||
<li>The Handcrafted machine employs GOF-visitor double dispatch. The states
|
||
are preallocated so that event dispatch & transition amounts to nothing more
|
||
than two virtual calls and one pointer assignment</li>
|
||
</ul>
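<p>For readers unfamiliar with the technique, the following sketch shows what
such a handcrafted 1-bit machine could look like. It is a hypothetical
reconstruction of the idea described above, not the code of the Handcrafted
example; all names are assumptions.</p>

```cpp
#include <cassert>

// Hypothetical reconstruction of the idea, not the code of the Handcrafted example.
class Machine;
struct EvFlipBit;

struct State {
    virtual ~State() = default;
    virtual void react(Machine& m, const EvFlipBit& e) const = 0;  // 2nd virtual call
};

struct Event {
    virtual ~Event() = default;
    virtual void send_to(const State& s, Machine& m) const = 0;  // 1st virtual call
};

struct EvFlipBit : Event {
    void send_to(const State& s, Machine& m) const override { s.react(m, *this); }
};

struct Off : State { void react(Machine& m, const EvFlipBit&) const override; };
struct On  : State { void react(Machine& m, const EvFlipBit&) const override; };

class Machine {
    Off off_;                 // states are preallocated members
    On on_;
    const State* current_ = &off_;
public:
    void process(const Event& e) { e.send_to(*current_, *this); }
    void go_on()  { current_ = &on_; }   // a transition is one pointer assignment
    void go_off() { current_ = &off_; }
    bool bit() const { return current_ == &on_; }
};

void Off::react(Machine& m, const EvFlipBit&) const { m.go_on(); }
void On::react(Machine& m, const EvFlipBit&) const { m.go_off(); }

bool bit_after(int flips) {
    Machine m;
    EvFlipBit flip;           // one event object, sent over and over
    for (int i = 0; i < flips; ++i) m.process(flip);
    return m.bit();
}
```

<p>The price of this speed is that <code>State</code> must declare one
<code>react</code> overload per event type, so every state depends on every
event.</p>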
|
||
<p>The benchmarks, compiled with MSVC7.1 (single threaded) and run on a
3.2GHz Intel Pentium 4 / 1.6GHz Pentium M, produced the following
dispatch and transition times per event:</p>
|
||
<ul>
|
||
<li>Handcrafted: 10ns / 10ns</li>
|
||
<li>1-bit-BitMachine with customized memory management: 130ns / 220ns</li>
|
||
</ul>
|
||
<p>Although this is a big difference I still think it will not be noticeable
|
||
in most real-world applications. No matter whether an application uses
|
||
handcrafted or boost::fsm machines it will...</p>
|
||
<ul>
|
||
<li>almost never run into a situation where a state machine is swamped with
|
||
as many events as in the benchmarks. Unless a state machine is abused for
|
||
parsing, it will almost always spend a good deal of time waiting for events
|
||
(which typically come from a human operator, from machinery or
|
||
from electronic devices over often comparatively slow I/O channels)</li>
|
||
<li>often run state machines in their own threads. This adds considerable
|
||
locking and thread-switching overhead. Performance tests with the PingPong
|
||
example, where two asynchronous state machines exchange events, gave the
|
||
following times to process one event and perform the resulting in-state
|
||
reaction (using the library with <code>boost::fast_pool_allocator<></code>):<ul>
|
||
<li>Single-threaded (no locking and waiting): 840ns / 840ns</li>
|
||
<li>Multi-threaded with one thread (the scheduler uses mutex locking but
|
||
never has to wait for events): 6500ns / 4800ns</li>
|
||
<li>Multi-threaded with two threads (both schedulers use mutex locking and
|
||
exactly one always waits for an event): 14000ns / 7000ns</li>
|
||
</ul>
|
||
<p>As mentioned above, there definitely is some room to improve the
|
||
timings for the asynchronous machines. Moreover, these are very crude
|
||
benchmarks, designed to show the overhead of locking and thread context
|
||
switching. The overhead in a real-world application will typically be
|
||
smaller and other operating systems can certainly do better in this area.
|
||
However, I strongly believe that on most platforms the threading overhead is
|
||
usually larger
|
||
than the time that boost::fsm spends for event dispatch and transition. Handcrafted machines will
|
||
inevitably have the same overhead, making raw single-threaded dispatch and
|
||
transition speed much less important</li>
|
||
<li>almost always allocate events with <code>new</code> and destroy them
|
||
after consumption. This will add a few cycles, even if event memory
|
||
management is customized</li>
|
||
<li>often use state machines that employ orthogonal states and other
|
||
advanced features. This forces the handcrafted machines to use more
adequate and more time-consuming bookkeeping</li>
|
||
</ul>
|
||
<p>Therefore, in real-world applications event dispatch and transition do not
normally constitute a bottleneck and the relative gap between handcrafted and
|
||
boost::fsm machines also becomes much smaller than in the worst-case scenario.</p>
|
||
<p>BitMachine measurements with more states and with different levels of
|
||
optimization:</p>
|
||
<table border="3" width="100%" id="AutoNumber2" cellpadding="2">
|
||
<tr>
|
||
<td width="25%" rowspan="2"><b>Machine configuration<br>
|
||
# states / # outgoing transitions per state</b></td>
|
||
<td width="75%" colspan="3"><b>Event dispatch & transition time [nanoseconds]<br>
|
||
<font color="#FF0000">MSVC 7.1: 3.2GHz Pentium 4 / 1.6GHz Pentium M</font><br>
|
||
<font color="#0000FF">GCC 3.4.2: 3.2GHz Pentium 4 / 1.6GHz Pentium M</font></b></td>
|
||
</tr>
|
||
<tr>
|
||
<td width="25%">Out of the box</td>
|
||
<td width="25%">Same as out of the box but with <code>
|
||
<a href="configuration.html#Application Defined Macros">
|
||
BOOST_FSM_USE_NATIVE_RTTI</a></code> defined</td>
|
||
<td width="25%">Same as out of the box but with customized memory
|
||
management</td>
|
||
</tr>
|
||
<tr>
|
||
<td width="25%">2 / 1</td>
|
||
<td width="25%"><font color="#FF0000">410 / 460</font><br>
|
||
<font color="#0000FF">540 / 480</font></td>
|
||
<td width="25%"><font color="#FF0000">490 / 570</font><br>
|
||
<font color="#0000FF">510 / 500</font></td>
|
||
<td width="25%"><font color="#FF0000">130 / 220</font><br>
|
||
<font color="#0000FF">320 / 230</font></td>
|
||
</tr>
|
||
<tr>
|
||
<td width="25%">4 / 2</td>
|
||
<td width="25%"><font color="#FF0000">440 / 470</font><br>
|
||
<font color="#0000FF">560 / 480</font></td>
|
||
<td width="25%"><font color="#FF0000">530 / 640</font><br>
|
||
<font color="#0000FF">570 / 550</font></td>
|
||
<td width="25%"><font color="#FF0000">160 / 240</font><br>
|
||
<font color="#0000FF">330 / 240</font></td>
|
||
</tr>
|
||
<tr>
|
||
<td width="25%">8 / 3</td>
|
||
<td width="25%"><font color="#FF0000">450 / 470</font><br>
|
||
<font color="#0000FF">580 / 510</font></td>
|
||
<td width="25%"><font color="#FF0000">580 / 700</font><br>
|
||
<font color="#0000FF">610 / 630</font></td>
|
||
<td width="25%"><font color="#FF0000">180 / 250</font><br>
|
||
<font color="#0000FF">340 / 260</font></td>
|
||
</tr>
|
||
<tr>
|
||
<td width="25%">16 / 4</td>
|
||
<td width="25%"><font color="#FF0000">490 / 480</font><br>
|
||
<font color="#0000FF">710 / 670</font></td>
|
||
<td width="25%"><font color="#FF0000">720 / 790</font><br>
|
||
<font color="#0000FF">770 / 750</font></td>
|
||
<td width="25%"><font color="#FF0000">230 / 260</font><br>
|
||
<font color="#0000FF">460 / 360</font></td>
|
||
</tr>
|
||
<tr>
|
||
<td width="25%">32 / 5</td>
|
||
<td width="25%"><font color="#FF0000">590 / 520</font><br>
|
||
<font color="#0000FF">790 / 690</font></td>
|
||
<td width="25%"><font color="#FF0000">820 / 880</font><br>
|
||
<font color="#0000FF">920 / 910</font></td>
|
||
<td width="25%"><font color="#FF0000">340 / 280</font><br>
|
||
<font color="#0000FF">590 / 470</font></td>
|
||
</tr>
|
||
</table>
|
||
<h2><a name="Memory management customization">Memory management customization</a></h2>
|
||
<p>Out of the box, all internal data is allocated on the normal heap. This
|
||
should be satisfactory for applications where both the following prerequisites
|
||
are met:</p>
|
||
<ul>
|
||
<li>There are no deterministic reaction time (hard real-time) requirements</li>
|
||
<li>The application will never run long enough for heap fragmentation to
|
||
become a problem. This is of course an issue for all long running programs
|
||
not only the ones employing this library. However, it should be noted that
|
||
fragmentation problems could show up earlier than with traditional FSM
|
||
frameworks</li>
|
||
</ul>
|
||
<p>Should an application not meet these prerequisites, customization of
|
||
all memory management (not just boost::fsm's) should be considered, which is
|
||
supported as follows:</p>
|
||
<ul>
|
||
<li>By passing a class offering a <code>std::allocator<></code> interface
|
||
for the <code>Allocator</code> parameter of the <code>state_machine</code>
|
||
class template</li>
|
||
<li>By replacing the <code>simple_state</code>, <code>state</code> and <code>
|
||
event</code> class templates with ones that have a customized <code>operator
|
||
new()</code> and <code>operator delete()</code>. This can be as easy as
|
||
inheriting your customized class templates from the framework-supplied class
|
||
templates <b>and</b> your preferred small-object/deterministic/constant-time
|
||
allocator base class</li>
|
||
</ul>
|
||
<p><code>simple_state<></code> and <code>state<></code> subclass objects are
|
||
constructed and destructed only by the state machine. It would therefore be
|
||
possible to use the <code>state_machine<></code> allocator instead of forcing
|
||
the user to overload <code>operator new()</code> and <code>operator delete()</code>.
|
||
However, a lot of systems employ at most one instance of a particular state
|
||
machine, which means that a) there is at most one object of a particular state
|
||
and b) this object is always constructed, accessed and destructed by one and
|
||
the same thread. We can exploit these facts in a much simpler (and faster)
|
||
<code>new</code>/<code>delete</code> implementation (for example, see
|
||
UniqueObject.hpp in the BitMachine example). However, this is only possible as
|
||
long as we have the freedom to customize memory management for state classes
|
||
separately.</p>
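<p>A simplified allocator along these lines might look as follows. This is a
sketch inspired by the description above and is not the contents of
UniqueObject.hpp; it assumes that at most one object of each derived type is
alive at any time and that all access happens from a single thread.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical sketch inspired by the description; not the contents of
// UniqueObject.hpp. Assumes at most one live object of Derived at a time and
// access from a single thread.
template <class Derived>
class UniqueObject {
public:
    static void* operator new(std::size_t size) {
        assert(size == sizeof(Derived));
        assert(!in_use());             // a second live object would be a bug
        in_use() = true;
        return storage();
    }
    static void operator delete(void* p) {
        if (p == nullptr) return;
        assert(p == storage());
        in_use() = false;
    }
    static bool allocated() { return in_use(); }
    static void* address() { return storage(); }
private:
    static bool& in_use() { static bool flag = false; return flag; }
    static void* storage() {
        // One statically allocated, suitably aligned slot per Derived type.
        alignas(Derived) static unsigned char slot[sizeof(Derived)];
        return slot;
    }
};

struct Running : UniqueObject<Running> {
    int ticks = 0;
};

bool reuses_same_slot() {
    Running* first = new Running;
    void* a = first;
    delete first;
    Running* second = new Running;
    void* b = second;
    delete second;
    return a == b && a == Running::address() && !Running::allocated();
}
```

<p>Allocation and deallocation reduce to setting a flag, which is only safe
because of the two assumptions stated above. A general-purpose allocator
passed to the machine could not exploit them.</p>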
|
||
<h2><a name="RTTI customization">RTTI customization</a></h2>
|
||
<p>RTTI is used for event dispatch and <code>state_downcast<>()</code>.
|
||
Currently, there are exactly two options:</p>
|
||
<ol>
|
||
<li>By default, a speed-optimized internal implementation is employed</li>
|
||
<li>The library can be instructed to use native C++ RTTI instead by defining
|
||
<code><a href="configuration.html#Application Defined Macros">
|
||
BOOST_FSM_USE_NATIVE_RTTI</a></code></li>
|
||
</ol>
|
||
<p>Just about the only reason to favor 2 is the fact that state and event
|
||
objects need to store one pointer less, meaning that in the best case the
|
||
memory footprint of a state machine object could shrink by 15% (an empty event
|
||
is typically 30% smaller, which can be an advantage when there are bursts of
events rather than a steady flow). However, on most platforms executable size
grows when C++ RTTI is turned on. So, given the small per-machine-object
|
||
savings, option 2 only makes sense in applications where both of the following
|
||
conditions hold:</p>
|
||
<ul>
<li>Event dispatch will never become a
bottleneck</li>
<li>There is a need to reduce the memory allocated at runtime (at the cost
of a larger executable)</li>
</ul>
<p>Obvious candidates are embedded systems where the executable resides in
ROM. Other candidates are applications running a large number of identical
state machines where this measure could even reduce the <b>overall</b> memory
footprint.</p>
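<p>The trade-off between the two options can be sketched as follows. This is
only an illustration under assumptions, not the library's implementation: all
names (<code>id_of</code>, <code>fast_event_base</code>,
<code>native_event_base</code> and the event types) are hypothetical.</p>

```cpp
#include <typeinfo>

// All names below are hypothetical; this is not the library's implementation.
typedef const void * id_type;

// One static object per type T; its unique address identifies T
template< class T >
id_type id_of()
{
  static const char tag = 0;
  return &tag;
}

// Option 1 (sketch): every object stores its type identifier, so a type
// test costs a single pointer comparison
class fast_event_base
{
  public:
    virtual ~fast_event_base() {}
    id_type dynamic_type() const { return id_; }

  protected:
    explicit fast_event_base( id_type id ) : id_( id ) {}

  private:
    id_type id_; // the extra pointer that option 2 saves
};

template< class MostDerived >
struct fast_event : fast_event_base
{
  fast_event() : fast_event_base( id_of< MostDerived >() ) {}
};

// Option 2 (sketch): nothing is stored per object; identification relies on
// native C++ RTTI, which must then be enabled for the executable
class native_event_base
{
  public:
    virtual ~native_event_base() {}
    const std::type_info & dynamic_type() const { return typeid( *this ); }
};

struct EvFast1 : fast_event< EvFast1 > {};
struct EvFast2 : fast_event< EvFast2 > {};
struct EvNative : native_event_base {};
```

<p>On typical implementations an empty <code>EvFast1</code> holds a v-pointer
plus the identifier, while an empty <code>EvNative</code> holds the v-pointer
only. That one-pointer difference is the per-object saving discussed above.</p>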
<h2><a name="Double dispatch">Double dispatch</a></h2>
<p>At the heart of every state machine lies an implementation of double
dispatch. This is due to the fact that the incoming event <b>and</b> the
active state define exactly which <a href="definitions.html#Reaction">reaction</a>
the state machine will produce. For each event dispatch, one virtual call is
followed by a linear search for the appropriate reaction, using one RTTI
comparison per reaction. The following alternatives were considered but
rejected:</p>
<ul>
<li><a href="http://www.objectmentor.com/resources/articles/acv.pdf">Acyclic
visitor</a>: This double-dispatch variant satisfies all scalability
requirements but performs badly due to costly inheritance tree cross-casts.
Moreover, a state must store one v-pointer for <b>each</b> reaction, which
slows down construction and makes memory management customization
inefficient. In addition, C++ RTTI must inevitably be turned on, with
negative effects on executable size. boost::fsm originally employed acyclic
visitor and was about 4 times slower than it is now (MSVC7.1 on Intel
Pentium M). The dispatch speed might be better on other platforms but the
other negative effects will remain.</li>
<li>
<a href="http://www.isbiel.ch/~due/courses/c355/slides/patterns/visitor.pdf">
GOF Visitor</a>: The GOF Visitor pattern inevitably makes the whole machine
depend upon all events. That is, whenever a new event is added there is no
way around recompiling the whole state machine. This is contrary to the
scalability requirements.</li>
<li>Two-dimensional array of function pointers: To satisfy requirement 6, it
should be possible to spread a single state machine over several translation
units. This however means that the dispatch table must be filled at runtime
and the different translation units must somehow make themselves "known", so
that their part of the state machine can be added to the table. There simply
is no way to do this automatically <b>and</b> portably. The only portable
way that a state machine distributed over several translation units could
employ table-based double dispatch relies on the user. The programmer(s)
would somehow have to <b>manually</b> tie together the various pieces of the
state machine. Not only does this scale badly, it is also quite error-prone.</li>
</ul>
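<p>The scheme that was chosen (one virtual call followed by a linear search
with one identifier comparison per reaction) can be sketched as follows. This
is a simplified illustration, not the library's code: all names are
hypothetical, and a reaction is reduced to an enumerator instead of an actual
transition.</p>

```cpp
#include <cstddef>

// All names are hypothetical; the sketch only mirrors the dispatch scheme
// described in the text, not the library's actual classes.
typedef const void * id_type;

template< class T >
id_type id_of()
{
  static const char tag = 0; // unique address per type T
  return &tag;
}

struct event_base
{
  explicit event_base( id_type id ) : id_( id ) {}
  id_type id_;
};

template< class MostDerived >
struct event : event_base
{
  event() : event_base( id_of< MostDerived >() ) {}
};

struct EvStart : event< EvStart > {};
struct EvStop : event< EvStop > {};
struct EvReset : event< EvReset > {};

enum outcome { no_reaction, go_running, go_stopped };

struct reaction
{
  id_type event_id;
  outcome result;
};

// Second dispatch step: linear search, one id comparison per reaction
inline outcome find_reaction(
  const reaction * reactions, std::size_t count, const event_base & evt )
{
  for ( std::size_t i = 0; i < count; ++i )
  {
    if ( reactions[ i ].event_id == evt.id_ ) return reactions[ i ].result;
  }
  return no_reaction;
}

struct state_base
{
  virtual ~state_base() {}
  // First dispatch step: the virtual call selects the active state's reactions
  virtual outcome react( const event_base & evt ) const = 0;
};

struct Stopped : state_base
{
  virtual outcome react( const event_base & evt ) const
  {
    const reaction reactions[] = {
      { id_of< EvStart >(), go_running },
      { id_of< EvReset >(), go_stopped } };
    return find_reaction( reactions, 2, evt );
  }
};

struct Running : state_base
{
  virtual outcome react( const event_base & evt ) const
  {
    const reaction reactions[] = { { id_of< EvStop >(), go_stopped } };
    return find_reaction( reactions, 1, evt );
  }
};
```

<p>Note that each state only names the events it reacts to. Adding a new
event type therefore does not force a recompilation of states that ignore it,
which is the scalability property the GOF Visitor lacks.</p>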
<h2><a name="Resource usage">Resource usage</a></h2>
<h3>Memory</h3>
<p>On a 32-bit box, one empty active state typically needs less than 50 bytes
of memory. Even <b>very</b> complex machines will usually have fewer than 20
simultaneously active states, so just about every machine should run with less
than one kilobyte of memory (not counting event queues). Obviously, the
per-machine memory footprint grows with whatever state-local members the
user adds.</p>
<h3>Processor cycles</h3>
<p>The following ranking, from most to least expensive, should give a rough
picture of which features consume how many cycles:</p>
<ol>
<li><code>state_cast<>()</code>: By far the most cycle-consuming feature.
Searches linearly for a suitable state, using one <code>dynamic_cast</code>
per visited state</li>
<li>State entry and exit: Profiling of the fully optimized 1-bit-BitMachine
suggested that roughly half of the total dispatch time is spent destructing the
exited state and constructing the entered state. Obviously, transitions
where the <a href="definitions.html#Innermost common context">innermost
common context</a> is "far" from the leaf states and/or with lots of
orthogonal states can easily cause the destruction and construction of quite
a few states, leading to significant amounts of time spent on a transition</li>
<li><code>state_downcast<>()</code>: Searches linearly for the requested
state, using one virtual call and one RTTI comparison per visited state</li>
<li>Deep history: For all innermost states inside a state passing either
<code>has_deep_history</code> or <code>has_full_history</code> to its
state base class, a binary search
through the (usually small) history map must be performed on each exit.
History slot allocation is performed exactly once, at first exit</li>
<li>Shallow history: For all direct inner states of a state passing either
<code>has_shallow_history</code> or <code>has_full_history</code> to its
state base class, a binary search
through the (usually small) history map must be performed on each exit.
History slot allocation is performed exactly once, at first exit</li>
<li>Event dispatch: One virtual call followed by a linear search for a
suitable <a href="definitions.html#Reaction">reaction</a>, using one RTTI
comparison per visited reaction</li>
<li>Orthogonal states: One additional virtual call for each exited state <b>
if</b> there is more than one active leaf state before a transition. It should
also be noted that the worst-case event dispatch time is multiplied in the
presence of orthogonal states. For example, if two orthogonal leaf states
are added to a given state configuration, the worst-case time is tripled</li>
</ol>
<h2><a name="Limitations">Limitations</a></h2>
<h4>"Lost" events</h4>
<p>It is currently not possible to specially handle events that have not
triggered a reaction (and would thus be silently discarded). However, a
facility allowing this will probably be added in the
not-too-distant future.</p>
<h4>Deferring and posting events</h4>
<p>For performance reasons and because synchronous state machines often do not
need to queue events, it is possible to operate such machines entirely with
stack-allocated events. However, as soon as events need to be deferred and/or
posted there is no way around queuing and allocation with <code>new</code>.
The interface of <code>simple_state<>::post_event</code> enforces the use of
<code>boost::intrusive_ptr<></code> at compile time. But there is no way to do
the same for deferred events because allocation and deferral happen in
completely unrelated places. Of course, a "wrongly" allocated event could
easily be transformed into one allocated with <code>new</code> and pointed to
by <code>boost::intrusive_ptr<></code> with a virtual <code>clone()</code>
function. However, in my experience, event deferral is needed only very rarely
in synchronous state machines and the asynchronous variant enforces the use of
<code>boost::intrusive_ptr<></code> anyway. So, most users won't run into this
limitation and I rejected the <code>clone()</code> idea because it could cause
inefficiencies casual users wouldn't be aware of. In addition, users not
needing event deferral would nevertheless pay with increased code size.</p>
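<p>For illustration only, the rejected <code>clone()</code> idea could look
roughly like the sketch below. It is not part of the library. All names are
hypothetical, and <code>std::unique_ptr</code> (C++11) stands in for
<code>boost::intrusive_ptr<></code> to keep the sketch free of external
dependencies.</p>

```cpp
#include <memory>

// Hypothetical sketch of the rejected clone() approach, not library code.
struct event_base
{
  virtual ~event_base() {}
  // Returns a heap-allocated copy, regardless of how *this was allocated
  virtual std::unique_ptr< event_base > clone() const = 0;
};

// Implements clone() once for all events via the most-derived type
template< class MostDerived >
struct event : event_base
{
  virtual std::unique_ptr< event_base > clone() const
  {
    return std::unique_ptr< event_base >(
      new MostDerived( static_cast< const MostDerived & >( *this ) ) );
  }
};

struct EvKey : event< EvKey >
{
  explicit EvKey( int c ) : code( c ) {}
  int code;
};

// What a deferral facility would have to do with an event of unknown origin:
// copy it to the heap so that it can outlive the caller's stack frame
inline std::unique_ptr< event_base > defer( const event_base & evt )
{
  return evt.clone();
}
```

<p>The sketch makes the costs mentioned above visible: every deferral performs
a hidden copy and allocation, and every event class carries an additional
virtual function whether or not deferral is ever used.</p>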
<h4>Junction points</h4>
<p>UML junction points are not supported because arbitrarily complex guard
expressions can easily be implemented with <code>custom_reaction<></code>s.</p>
<h4>Dynamic choice points</h4>
<p>Currently there is no direct support for this UML element because its
behavior can often be implemented with <code>custom_reaction<></code>s. In
rare cases this is not possible, namely when a choice point happens to be the
initial state. Then, the behavior can easily be implemented as follows:</p>
<pre>struct make_choice : fsm::event< make_choice > {};

// universal choice point base class template
template< class MostDerived, class Context >
struct choice_point : fsm::state< MostDerived, Context,
  fsm::custom_reaction< make_choice > >
{
  typedef fsm::state< MostDerived, Context,
    fsm::custom_reaction< make_choice > > base_type;
  typedef typename base_type::my_context my_context;
  typedef choice_point my_base;

  choice_point( my_context ctx ) : base_type( ctx )
  {
    this->post_event( boost::intrusive_ptr< make_choice >(
      new make_choice() ) );
  }
};

// ...

struct MyChoicePoint;
struct Machine : fsm::state_machine< Machine, MyChoicePoint > {};

struct Dest1 : fsm::simple_state< Dest1, Machine > {};
struct Dest2 : fsm::simple_state< Dest2, Machine > {};
struct Dest3 : fsm::simple_state< Dest3, Machine > {};

struct MyChoicePoint : choice_point< MyChoicePoint, Machine >
{
  MyChoicePoint( my_context ctx ) : my_base( ctx ) {}

  fsm::result react( const make_choice & )
  {
    if ( /* ... */ )
    {
      return transit< Dest1 >();
    }
    else if ( /* ... */ )
    {
      return transit< Dest2 >();
    }
    else
    {
      return transit< Dest3 >();
    }
  }
};</pre>
<p><code>choice_point<></code> is not currently part of boost::fsm, mainly
because I fear that beginners could use it in places where they would be
better off with <code>custom_reaction<></code>. If the demand is high enough I
will add it to the library.</p>
<h4>Deep history of orthogonal regions</h4>
<p>Deep history of states with orthogonal regions is currently not supported:</p>
<p><img border="0" src="DeepHistoryLimitation1.gif" width="331" height="346"></p>
<p>Attempts to implement this state chart will lead to a compile-time error
because B has orthogonal regions and its direct or indirect outer state
contains a deep history pseudo state. In other words, a state containing a
deep history pseudo state must not have any direct or indirect inner states
which themselves have orthogonal regions. This limitation stems from the fact
that full deep history support would be more complicated to implement and
would consume more resources than the currently implemented limited deep
history support. Moreover, full deep history behavior can easily be
implemented with shallow history:</p>
<p><img border="0" src="DeepHistoryLimitation2.gif" width="332" height="347"></p>
<p>Of course, this only works if neither C, D, E nor any of their direct or
indirect inner states have orthogonal regions. Otherwise, this pattern has
to be applied recursively.</p>
<h4>Synchronization (join and fork) bars</h4>
<p><img border="0" src="JoinAndFork.gif" width="541" height="301"></p>
<p>Synchronization bars are not supported, that is, a transition always
originates at exactly one state and always ends at exactly one state. In my
experience join bars are sometimes useful but their behavior can easily be
emulated with guards. Fork bars are needed only rarely. Their support would
complicate the implementation quite a bit.</p>
<h4>Event dispatch to orthogonal regions</h4>
<p>The boost::fsm event dispatch algorithm differs from the one specified
in
<a href="http://www.wisdom.weizmann.ac.il/~dharel/SCANNED.PAPERS/Statecharts.pdf">
David Harel's original paper</a> and in the
<a href="http://www.omg.org/cgi-bin/doc?formal/03-03-01">UML standard</a>.
Both mandate that each event is dispatched to all orthogonal regions of a
state machine. Example:</p>
<p><img border="0" src="EventDispatch.gif" width="436" height="211"></p>
<p>Here the Harel/UML dispatch algorithm specifies that the machine must
transition from (B,D) to (C,E) when an EvX event is processed. Because of the
subtleties that Harel describes in chapter 7 of
<a href="http://www.wisdom.weizmann.ac.il/~dharel/SCANNED.PAPERS/Statecharts.pdf">
his paper</a>, an implementation of this algorithm is not only quite complex
but also much slower than the simplified version employed by boost::fsm, which
stops searching for <a href="definitions.html#Reaction">reactions</a> as soon
as it has found one suitable for the current event. That is, had the example
been implemented with this library, the machine would have transitioned
non-deterministically from (B,D) to either (C,D) or (B,E). This version was
chosen because, in my experience, in real-world machines different orthogonal
regions often do not specify transitions for the same events. For the rare
cases when they do, the UML behavior can easily be emulated as follows:</p>
<p><img border="0" src="SimpleEventDispatch.gif" width="466" height="226"></p>
<h4>Transitions across orthogonal regions</h4>
<p>
<img border="0" src="TransitionsAcrossOrthogonalRegions.gif" width="226" height="271"></p>
<p>Such transitions are currently flagged with an error at compile time (the
UML specifications explicitly allow them while Harel does not mention them at
all). I decided not to support them because I have erroneously tried to
implement such a transition several times but have never come across a
situation where it would make any sense. If you need to make such transitions,
please do let me know!</p>
<hr>
<p>Revised
<!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->03 February, 2005<!--webbot bot="Timestamp" endspan i-checksum="40404" --></p>
<p><i>&copy; Copyright <a href="mailto:ahd6974-spamgroupstrap@yahoo.com">Andreas Huber D&ouml;nni</a>
2003-2005. <b><font color="#FF0000">Please remove the words spam and trap from
the email address behind the link</font></b></i></p>
<p><i>Distributed under the Boost Software License, Version 1.0. (See
accompanying file <a href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
copy at <a href="http://www.boost.org/LICENSE_1_0.txt">
http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>

</body>

</html>