Context
Oliver Kowalke
Copyright 2009 Oliver Kowalke
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
C++ library for switching between different user contexts
Overview
Boost.Context is a foundational library that
provides a sort of cooperative multitasking on a single thread. By providing
an abstraction of the current execution state in the current thread, including
the stack (with local variables) and stack pointer, all registers and CPU flags,
and the instruction pointer, a fcontext_t instance represents
a specific point in the application's execution path. This is useful for building
higher-level abstractions, like coroutines, cooperative
threads (userland threads) or an equivalent of the C#
keyword yield in C++.
A fcontext_t provides the means to suspend the current
execution path and to transfer execution control, thereby permitting another
fcontext_t to run on the current thread. This stateful
transfer mechanism enables a fcontext_t to suspend execution
from within nested functions and, later, to resume from where it was suspended.
While the execution path represented by a fcontext_t only
runs on a single thread, it can be migrated to another thread at any given
time.
A context switch between threads requires system calls (involving the OS kernel),
which can cost more than a thousand CPU cycles on x86 CPUs. By contrast, transferring
control between fcontext_t instances requires fewer than a hundred CPU cycles because
it takes place within a single thread and involves no system calls.
In order to use the classes and functions described here, you can either include
the specific headers specified by the descriptions of each class or function,
or include the master library header:
    #include <boost/context/all.hpp>
which includes all the other headers in turn.
All functions and classes are contained in the namespace boost::ctx.
Requirements
Boost.Context must be built for the particular
compiler(s) and CPU architecture(s) being targeted. Boost.Context
includes assembly code and, therefore, requires GNU AS for supported POSIX
systems, and MASM for Windows systems.
Please note that address-model=64 must be passed on the bjam command line on 64-bit
Windows (boost-build issue).
Context
Each instance of fcontext_t represents a context (CPU
registers and stack space). Together with its related functions jump_fcontext()
and make_fcontext() it provides an execution control transfer
mechanism with an interface similar to that of ucontext_t.
fcontext_t and its functions are located in boost::ctx
and the functions are declared as extern "C".
If fcontext_t is used in a multithreaded application,
it can be migrated between threads, but must not reference thread-local
storage.
If fiber-local storage is used on Windows, the user
is responsible for calling ::FlsAlloc(), ::FlsFree().
The low level API is the part to port to new platforms.
Executing a context
A new context that is to execute a context-function (returning
void and accepting intptr_t as argument) must be initialized by the function make_fcontext().
    // context-function
    void f(intptr_t);

    // the context to be initialized
    boost::ctx::fcontext_t fc;

    // creates and manages a protected stack (with guard page)
    boost::ctx::protected_stack stack(boost::ctx::default_stacksize());

    // let fcontext_t fc use stack
    fc.fc_stack.base = stack.address();
    fc.fc_stack.limit = static_cast<char*>(fc.fc_stack.base) - stack.size();

    // context fc uses f() as context function
    make_fcontext(&fc, f);
fcontext_t requires a pointer to the top of the stack
(fc_stack.base) as well as a pointer to the lower bound of the
stack (fc_stack.limit).
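The relationship between the two pointers can be sketched without Boost.Context itself. The following minimal example assumes a stack that grows towards lower addresses (as the subtraction in the snippet above implies) and works on any plain buffer in place of a real stack. stack_bounds and bounds_for are hypothetical names introduced for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the base/limit pair stored in fc_stack.
struct stack_bounds
{
    void* base;   // top of the stack (highest address)
    void* limit;  // lower bound of the stack (lowest address)
};

// Computes both pointers for a buffer of 'size' bytes starting at 'memory',
// assuming the stack grows towards lower addresses.
stack_bounds bounds_for(void* memory, std::size_t size)
{
    assert(memory != nullptr && size > 0);
    stack_bounds b;
    b.limit = memory;
    b.base = static_cast<char*>(memory) + size;
    return b;
}
```

With these values, limit equals base minus the stack size, which is the same arithmetic the snippet above applies to fc.fc_stack.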
Calling jump_fcontext() invokes the context-function
in a newly created context complete with registers, flags, stack and instruction
pointers. When control should be returned to the original calling context,
call jump_fcontext(). The current context information
(registers, flags, and stack and instruction pointers) is saved and the original
context information is restored. Calling jump_fcontext()
again resumes execution in the second context after saving the new state of
the original context.
    #include <cstdlib>
    #include <iostream>
    #include <boost/assert.hpp>
    #include <boost/context/all.hpp>

    namespace ctx = boost::ctx;

    ctx::fcontext_t fcm, fc1, fc2;

    void f1(intptr_t)
    {
        std::cout << "f1: entered" << std::endl;
        std::cout << "f1: call jump_fcontext( & fc1, & fc2, 0)" << std::endl;
        ctx::jump_fcontext(&fc1, &fc2, 0);
        std::cout << "f1: return" << std::endl;
        ctx::jump_fcontext(&fc1, &fcm, 0);
    }

    void f2(intptr_t)
    {
        std::cout << "f2: entered" << std::endl;
        std::cout << "f2: call jump_fcontext( & fc2, & fc1, 0)" << std::endl;
        ctx::jump_fcontext(&fc2, &fc1, 0);
        BOOST_ASSERT(false && !"f2: never returns");
    }

    int main(int argc, char* argv[])
    {
        ctx::stack_allocator alloc1, alloc2;

        fc1.fc_stack.base = alloc1.allocate(ctx::minimum_stacksize());
        fc1.fc_stack.limit =
            static_cast<char*>(fc1.fc_stack.base) - ctx::minimum_stacksize();
        ctx::make_fcontext(&fc1, f1);

        fc2.fc_stack.base = alloc2.allocate(ctx::minimum_stacksize());
        fc2.fc_stack.limit =
            static_cast<char*>(fc2.fc_stack.base) - ctx::minimum_stacksize();
        ctx::make_fcontext(&fc2, f2);

        std::cout << "main: call jump_fcontext( & fcm, & fc1, 0)" << std::endl;
        ctx::jump_fcontext(&fcm, &fc1, 0);

        std::cout << "main: done" << std::endl;

        return EXIT_SUCCESS;
    }

output:
    main: call jump_fcontext( & fcm, & fc1, 0)
    f1: entered
    f1: call jump_fcontext( & fc1, & fc2, 0)
    f2: entered
    f2: call jump_fcontext( & fc2, & fc1, 0)
    f1: return
    main: done
The first call of jump_fcontext() enters the context-function f1()
by starting context fc1 (context fcm saves the registers of main()).
jump_fcontext() is then called to jump between the contexts fc1 and fc2.
Because f1() finally jumps to fcm, main() is re-entered (returning from its
call to jump_fcontext()) once f1() has done its work.
Calling jump_fcontext() to the same context from inside
the same context results in undefined behaviour.
In contrast to threads, which are preemptive, fcontext_t
switches are cooperative (the programmer controls when a switch happens). The
kernel is not involved in the context switches.
Transfer of data
The third argument passed to jump_fcontext() in one context
is passed as the first argument of the context-function
if the other context is started for the first time. In all following invocations
of jump_fcontext(), the intptr_t passed to jump_fcontext()
in one context is returned by jump_fcontext() in the
other context.
    #include <cstdlib>
    #include <iostream>
    #include <utility>
    #include <boost/context/all.hpp>

    namespace ctx = boost::ctx;

    ctx::fcontext_t fc1, fcm;

    typedef std::pair<int, int> pair_t;

    void f1(intptr_t param)
    {
        pair_t* p = (pair_t*)param;

        p = (pair_t*)ctx::jump_fcontext(&fc1, &fcm, (intptr_t)(p->first + p->second));

        ctx::jump_fcontext(&fc1, &fcm, (intptr_t)(p->first + p->second));
    }

    int main(int argc, char* argv[])
    {
        ctx::stack_allocator alloc;

        fc1.fc_stack.base = alloc.allocate(ctx::minimum_stacksize());
        fc1.fc_stack.limit =
            static_cast<char*>(fc1.fc_stack.base) - ctx::minimum_stacksize();
        fc1.fc_link = &fcm;

        pair_t p(std::make_pair(2, 7));
        ctx::make_fcontext(&fc1, f1);

        int res = (int)ctx::jump_fcontext(&fcm, &fc1, (intptr_t)&p);
        std::cout << p.first << " + " << p.second << " == " << res << std::endl;

        p = std::make_pair(5, 6);
        res = (int)ctx::jump_fcontext(&fcm, &fc1, (intptr_t)&p);
        std::cout << p.first << " + " << p.second << " == " << res << std::endl;

        std::cout << "main: done" << std::endl;

        return EXIT_SUCCESS;
    }

output:
    2 + 7 == 9
    5 + 6 == 11
    main: done
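The casts in this example rely on intptr_t being wide enough to hold a pointer. A minimal sketch of that round trip, independent of Boost.Context, might look as follows. to_param, from_param and sum_of are hypothetical helpers, and reinterpret_cast is used here in place of the C-style casts above.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

typedef std::pair<int, int> pair_t;

// Packs a pointer into the integer type that jump_fcontext() transports.
std::intptr_t to_param(pair_t* p)
{
    return reinterpret_cast<std::intptr_t>(p);
}

// Recovers the pointer on the receiving side.
pair_t* from_param(std::intptr_t param)
{
    return reinterpret_cast<pair_t*>(param);
}

// Mirrors the body of f1(): reads the pair behind 'param' and returns the sum.
int sum_of(std::intptr_t param)
{
    pair_t* p = from_param(param);
    assert(p != nullptr);
    return p->first + p->second;
}
```

Only the address travels through the jump, so the object it refers to has to stay alive until the receiving context has finished reading it.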
Exceptions in context-function
If the context-function emits an exception, the application
will terminate.
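One way to honour this rule is to catch everything at the top of the context-function. The sketch below is an assumption about how an application might do that, not something Boost.Context provides. guarded_context_function, risky_work and last_error are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical place to record a failure so the resuming context can inspect it.
static std::string last_error;

// Hypothetical work that may throw.
void risky_work(std::intptr_t param)
{
    if (param < 0)
        throw std::invalid_argument("negative parameter");
}

// Has the signature required of a context-function and never lets an
// exception escape.
void guarded_context_function(std::intptr_t param)
{
    try
    {
        risky_work(param);
        last_error.clear();
    }
    catch (std::exception const& e)
    {
        last_error = e.what();
    }
    catch (...)
    {
        last_error = "unknown exception";
    }
    // A real context-function would now jump back to another context.
}
```

The error is recorded rather than rethrown, so the context that resumes afterwards can decide how to report it.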
Preserving floating point registers
Preserving the floating point registers increases the cycle count for a context
switch (see performance tests). The fourth argument of jump_fcontext()
controls whether the fpu registers are preserved by the context jump.
The fpu-controlling argument of jump_fcontext()
must be used consistently throughout the application. Otherwise the behaviour is undefined.
Stack unwinding
Sometimes it is necessary to unwind the stack of an unfinished context to destroy
local stack variables so they can release allocated resources (RAII pattern).
The user is responsible for this task.
Struct fcontext_t and related functions
    struct stack_t
    {
        void* base;
        void* limit;
    };

    struct fcontext_t
    {
        < platform specific >

        stack_t fc_stack;
    };

    intptr_t jump_fcontext(fcontext_t* ofc, fcontext_t const* nfc, intptr_t vp);
    void make_fcontext(fcontext_t* fc, void(*fn)(intptr_t));
base
Member:
Pointer to the top of the stack.
limit
Member:
Pointer to the bottom of the stack.
fc_stack
Member:
Tracks the memory for the context's stack.
intptr_t jump_fcontext(fcontext_t* ofc, fcontext_t* nfc, intptr_t p, bool preserve_fpu)
Effects:
Stores the current context data (stack pointer, instruction pointer,
and CPU registers) to *ofc and restores the context data
from *nfc,
which implies jumping to *nfc's execution context. The intptr_t
argument, p, is passed
to the current context to be returned by the most recent call to jump_fcontext()
in the same thread. The last argument controls if fpu registers have
to be preserved.
Returns:
The third pointer argument passed to the most recent call to jump_fcontext(),
if any.
void make_fcontext(fcontext_t* fc, void(*fn)(intptr_t))
Precondition:
A stack is applied to *fc before make_fcontext() is called.
Effects:
Modifies *fc
in order to execute fn
when the context is activated next.
Stack allocation
A fcontext_t requires a stack, which is allocated and deallocated
by a StackAllocator. Boost.Context
uses stack_allocator by default,
but a customized StackAllocator
can be passed to the context constructor instead. When a context is constructed
it invokes allocate(), and on its destruction
the stack is released by deallocate().
StackAllocator concept
A StackAllocator must satisfy the StackAllocator
concept requirements shown in the following table, in which a is an object of a StackAllocator
type, p is a void*, and
s is a std::size_t:
expression           return type   notes
a.allocate(s)        void*         returns a pointer to s bytes allocated from the stack
a.deallocate(p,s)    void          deallocates s bytes of memory beginning at p,
                                   a pointer previously returned by a.allocate()
The implementation of allocate() might include logic to protect against
exceeding the context's available stack size rather than leaving it as undefined
behaviour.
Calling deallocate()
with a pointer not returned by allocate() results in undefined behaviour.
The stack is not required to be aligned; alignment takes place inside make_fcontext().
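A minimal sketch of a type that satisfies the two expressions in the table is shown below. It is an illustration only: simple_stack_allocator is a hypothetical name, it uses std::malloc and adds no guard page, and it assumes (as the earlier examples do when they compute limit as base minus size) that the pointer returned by allocate() marks the top of the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical model of the StackAllocator concept (not part of Boost.Context).
class simple_stack_allocator
{
public:
    // Returns a pointer to the top (highest address) of a block of 'size' bytes.
    void* allocate(std::size_t size) const
    {
        void* limit = std::malloc(size);
        if (limit == nullptr)
            throw std::bad_alloc();
        return static_cast<char*>(limit) + size;
    }

    // Releases a block previously obtained from allocate() with the same size.
    void deallocate(void* base, std::size_t size) const
    {
        assert(base != nullptr);
        void* limit = static_cast<char*>(base) - size;
        std::free(limit);
    }
};
```

Passing deallocate() a size different from the one given to allocate() would free the wrong address, which is one concrete instance of the undefined behaviour described above.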
Class stack_allocator
Boost.Context provides the class stack_allocator, which models
the StackAllocator concept. It appends a guard page
to protect against exceeding the stack. If the guard page is accessed (read
or write operation), a segmentation fault/access violation is generated by
the operating system.
Helper functions
Boost.Context provides easy access to the
stack related limits defined by the environment.
    std::size_t default_stacksize();
    std::size_t minimum_stacksize();
    std::size_t maximum_stacksize();
    bool is_stack_unbound();
    std::size_t pagesize();
    std::size_t page_count(std::size_t stacksize);
std::size_t default_stacksize()
Returns:
Returns a default stack size, which may be platform specific. The present
implementation returns a value of 256 kB.
std::size_t minimum_stacksize()
Returns:
Returns the minimum size in bytes of stack defined by the environment.
Throws:
Nothing.
std::size_t maximum_stacksize()
Preconditions:
is_stack_unbound() returns false.
Returns:
Returns the maximum size in bytes of stack defined by the environment.
Throws:
Nothing.
bool is_stack_unbound()
Returns:
Returns true if the environment
defines no limit for the size of a stack.
Throws:
Nothing.
std::size_t pagesize()
Returns:
Returns how many bytes the operating system allocates for one page.
Throws:
Nothing.
std::size_t page_count(std::size_t stacksize)
Returns:
Returns how many pages have to be allocated for a stack of stacksize bytes.
Throws:
Nothing.
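The relationship between a stack size and a page count can be sketched as a rounded-up division. This is one plausible way to compute such a value, not necessarily how page_count() is implemented. pages_for is a hypothetical helper, and the page size is passed in explicitly instead of being queried from the operating system.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of whole pages needed to hold 'stacksize' bytes,
// rounding up when the size is not a multiple of the page size.
std::size_t pages_for(std::size_t stacksize, std::size_t page_size)
{
    assert(page_size > 0);
    return (stacksize + page_size - 1) / page_size;
}
```

For example, with an assumed page size of 4096 bytes, a 256 kB stack occupies 64 pages, and a single extra byte would require a 65th page.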
Performance
Performance of Boost.Context was measured
on the platforms shown in the following table. Performance measurements were
taken using rdtsc, with overhead
corrections, on x86 platforms. In each case, stack protection was active, cache
warm-up was accounted for, and the one running thread was pinned to a single
CPU. The code was compiled using the build options, 'variant = release cxxflags
= -DBOOST_DISABLE_ASSERTS'.
The numbers in the table are the number of cycles per iteration, based upon
an average computed over 10 iterations.
Tested Platforms
Platform                     OS                              Compiler     ABI
ARM (ARM926EJ-S)             Debian GNU/Linux (Lenny)        GCC 4.4.4    ARM APCS (Linux)
MIPS (MIPS 24K)              Debian GNU/Linux (Lenny)        GCC 4.3.2    O32
MIPS (O2 / MIPS R5000)       Debian GNU/Linux (Lenny)        GCC 4.3.2    O32
PowerPC (7400)               Debian GNU/Linux (Lenny)        GCC 4.3.2    SYSV
X86_64 (Intel Core2 Quad)    Ubuntu GNU/Linux (Lucid Lynx)   GCC 4.4.3    SYSV
X86_64                       Windows 7                       MS VC 10.0   PE
I386                         Debian GNU/Linux (Lenny)        GCC 4.4.3    SYSV
I386                         FreeBSD 8.0                     GCC 4.2.1    SYSV
I386                         OpenSolaris 2009.06             GCC 4.3.2    SYSV
I386                         Windows XP                      MSVC 9.0     PE
Rationale
No inline-assembler
Some newer compilers (for instance MSVC 10 for x86_64 and Itanium) do not support
inline assembler (see the MSDN article 'Inline Assembler').
fcontext_t
Boost.Context provides the low level API fcontext_t
which is implemented in assembler to provide context swapping operations. fcontext_t
is the part to port to new platforms.
Context switches do not preserve the signal mask on UNIX systems.
Because the assembler code uses the byte layout of fcontext_t
to access its members, fcontext_t must be a POD. This requires
that fcontext_t has only a default constructor, no visibility
keywords (e.g. private, public, protected), no virtual methods, and that all members
and base classes are PODs too.
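Such a layout requirement can be checked at compile time. The sketch below assumes a C++11 compiler and uses made-up types: example_stack, example_context and not_a_pod are hypothetical and do not reflect the real, platform-specific layout of fcontext_t.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical context layout: a register save area followed by stack bounds.
struct example_stack
{
    void* base;
    void* limit;
};

struct example_context
{
    void* registers[8];
    example_stack stack;
};

// Counter-example: a virtual member function breaks the POD property.
struct not_a_pod
{
    virtual ~not_a_pod() {}
    void* registers[8];
};

// A type is a POD when it is both trivial and standard-layout.
template <typename T>
constexpr bool is_pod_like()
{
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

static_assert(is_pod_like<example_context>(), "context type must be a POD");
static_assert(!is_pod_like<not_a_pod>(), "virtual functions break the POD property");
// Fixed member offsets are what hand-written assembler would depend on.
static_assert(offsetof(example_context, registers) == 0, "registers must come first");
```

If someone later adds a constructor or a virtual function to such a type, the build fails instead of the assembler silently reading the wrong bytes.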
Protecting the stack
Because the stack's size is fixed -- there is no support for split stacks yet
-- it is important to protect against exceeding the stack's bounds. Otherwise,
in the best case, overrunning the stack's memory will result in a segmentation
fault or access violation and, in the worst case, the application's memory
will be overwritten. stack_allocator
appends a guard page to the stack to help detect overruns. The guard page consumes
no physical memory, but generates a segmentation fault or access violation
on access to the virtual memory addresses within it.
Other APIs
setjmp()/longjmp()
C99 defines setjmp()/longjmp()
to provide non-local jumps, but it does not require that longjmp()
preserves the current stack frame. Therefore, jumping into a function which
was exited via a call to longjmp() is undefined (ISO/IEC 9899:1999,
2005, 7.13.2.1:2).
ucontext_t
ucontext_t has been deprecated since POSIX.1-2003 and was removed in
POSIX.1-2008. The function signature of makecontext() is:
    void makecontext(ucontext_t* ucp, void (*func)(), int argc, ...);
The third argument of makecontext() specifies the number of integer arguments
that follow. If func accepts those arguments, a function pointer cast is
required, which is undefined in C99 (ISO/IEC 9899:1999, 2005, J.2).
The arguments in the var-arg list are required to be integers; passing pointers
in the var-arg list is not guaranteed to work and will fail in particular on architectures
where pointers are larger than integers.
ucontext_t preserves the signal
mask between context switches, which involves system calls consuming a lot
of CPU cycles (ucontext_t is slower than fcontext_t by a factor of 13).
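The pointer-size problem can be illustrated without calling makecontext() at all. The sketch below shows a workaround sometimes used with integer-only argument lists: the pointer is split into two integer halves and reassembled on the other side. split_pointer, join_pointer and int_pair are hypothetical names, and the code assumes unsigned int is at least 32 bits wide and pointers are at most 64 bits wide.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(unsigned int) >= 4, "sketch assumes at least 32-bit unsigned int");
static_assert(sizeof(void*) <= 8, "sketch assumes pointers of at most 64 bits");

// Two integer-sized halves of one pointer value.
struct int_pair
{
    unsigned int hi;
    unsigned int lo;
};

// Splits a pointer into two values that each fit into an integer argument.
int_pair split_pointer(void* p)
{
    std::uint64_t bits = static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p));
    int_pair parts;
    parts.hi = static_cast<unsigned int>(bits >> 32);
    parts.lo = static_cast<unsigned int>(bits & 0xffffffffu);
    return parts;
}

// Reassembles the pointer from its two halves.
void* join_pointer(unsigned int hi, unsigned int lo)
{
    std::uint64_t bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(bits));
}
```

jump_fcontext() avoids this detour because its data argument is an intptr_t, which is wide enough to hold a pointer directly.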
Windows fibers
A drawback of the Windows Fiber API is that CreateFiber() does not accept a pointer to user allocated
stack space, preventing the reuse of stacks for other context instances. In addition,
the Windows Fiber API requires a call to ConvertThreadToFiber() before SwitchToFiber() is called on a thread which has not yet been
converted to a fiber. For the same reason ConvertFiberToThread() must be called after returning from SwitchToFiber()
if the thread had to be converted to a fiber beforehand (which is inefficient).
    if (!is_a_fiber())
    {
        ConvertThreadToFiber(0);
        SwitchToFiber(ctx);
        ConvertFiberToThread();
    }
If the condition _WIN32_WINNT >= _WIN32_WINNT_VISTA
is met, the function IsThreadAFiber() is provided in order to detect whether the current
thread has already been converted. Unfortunately Windows XP + SP 2/3 defines
_WIN32_WINNT >= _WIN32_WINNT_VISTA without providing
IsThreadAFiber().
x86 and floating-point env
i386
"The FpCsr and the MxCsr register must be saved and restored before
any call or return by any procedure that needs to modify them ..."
'Calling Conventions', Agner Fog.
x86_64
Windows
MxCsr - "A callee that modifies any of the nonvolatile fields within
MxCsr must restore them before returning to its caller. Furthermore, a caller
that has modified any of these fields must restore them to their standard
values before invoking a callee ..." MSDN article 'MxCsr'.
FpCsr - "A callee that modifies any of the fields within FpCsr must
restore them before returning to its caller. Furthermore, a caller that has
modified any of these fields must restore them to their standard values before
invoking a callee ..." MSDN article 'FpCsr'.
"The MMX and floating-point stack registers (MM0-MM7/ST0-ST7) are preserved
across context switches. There is no explicit calling convention for these
registers." MSDN article 'Legacy Floating-Point Support'.
"The 64-bit Microsoft compiler does not use ST(0)-ST(7)/MM0-MM7".
'Calling Conventions', Agner Fog.
"XMM6-XMM15 must be preserved" MSDN article 'Register Usage'.
SysV
"The control bits of the MxCsr register are callee-saved (preserved
across calls), while the status bits are caller-saved (not preserved). The
x87 status word register is caller-saved, whereas the x87 control word (FpCsr)
is callee-saved." SysV ABI AMD64
Architecture Processor Supplement Draft Version 0.99.4, 3.2.1.
Reference
ARM
AAPCS ABI: Procedure Call Standard for the ARM Architecture
AAPCS/LINUX: ARM GNU/Linux Application Binary Interface Supplement
MIPS
O32 ABI: SYSTEM V APPLICATION BINARY INTERFACE, MIPS RISC Processor Supplement
PowerPC32
SYSV ABI: SYSTEM V APPLICATION BINARY INTERFACE PowerPC Processor Supplement
PowerPC64
SYSV ABI: PowerPC User Instruction Set Architecture, Book I
X86-32
SYSV ABI: SYSTEM V APPLICATION BINARY INTERFACE, Intel386TM Architecture Processor Supplement
MS PE: Calling Conventions
X86-64
SYSV ABI: System V Application Binary Interface, AMD64 Architecture Processor Supplement
MS PE: x64 Software Conventions
Todo
provide support for SPARC
support split-stack feature from gcc/gold linker
Acknowledgments
I'd like to thank Adreas Fett, Artyom Beilis, Fernando Pelliccioni, Giovanni
Piero Deretta, Gordon Woodhull, Helge Bahmann, Holger Grund, Jeffrey Lee Hellrung
(Jr.), Keith Jeffery, Phil Endecott, Robert Stewart, Steven Watanabe, Vicente
J. Botet Escriba.