Oliver Kowalke 2014 Oliver Kowalke Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) C++ Library for switching between different user contexts Context
<link linkend="context.overview">Overview</link>
Boost.Context is a foundational library that provides a form of cooperative multitasking on a single thread. By providing an abstraction of the current execution state in the current thread, including the stack (with local variables) and stack pointer, all registers and CPU flags, and the instruction pointer, a fcontext_t instance represents a specific point in the application's execution path. This is useful for building higher-level abstractions, like coroutines, cooperative threads (userland threads) or an equivalent to the C# keyword yield in C++.

A fcontext_t provides the means to suspend the current execution path and to transfer execution control, thereby permitting another fcontext_t to run on the current thread. This stateful transfer mechanism enables a fcontext_t to suspend execution from within nested functions and, later, to resume from where it was suspended. While the execution path represented by a fcontext_t only runs on a single thread, it can be migrated to another thread at any given time.

A context switch between threads requires system calls (involving the OS kernel), which can cost more than a thousand CPU cycles on x86 CPUs. By contrast, transferring control among fcontext_t instances requires only a few CPU cycles because it is done within a single thread and does not involve system calls.

In order to use the classes and functions described here, you can either include the specific headers specified by the descriptions of each class or function, or include the master library header:

#include <boost/context/all.hpp>

which includes all the other headers in turn.

All functions and classes are contained in the namespace boost::context.
<link linkend="context.requirements">Requirements</link> Boost.Context must be built for the particular compiler(s) and CPU architecture(s) being targeted. Boost.Context includes assembly code and, therefore, requires GNU as and the GNU preprocessor for supported POSIX systems, MASM for Windows/x86 systems and ARMasm for Windows/arm systems. MASM64 (ml64.exe) is a part of Microsoft's Windows Driver Kit. Please note that address-model=64 must be given on the bjam command line on 64bit Windows for a 64bit build; otherwise 32bit code will be generated. For cross-compiling the library you must specify certain additional properties on the bjam command line: target-os, abi, binary-format, architecture and address-model.
<link linkend="context.context">Struct fcontext_t</link>
Each instance of fcontext_t represents a context (CPU registers and stack space). Together with its related functions jump_fcontext() and make_fcontext() it provides an execution control transfer mechanism with an interface similar to ucontext_t. fcontext_t and its functions are located in boost::context and the functions are declared as extern "C".

If fcontext_t is used in a multi-threaded application, it can be migrated between threads, but must not reference thread-local storage.

The low level API is the part to port to new platforms.

If fiber-local storage is used on Windows, the user is responsible for calling ::FlsAlloc(), ::FlsFree().

Executing a context
A new context supposed to execute a context-function (returning void and accepting intptr_t as argument) is created on top of the stack (at a 16 byte boundary) by the function make_fcontext().

// context-function
void f(intptr_t);

// creates a new stack
// (depending on the architecture, sp must point to the top or bottom of the stack)
std::size_t size = 8192;
void* sp(std::malloc(size));

// context fc uses f() as context function
// fcontext_t is placed on top of context stack
// a pointer to fcontext_t is returned
fcontext_t fc(make_fcontext(sp,size,f));

Calling jump_fcontext() invokes the context-function in a newly created context complete with registers, flags, stack and instruction pointers. When control should be returned to the original calling context, call jump_fcontext(). The current context information (registers, flags, and stack and instruction pointers) is saved and the original context information is restored. Calling jump_fcontext() again resumes execution in the second context after saving the new state of the original context.
boost::context::fcontext_t fcm,fc1,fc2;

void f1(intptr_t)
{
    std::cout<<"f1: entered"<<std::endl;
    std::cout<<"f1: call jump_fcontext( & fc1, fc2, 0)"<< std::endl;
    boost::context::jump_fcontext(&fc1,fc2,0);
    std::cout<<"f1: return"<<std::endl;
    boost::context::jump_fcontext(&fc1,fcm,0);
}

void f2(intptr_t)
{
    std::cout<<"f2: entered"<<std::endl;
    std::cout<<"f2: call jump_fcontext( & fc2, fc1, 0)"<<std::endl;
    boost::context::jump_fcontext(&fc2,fc1,0);
    BOOST_ASSERT(false&&!"f2: never returns");
}

std::size_t size(8192);
void* sp1(std::malloc(size));
void* sp2(std::malloc(size));

fc1=boost::context::make_fcontext(sp1,size,f1);
fc2=boost::context::make_fcontext(sp2,size,f2);

std::cout<<"main: call jump_fcontext( & fcm, fc1, 0)"<<std::endl;
boost::context::jump_fcontext(&fcm,fc1,0);

output:
main: call jump_fcontext( & fcm, fc1, 0)
f1: entered
f1: call jump_fcontext( & fc1, fc2, 0)
f2: entered
f2: call jump_fcontext( & fc2, fc1, 0)
f1: return

The first call of jump_fcontext() enters the context-function f1() by starting context fc1 (context fcm saves the registers of main()). For jumping between the contexts fc1 and fc2, jump_fcontext() is called. Because f1() ends by jumping to fcm, main() is re-entered (returning from jump_fcontext()) once f1() has done its work.

Calling jump_fcontext() to the same context from inside the same context results in undefined behaviour.

The size of the stack is required to be larger than the size of fcontext_t.

In contrast to threads, which are preemptive, fcontext_t switches are cooperative (the programmer controls when a switch happens). The kernel is not involved in the context switches.

Transfer of data
The third argument passed to jump_fcontext(), in one context, is passed as the first argument of the context-function if the context is started for the first time. In all following invocations of jump_fcontext() the intptr_t passed to jump_fcontext(), in one context, is returned by jump_fcontext() in the other context.
boost::context::fcontext_t fcm,fc;

typedef std::pair<int,int> pair_t;

void f(intptr_t param)
{
    pair_t* p=(pair_t*)param;
    p=(pair_t*)boost::context::jump_fcontext(&fc,fcm,(intptr_t)(p->first+p->second));
    boost::context::jump_fcontext(&fc,fcm,(intptr_t)(p->first+p->second));
}

std::size_t size(8192);
void* sp(std::malloc(size));

pair_t p(std::make_pair(2,7));
fc=boost::context::make_fcontext(sp,size,f);

int res=(int)boost::context::jump_fcontext(&fcm,fc,(intptr_t)&p);
std::cout<<p.first<<" + "<<p.second<<" == "<<res<<std::endl;

p=std::make_pair(5,6);
res=(int)boost::context::jump_fcontext(&fcm,fc,(intptr_t)&p);
std::cout<<p.first<<" + "<<p.second<<" == "<<res<<std::endl;

output:
2 + 7 == 9
5 + 6 == 11

Exceptions in context-function
If the context-function emits an exception, the behaviour is undefined.

The context-function should wrap its code in a try/catch block.

Do not jump from inside a catch block and then re-throw the exception in another execution context.

Preserving floating point registers
Preserving the floating point registers increases the cycle count for a context switch (see performance tests). The fourth argument of jump_fcontext() controls whether the fpu registers are preserved by the context jump.

The use of the fpu controlling argument of jump_fcontext() must be consistent in the application. Otherwise the behaviour is undefined.

Stack unwinding
Sometimes it is necessary to unwind the stack of an unfinished context to destroy local stack variables so they can release allocated resources (RAII pattern). The user is responsible for this task.

fcontext_t and related functions

struct stack_t
{
    void* sp;
    std::size_t size;
};

typedef <opaque pointer > fcontext_t;

intptr_t jump_fcontext(fcontext_t* ofc,fcontext_t nfc,intptr_t p,bool preserve_fpu=true);
fcontext_t make_fcontext(void* sp,std::size_t size,void(*fn)(intptr_t));

sp
Member: Pointer to the beginning of the stack (depending on the architecture the stack grows downwards or upwards).
size Member: Size of the stack in bytes. fc_stack Member: Tracks the memory for the context's stack. intptr_t jump_fcontext(fcontext_t* ofc,fcontext_t nfc,intptr_t p,bool preserve_fpu=true) Effects: Stores the current context data (stack pointer, instruction pointer, and CPU registers) to *ofc and restores the context data from nfc, which implies jumping to nfc's execution context. The intptr_t argument, p, is passed to the current context to be returned by the most recent call to jump_fcontext() in the same thread. The last argument controls if fpu registers have to be preserved. Returns: The third pointer argument passed to the most recent call to jump_fcontext(), if any. fcontext_t make_fcontext(void* sp,std::size_t size,void(*fn)(intptr_t)) Precondition: Stack sp and function pointer fn are valid (depending on the architecture sp points to the top or bottom of the stack) and size > 0. Effects: Creates an fcontext_t on top of the stack and prepares the stack to execute the context-function fn. Returns: Returns a fcontext_t which is placed on the stack.
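The control flow of the f1()/f2() example above can be approximated without Boost.Context. The following is only a sketch under stated assumptions: it uses the POSIX ucontext_t functions (getcontext(), makecontext(), swapcontext()) as a stand-in for fcontext_t, it assumes a POSIX system such as Linux with glibc, and the names sketch::run_ping_pong and sketch::trace are hypothetical. Unlike jump_fcontext(), swapcontext() does not carry an intptr_t between contexts, and leaving f1() returns to main through uc_link instead of an explicit jump.

```cpp
#include <ucontext.h>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Saved states of main, f1 and f2 (stand-ins for fcm, fc1 and fc2).
ucontext_t ctx_main, ctx_f1, ctx_f2;
// Records the order in which the contexts run.
std::vector<std::string> trace;

void f1() {
    trace.push_back("f1: entered");
    swapcontext(&ctx_f1, &ctx_f2);   // suspend f1, start f2
    trace.push_back("f1: resumed");
    // Returning from f1 continues at uc_link, i.e. in ctx_main.
}

void f2() {
    trace.push_back("f2: entered");
    swapcontext(&ctx_f2, &ctx_f1);   // suspend f2, resume f1
    trace.push_back("f2: never reached");
}

std::vector<std::string> run_ping_pong() {
    // Each context needs its own stack; 64 kB is an arbitrary choice.
    static char stack1[64 * 1024];
    static char stack2[64 * 1024];
    trace.clear();

    getcontext(&ctx_f1);
    ctx_f1.uc_stack.ss_sp = stack1;
    ctx_f1.uc_stack.ss_size = sizeof stack1;
    ctx_f1.uc_link = &ctx_main;
    makecontext(&ctx_f1, f1, 0);

    getcontext(&ctx_f2);
    ctx_f2.uc_stack.ss_sp = stack2;
    ctx_f2.uc_stack.ss_size = sizeof stack2;
    ctx_f2.uc_link = &ctx_main;
    makecontext(&ctx_f2, f2, 0);

    trace.push_back("main: switching to f1");
    swapcontext(&ctx_main, &ctx_f1); // save main, start f1
    trace.push_back("main: back");
    return trace;
}

} // namespace sketch
```

Calling sketch::run_ping_pong() yields the trace main, f1, f2, f1, main, which mirrors the order of the output shown for the fcontext_t example.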
<link linkend="context.econtext">Class execution_context</link>
execution_context requires C++14.

Class execution_context encapsulates fcontext_t and related functions ( jump_fcontext() and make_fcontext()) as well as stack management. execution_context permits access to the current, active context via execution_context::current().

/*
 * grammar:
 *   P ---> E '\0'
 *   E ---> T {('+'|'-') T}
 *   T ---> S {('*'|'/') S}
 *   S ---> digit | '(' E ')'
 */
class Parser{
    // implementation omitted; see examples directory
};

int main() {
    std::istringstream is("1+1");
    char c;
    bool done=false;
    std::exception_ptr except;

    // create handle to main execution context
    auto main_ctx( boost::context::execution_context::current() );

    // execute parser in new execution context
    boost::context::execution_context parser_ctx(
        boost::context::fixedsize_stack(),
        [&main_ctx,&is,&c,&done,&except](){
            // create parser with callback function
            Parser p( is,
                      [&main_ctx,&c](char ch){
                          c=ch;
                          // resume main execution context
                          main_ctx();
                      });
            try {
                // start recursive parsing
                p.run();
            } catch ( ... ) {
                // store other exceptions in exception-pointer
                except = std::current_exception();
            }
            // set termination flag
            done=true;
            // resume main execution context
            main_ctx();
        });

    // user-code pulls parsed data from parser
    // invert control flow
    parser_ctx();
    if ( except) {
        std::rethrow_exception( except);
    }
    while( ! done) {
        printf("Parsed: %c\n",c);
        parser_ctx();
        if ( except) {
            std::rethrow_exception( except);
        }
    }

    std::cout << "main: done" << std::endl;
}

output:
Parsed: 1
Parsed: +
Parsed: 1

In this example a recursive descent parser uses a callback to emit a newly parsed symbol. Using execution_context the control flow can be inverted, i.e. the user code pulls parsed symbols from the parser instead of having them pushed by the parser (via callback).

The interface of execution_context does not transfer data. This is not required because sharing the address of the data (pointers/references to lvalues) is usually sufficient.
If the code executed by execution_context emits an exception, the application is terminated. std::exception_ptr can be used to transfer exceptions between different execution contexts.

Sometimes it is necessary to unwind the stack of an unfinished context to destroy local stack variables so they can release allocated resources (RAII pattern). The user is responsible for this task.

allocating control structures on top of stack
Allocating control structures on top of the stack requires allocating the stack_context and creating the control structure with placement new before execution_context is created.

The user is responsible for destructing the control structure at the top of the stack.

// stack-allocator used for (de-)allocating stack
fixedsize_stack salloc( 4048);
// allocate stack space
stack_context sctx( salloc.allocate() );
// reserve space for control structure on top of the stack
void * sp = static_cast< char * >( sctx.sp) - sizeof( my_control_structure);
std::size_t size = sctx.size - sizeof( my_control_structure);
// placement new creates control structure on reserved space
my_control_structure * cs = new ( sp) my_control_structure( sp, size, sctx, salloc);
...
// destructing the control structure
cs->~my_control_structure();
...
struct my_control_structure {
    // execution context
    execution_context ectx;

    template< typename StackAllocator >
    my_control_structure( void * sp, std::size_t size, stack_context sctx, StackAllocator salloc) :
        // create execution context
        ectx( preallocated( sp, size, sctx), salloc, entry_func) {
    }
    ...
};

exception handling
If the function executed inside an execution_context emits an exception, the application is terminated by calling std::terminate(). std::exception_ptr can be used to transfer exceptions between different execution contexts.

parameter passing
Input and output parameters are transferred via a lambda capture list and references/pointers.
class X {
private:
    int * inp_;
    std::string outp_;
    std::exception_ptr excptr_;
    boost::context::execution_context caller_;
    boost::context::execution_context callee_;

public:
    X() :
        inp_( nullptr),
        outp_(),
        excptr_(),
        caller_( boost::context::execution_context::current() ),
        callee_( boost::context::fixedsize_stack(),
                 [=] () {
                    try {
                        int i = * inp_;
                        outp_ = boost::lexical_cast< std::string >( i);
                        caller_();
                    } catch (...) {
                        excptr_=std::current_exception();
                    }
                 })
    {}

    std::string operator()( int i) {
        inp_ = & i;
        callee_();
        if(excptr_){
            std::rethrow_exception(excptr_);
        }
        return outp_;
    }
};

int main() {
    X x;
    std::cout << x( 7) << std::endl;
    std::cout << "done" << std::endl;
}

Class execution_context

class execution_context {
public:
    static execution_context current() noexcept;

    template< typename StackAlloc, typename Fn >
    execution_context( StackAlloc salloc, Fn && fn);

    template< typename StackAlloc, typename Fn, typename ... Args >
    execution_context( StackAlloc salloc, Fn && fn, Args && ... args);

    template< typename StackAlloc, typename Fn >
    execution_context( preallocated palloc, StackAlloc salloc, Fn && fn);

    template< typename StackAlloc, typename Fn, typename ... Args >
    execution_context( preallocated palloc, StackAlloc salloc, Fn && fn, Args && ... args);

    void operator()() noexcept;
};

static execution_context current()
Returns: Returns an instance of execution_context pointing to the active execution context.
Throws: Nothing.

template< typename StackAlloc, typename Fn > execution_context( StackAlloc salloc, Fn && fn)
Effects: Creates a new execution context and prepares the context to execute fn.

template< typename StackAlloc, typename Fn, typename ... Args > execution_context( StackAlloc salloc, Fn && fn, Args && ... args)
Effects: Creates a new execution context and prepares the context to execute fn.
template< typename StackAlloc, typename Fn > execution_context( preallocated palloc, StackAlloc salloc, Fn && fn)
Effects: Creates a new execution context and prepares the context to execute fn. Used to store control structures on top of the stack.

template< typename StackAlloc, typename Fn, typename ... Args > execution_context( preallocated palloc, StackAlloc salloc, Fn && fn, Args && ... args)
Effects: Creates a new execution context and prepares the context to execute fn. Used to store control structures on top of the stack.

void operator()() noexcept
Effects: Stores internally the current context data (stack pointer, instruction pointer, and CPU registers) to the current active context and restores the context data from *this, which implies jumping to *this's execution context.
Note: The behaviour is undefined if operator()() is called while execution_context::current() returns *this (e.g. resuming an already running context). If the top-level context function returns, std::exit() is called.
Returns: Reference to *this.
Throws: Nothing.

Struct preallocated

struct preallocated {
    void * sp;
    std::size_t size;
    stack_context sctx;

    preallocated( void * sp, std::size_t size, stack_context sctx) noexcept;
};

preallocated( void * sp, std::size_t size, stack_context sctx)
Effects: Creates an object of preallocated.
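The step of reserving space for a control structure on top of the stack and constructing it with placement new can be illustrated in plain C++, independent of execution_context. This is a minimal sketch under assumptions: control_block and carve_and_destroy are hypothetical names, the stack is assumed to grow downwards (so the top is the highest address), and the rounding down to alignof(control_block) is an addition that the snippet in this section does not perform.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical control structure living at the top of the stack memory.
struct control_block {
    void* stack_base;        // lowest address of the allocated memory
    std::size_t usable_size; // bytes left for the actual stack
    bool* destroyed;         // set by the destructor, for demonstration only

    control_block(void* base, std::size_t usable, bool* flag)
        : stack_base(base), usable_size(usable), destroyed(flag) {}
    ~control_block() { *destroyed = true; }
};

// Allocates 'total' bytes, places a control_block at the top, then
// destroys it explicitly and releases the memory. Returns true on success.
bool carve_and_destroy(std::size_t total) {
    if (total <= sizeof(control_block) + alignof(control_block)) return false;
    void* mem = std::malloc(total);
    if (!mem) return false;

    // Assumption: the stack grows downwards, so its top is mem + total.
    char* top = static_cast<char*>(mem) + total;

    // Reserve room below the top and round down to a suitable alignment.
    std::uintptr_t addr =
        reinterpret_cast<std::uintptr_t>(top) - sizeof(control_block);
    addr &= ~static_cast<std::uintptr_t>(alignof(control_block) - 1);
    void* sp = reinterpret_cast<void*>(addr);
    std::size_t usable = static_cast<std::size_t>(
        static_cast<char*>(sp) - static_cast<char*>(mem));

    // Placement new constructs the control structure in the reserved space.
    bool destroyed = false;
    control_block* cb = new (sp) control_block(mem, usable, &destroyed);
    bool ok = cb->stack_base == mem && cb->usable_size < total;

    // The user must run the destructor explicitly before freeing the memory.
    cb->~control_block();
    std::free(mem);
    return ok && destroyed;
}
```

The explicit destructor call is the part that is easy to forget: freeing the stack memory alone never runs the destructor of an object created with placement new.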
<link linkend="context.econtext.winfibers">Using WinFiber-API</link> Because the TIB (thread information block) is not fully described in the MSDN, it is possible that not all required parts of the TIB are swapped. With the compiler flag BOOST_USE_WINFIBERS, execution_context internally uses the Windows Fiber API.
<link linkend="context.stack">Stack allocation</link>
The memory used by the stack is allocated/deallocated via a StackAllocator which is required to model a stack-allocator concept.

stack-allocator concept
A StackAllocator must satisfy the stack-allocator concept requirements shown in the following table, in which a is an object of a StackAllocator type, sctx is a stack_context, and size is a std::size_t:

expression             return type      notes
a(size)                                 creates a stack allocator
a.allocate()           stack_context    creates a stack
a.deallocate( sctx)    void             deallocates the stack created by a.allocate()

The implementation of allocate() might include logic to protect against exceeding the context's available stack size rather than leaving it as undefined behaviour.

Calling deallocate() with a stack_context not set by allocate() results in undefined behaviour.

The stack is not required to be aligned; alignment takes place inside execution_context.

Depending on the architecture allocate() stores an address from the top of the stack (growing downwards) or the bottom of the stack (growing upwards).
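The three expressions of the stack-allocator concept can be modelled by a small class. The sketch below is an illustration, not one of the allocators shipped with the library: demo_stack_context is a hypothetical stand-in for stack_context, malloc_stack is a hypothetical name, and a downward-growing stack is assumed, so allocate() stores the highest address.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for stack_context: stack pointer plus size.
struct demo_stack_context {
    void* sp = nullptr;
    std::size_t size = 0;
};

// Models the concept: a(size), a.allocate(), a.deallocate(sctx).
class malloc_stack {
public:
    explicit malloc_stack(std::size_t size) : size_(size) {}

    // Creates a stack and describes it in a demo_stack_context.
    demo_stack_context allocate() {
        void* base = std::malloc(size_);
        if (!base) throw std::bad_alloc();
        demo_stack_context sctx;
        sctx.size = size_;
        // Assumption: the stack grows downwards, so store the top address.
        sctx.sp = static_cast<char*>(base) + size_;
        return sctx;
    }

    // Releases a stack previously returned by allocate().
    void deallocate(demo_stack_context& sctx) {
        void* base = static_cast<char*>(sctx.sp) - sctx.size;
        std::free(base);
        sctx.sp = nullptr;
        sctx.size = 0;
    }

private:
    std::size_t size_;
};
```

Passing a demo_stack_context that did not come from allocate() would compute a bogus base address, which matches the undefined behaviour described above for deallocate().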
<link linkend="context.stack.protected_fixedsize">Class <emphasis>protected_fixedsize</emphasis></link>
Boost.Context provides the class protected_fixedsize_stack which models the stack-allocator concept. It appends a guard page at the end of each stack to protect against exceeding the stack. If the guard page is accessed (read or write operation) a segmentation fault/access violation is generated by the operating system.

Using protected_fixedsize_stack is expensive. That is, launching a new coroutine with a new stack is expensive; the allocated stack is just as efficient to use as any other stack.

The appended guard page is not mapped to physical memory, only virtual addresses are used.

#include <boost/context/protected_fixedsize.hpp>

template< typename traitsT >
struct basic_protected_fixedsize {
    typedef traitsT traits_type;

    basic_protected_fixedsize(std::size_t size = traits_type::default_size());

    stack_context allocate();

    void deallocate( stack_context &);
};

typedef basic_protected_fixedsize< stack_traits > protected_fixedsize;

stack_context allocate()
Preconditions: traits_type::minimum_size() <= size and ! traits_type::is_unbounded() && ( traits_type::maximum_size() >= size).
Effects: Allocates memory of at least size bytes and stores a pointer to the stack and its actual size in sctx. Depending on the architecture (the stack grows downwards/upwards) the stored address is the highest/lowest address of the stack.

void deallocate( stack_context & sctx)
Preconditions: sctx.sp is valid, traits_type::minimum_size() <= sctx.size and ! traits_type::is_unbounded() && ( traits_type::maximum_size() >= sctx.size).
Effects: Deallocates the stack space.
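The guard-page idea can be sketched with POSIX calls. This is an assumption-laden illustration and not necessarily how the library implements it: guarded_stack, allocate_guarded and deallocate_guarded are hypothetical names, the stack is assumed to grow downwards (so the guard page sits at the lowest address), and mmap()/mprotect() are assumed to be available.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <new>

// Hypothetical description of a stack with a guard page below it.
struct guarded_stack {
    void* sp;         // highest address (stack assumed to grow downwards)
    std::size_t size; // total mapped bytes, including the guard page
};

inline guarded_stack allocate_guarded(std::size_t min_size) {
    const std::size_t page = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
    // Round the request up to whole pages and add one page for the guard.
    const std::size_t pages = (min_size + page - 1) / page;
    const std::size_t total = (pages + 1) * page;

    void* base = ::mmap(nullptr, total, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) throw std::bad_alloc();

    // Make the lowest page inaccessible: running off the end of the
    // stack then faults instead of silently corrupting other memory.
    if (::mprotect(base, page, PROT_NONE) != 0) {
        ::munmap(base, total);
        throw std::bad_alloc();
    }
    return guarded_stack{ static_cast<char*>(base) + total, total };
}

inline void deallocate_guarded(guarded_stack& s) {
    void* base = static_cast<char*>(s.sp) - s.size;
    ::munmap(base, s.size);
    s.sp = nullptr;
    s.size = 0;
}
```

The extra cost relative to a malloc()-based allocator comes from the system calls made for every stack; once mapped, the stack memory is used like any other.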
<link linkend="context.stack.fixedsize">Class <emphasis>fixedsize_stack</emphasis></link>
Boost.Context provides the class fixedsize_stack which models the stack-allocator concept. In contrast to protected_fixedsize_stack it does not append a guard page at the end of each stack. The memory is simply managed by std::malloc() and std::free().

#include <boost/context/fixedsize_stack.hpp>

template< typename traitsT >
struct basic_fixedsize_stack {
    typedef traitsT traits_type;

    basic_fixedsize_stack(std::size_t size = traits_type::default_size());

    stack_context allocate();

    void deallocate( stack_context &);
};

typedef basic_fixedsize_stack< stack_traits > fixedsize_stack;

stack_context allocate()
Preconditions: traits_type::minimum_size() <= size and ! traits_type::is_unbounded() && ( traits_type::maximum_size() >= size).
Effects: Allocates memory of at least size bytes and stores a pointer to the stack and its actual size in sctx. Depending on the architecture (the stack grows downwards/upwards) the stored address is the highest/lowest address of the stack.

void deallocate( stack_context & sctx)
Preconditions: sctx.sp is valid, traits_type::minimum_size() <= sctx.size and ! traits_type::is_unbounded() && ( traits_type::maximum_size() >= sctx.size).
Effects: Deallocates the stack space.
<link linkend="context.stack.segmented">Class <emphasis>segmented_stack</emphasis></link>
Boost.Context supports usage of a segmented_stack, i.e. the size of the stack grows on demand. The coroutine is created with a minimal stack size, which is increased as required. Class segmented_stack models the stack-allocator concept. In contrast to protected_fixedsize_stack and fixedsize_stack it creates a stack which grows on demand.

Segmented stacks are currently only supported by gcc from version 4.7 and clang from version 3.4 onwards. In order to use a segmented_stack, Boost.Context must be built with toolset=gcc segmented-stacks=on on the b2/bjam command line. Applications must be compiled with the compiler flags -fsplit-stack -DBOOST_USE_SEGMENTED_STACKS.

#include <boost/context/segmented_stack.hpp>

template< typename traitsT >
struct basic_segmented_stack {
    typedef traitsT traits_type;

    basic_segmented_stack(std::size_t size = traits_type::default_size());

    stack_context allocate();

    void deallocate( stack_context &);
};

typedef basic_segmented_stack< stack_traits > segmented_stack;

stack_context allocate()
Preconditions: traits_type::minimum_size() <= size and ! traits_type::is_unbounded() && ( traits_type::maximum_size() >= size).
Effects: Allocates memory of at least size bytes and stores a pointer to the stack and its actual size in sctx. Depending on the architecture (the stack grows downwards/upwards) the stored address is the highest/lowest address of the stack.

void deallocate( stack_context & sctx)
Preconditions: sctx.sp is valid, traits_type::minimum_size() <= sctx.size and ! traits_type::is_unbounded() && ( traits_type::maximum_size() >= sctx.size).
Effects: Deallocates the stack space.
<link linkend="context.stack.stack_traits">Class <emphasis>stack_traits</emphasis></link>
stack_traits models a stack-traits providing a way to access certain properties defined by the environment. Stack allocators use stack-traits to allocate stacks.

#include <boost/context/stack_traits.hpp>

struct stack_traits {
    static bool is_unbounded() noexcept;

    static std::size_t page_size() noexcept;

    static std::size_t default_size() noexcept;

    static std::size_t minimum_size() noexcept;

    static std::size_t maximum_size() noexcept;
};

static bool is_unbounded()
Returns: Returns true if the environment defines no limit for the size of a stack.
Throws: Nothing.

static std::size_t page_size()
Returns: Returns the page size in bytes.
Throws: Nothing.

static std::size_t default_size()
Returns: Returns a default stack size, which may be platform specific. If the stack is unbounded then the present implementation returns the maximum of 64 kB and minimum_size().
Throws: Nothing.

static std::size_t minimum_size()
Returns: Returns the minimum size in bytes of stack defined by the environment (Win32 4kB/Win64 8kB, defined by rlimit on POSIX).
Throws: Nothing.

static std::size_t maximum_size()
Preconditions: is_unbounded() returns false.
Returns: Returns the maximum size in bytes of stack defined by the environment.
Throws: Nothing.
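To make the stack-traits interface concrete, here is a rough POSIX-only sketch of a type with the same five functions. It is not the library's implementation. posix_stack_traits_sketch is a hypothetical name; the 8 kB value in minimum_size() is a placeholder because this section does not spell out how the POSIX minimum is derived; and clamping default_size() to maximum_size() in the bounded case is likewise an assumption.

```cpp
#include <sys/resource.h>
#include <unistd.h>
#include <algorithm>
#include <cstddef>

// Hypothetical model of the stack-traits interface on POSIX.
struct posix_stack_traits_sketch {
    // Queries the stack resource limit of the process.
    static struct rlimit stack_limit() noexcept {
        struct rlimit lim;
        lim.rlim_cur = 0;
        lim.rlim_max = 0;
        ::getrlimit(RLIMIT_STACK, &lim);
        return lim;
    }

    static bool is_unbounded() noexcept {
        return stack_limit().rlim_max == RLIM_INFINITY;
    }

    static std::size_t page_size() noexcept {
        return static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
    }

    // Placeholder value; an assumption made for this sketch only.
    static std::size_t minimum_size() noexcept {
        return 8 * 1024;
    }

    // Precondition: is_unbounded() returns false.
    static std::size_t maximum_size() noexcept {
        return static_cast<std::size_t>(stack_limit().rlim_max);
    }

    // Maximum of 64 kB and minimum_size(); clamped when a limit exists.
    static std::size_t default_size() noexcept {
        std::size_t size =
            std::max<std::size_t>(64 * 1024, minimum_size());
        if (!is_unbounded()) {
            size = std::min(size, maximum_size());
        }
        return size;
    }
};
```

A stack allocator parameterised on such a traits type can validate a requested size against minimum_size() and, when the environment is bounded, maximum_size() before allocating.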
<link linkend="context.stack.stack_context">Class <emphasis>stack_context</emphasis></link>
Boost.Context provides the class stack_context which will contain the stack pointer and the size of the stack. In case of a segmented_stack, stack_context contains some extra control structures.

struct stack_context {
    void * sp;
    std::size_t size;

    // might contain additional control structures
    // for segmented stacks
};

void * sp
Value: Pointer to the beginning of the stack.

std::size_t size
Value: Actual size of the stack.
<link linkend="context.stack.valgrind">Support for valgrind</link> Running programs that switch stacks under valgrind causes problems. The property valgrind=on (b2 command line) lets valgrind treat the memory regions as stack space, which suppresses the errors.
<link linkend="context.performance">Performance</link>
Performance of Boost.Context was measured on the platforms shown in the following table. Performance measurements were taken using rdtsc and boost::chrono::high_resolution_clock, with overhead corrections, on x86 platforms. In each case, cache warm-up was accounted for, and the one running thread was pinned to a single CPU. The code was compiled using the build options, 'variant = release cxxflags = -DBOOST_DISABLE_ASSERTS'.

Performance of context switch

Platform                             ucontext_t              fcontext_t           execution_context    windows fibers
i386 AMD Athlon 64 DualCore 4400+    708 ns / 754 cycles     37 ns / 37 cycles    ns / cycles          ns / cycles
x86_64 Intel Core2 Q6700             547 ns / 1433 cycles    8 ns / 23 cycles     16 ns / 46 cycles    ns / cycles
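The idea of a measurement "with overhead corrections" can be sketched as follows. This is not the benchmark code behind the table: it uses std::chrono::steady_clock instead of rdtsc or boost::chrono, average_ns is a hypothetical helper, and the correction simply subtracts the time of an otherwise identical loop that does not call the operation.

```cpp
#include <chrono>
#include <cstddef>

// Returns the average time per call of op() in nanoseconds, after
// subtracting the cost of the bare measurement loop.
template <typename Op>
double average_ns(Op op, std::size_t iterations) {
    if (iterations == 0) return 0.0;
    using clock = std::chrono::steady_clock;
    using ns = std::chrono::duration<double, std::nano>;

    // The volatile counter keeps the compiler from removing the loops.
    volatile std::size_t sink = 0;

    // Pass 1: loop overhead only.
    const clock::time_point t0 = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        sink = sink + 1;
    }
    const clock::time_point t1 = clock::now();

    // Pass 2: same loop, plus the operation under test.
    for (std::size_t i = 0; i < iterations; ++i) {
        sink = sink + 1;
        op();
    }
    const clock::time_point t2 = clock::now();

    const double overhead = ns(t1 - t0).count();
    const double total = ns(t2 - t1).count();
    double corrected = total - overhead;
    if (corrected < 0.0) corrected = 0.0; // timer noise can exceed the signal
    return corrected / static_cast<double>(iterations);
}
```

For operations that take only a few nanoseconds, such as a context switch, a large iteration count and a pinned thread (as described above) matter more than the choice of clock.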
<link linkend="context.architectures">Architectures</link>
Boost.Context supports the following architectures:

Supported architectures (<ABI|binary format>)

Architecture    LINUX (UNIX)      Windows     MacOS X        iOS
arm             AAPCS|ELF         AAPCS|PE    -              AAPCS|MACH-O
i386            SYSV|ELF          MS|PE       SYSV|MACH-O    -
mips1           O32|ELF           -           -              -
ppc32           SYSV|ELF,XCOFF    -           SYSV|MACH-O    -
ppc64           SYSV|ELF,XCOFF    -           SYSV|MACH-O    -
sparc           -                 -           -              -
x86_64          SYSV,X32|ELF      MS|PE       SYSV|MACH-O    -
<link linkend="context.rationale">Rationale</link>
No inline-assembler
Some newer compilers (for instance MSVC 10 for x86_64 and itanium) do not support inline assembler (MSDN article 'Inline Assembler'). Inline assembler also causes code bloat, which is not welcome on embedded systems.

fcontext_t
Boost.Context provides the low level API fcontext_t which is implemented in assembler to provide context swapping operations. fcontext_t is the part to port to new platforms.

Context switches do not preserve the signal mask on UNIX systems.

fcontext_t is an opaque pointer.
<link linkend="context.rationale.other_apis_">Other APIs </link>
setjmp()/longjmp()
C99 defines setjmp()/longjmp() to provide non-local jumps but it does not require that longjmp() preserves the current stack frame. Therefore, jumping into a function which was exited via a call to longjmp() is undefined (ISO/IEC 9899:1999, 2005, 7.13.2.1:2).

ucontext_t
Since POSIX.1-2003 ucontext_t is deprecated and was removed in POSIX.1-2008! The function signature of makecontext() is:

void makecontext(ucontext_t *ucp, void (*func)(), int argc, ...);

The third argument of makecontext() specifies the number of integer arguments that follow. If func accepts those arguments, a function pointer cast is required, which is undefined in C99 (ISO/IEC 9899:1999, 2005, J.2).

The arguments in the var-arg list are required to be integers; passing pointers in the var-arg list is not guaranteed to work, and in particular it will fail on architectures where pointers are larger than integers.

ucontext_t preserves the signal mask between context switches, which involves system calls consuming a lot of CPU cycles (ucontext_t is slower by a factor of 13x relative to fcontext_t).

Windows fibers
A drawback of the Windows Fiber API is that CreateFiber() does not accept a pointer to user allocated stack space, preventing the reuse of stacks for other context instances. In addition, the Windows Fiber API requires calling ConvertThreadToFiber() before SwitchToFiber() is called on a thread which has not been converted to a fiber. For the same reason ConvertFiberToThread() must be called after returning from SwitchToFiber() if the thread had to be converted to a fiber beforehand (which is inefficient).

if ( ! is_a_fiber() ) {
    ConvertThreadToFiber( 0);
    SwitchToFiber( ctx);
    ConvertFiberToThread();
}

If the condition _WIN32_WINNT >= _WIN32_WINNT_VISTA is met, the function IsThreadAFiber() is provided in order to detect if the current thread was already converted.
Unfortunately Windows XP + SP 2/3 defines _WIN32_WINNT >= _WIN32_WINNT_VISTA without providing IsThreadAFiber().
<link linkend="context.rationale.x86_and_floating_point_env">x86 and floating-point env</link> i386 "The FpCsr and the MxCsr register must be saved and restored before any call or return by any procedure that needs to modify them ..." 'Calling Conventions', Agner Fog . x86_64 Windows MxCsr - "A callee that modifies any of the non-volatile fields within MxCsr must restore them before returning to its caller. Furthermore, a caller that has modified any of these fields must restore them to their standard values before invoking a callee ..." MSDN article 'MxCsr' . FpCsr - "A callee that modifies any of the fields within FpCsr must restore them before returning to its caller. Furthermore, a caller that has modified any of these fields must restore them to their standard values before invoking a callee ..." MSDN article 'FpCsr' . "The MMX and floating-point stack registers (MM0-MM7/ST0-ST7) are preserved across context switches. There is no explicit calling convention for these registers." MSDN article 'Legacy Floating-Point Support' . "The 64-bit Microsoft compiler does not use ST(0)-ST(7)/MM0-MM7". 'Calling Conventions', Agner Fog . "XMM6-XMM15 must be preserved" MSDN article 'Register Usage' SysV "The control bits of the MxCsr register are callee-saved (preserved across calls), while the status bits are caller-saved (not preserved). The x87 status word register is caller-saved, whereas the x87 control word (FpCsr) is callee-saved." SysV ABI AMD64 Architecture Processor Supplement Draft Version 0.99.4, 3.2.1 .
<link linkend="context.reference">Reference</link>
ARM
AAPCS ABI: Procedure Call Standard for the ARM Architecture
AAPCS/LINUX: ARM GNU/Linux Application Binary Interface Supplement

MIPS
O32 ABI: SYSTEM V APPLICATION BINARY INTERFACE, MIPS RISC Processor Supplement

PowerPC32
SYSV ABI: SYSTEM V APPLICATION BINARY INTERFACE PowerPC Processor Supplement

PowerPC64
SYSV ABI: PowerPC User Instruction Set Architecture, Book I

X86-32
SYSV ABI: SYSTEM V APPLICATION BINARY INTERFACE, Intel386TM Architecture Processor Supplement
MS PE: Calling Conventions

X86-64
SYSV ABI: System V Application Binary Interface, AMD64 Architecture Processor Supplement
MS PE: x64 Software Conventions
<link linkend="context.acknowledgements">Acknowledgments</link> I'd like to thank Adreas Fett, Artyom Beilis, Daniel Larimer, David Deakins, Evgeny Shapovalov, Fernando Pelliccioni, Giovanni Piero Deretta, Gordon Woodhull, Helge Bahmann, Holger Grund, Jeffrey Lee Hellrung (Jr.), Keith Jeffery, Martin Husemann, Phil Endecott, Robert Stewart, Sergey Cheban, Steven Watanabe, Vicente J. Botet Escriba, Wayne Piekarski.