diff --git a/boost_1_63_0/libs/fiber/doc/html/fiber/performance.html b/boost_1_63_0/libs/fiber/doc/html/fiber/performance.html new file mode 100644 index 0000000..9ef5e63 --- /dev/null +++ b/boost_1_63_0/libs/fiber/doc/html/fiber/performance.html @@ -0,0 +1,262 @@ + +
+ +![]() |
+Home | +Libraries | +People | +FAQ | +More | +
+ Performance measurements were taken using std::chrono::high_resolution_clock,
+ with overhead corrections. The code was compiled with gcc-6.2.0, using build
+ options: variant = release, optimization = speed. Tests were executed on Intel
+ Core i7-4770S 3.10GHz, 4 cores with 8 hyperthreads (4C/8T), running Linux (4.7.0/x86_64).
+
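As an illustration of what "with overhead corrections" can mean, the following is a minimal sketch, not the code used for the published numbers. It estimates the cost of reading the clock and subtracts that estimate from each measurement. The helper names `clock_overhead` and `measure` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

using bench_clock = std::chrono::high_resolution_clock;
using nanos = std::chrono::nanoseconds;

// Estimate the cost of one pair of clock reads by averaging many
// back-to-back now() calls (hypothetical helper, not from the benchmark suite).
inline nanos clock_overhead(std::size_t samples = 1000) {
    assert(samples > 0);
    nanos total{0};
    for (std::size_t i = 0; i < samples; ++i) {
        auto start = bench_clock::now();
        auto stop = bench_clock::now();
        total += std::chrono::duration_cast<nanos>(stop - start);
    }
    return total / static_cast<nanos::rep>(samples);
}

// Time one call of fn and subtract the estimated clock overhead,
// clamping at zero so a noisy estimate never yields a negative duration.
template <typename Fn>
nanos measure(Fn&& fn, nanos overhead) {
    auto start = bench_clock::now();
    fn();
    auto stop = bench_clock::now();
    nanos raw = std::chrono::duration_cast<nanos>(stop - start);
    return raw > overhead ? raw - overhead : nanos{0};
}
```

A caller would compute `clock_overhead()` once and pass it to every `measure` call, then divide the corrected total by the number of ToEs to obtain a per-ToE time.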
+ Measurements headed 1C/1T were run in a single-threaded process. +
++ The microbenchmark skynet + from Alexander Temerev was ported and used for performance measurements. At + the root the test spawns 10 threads-of-execution (ToE), e.g. actor/goroutine/fiber + etc. Each spawned ToE spawns 10 additional ToEs, and so on, until 100000 ToEs are + created. The ToEs return their ordinal numbers (0 ... 99999), which are summed + on the previous level and sent back upstream, until reaching the root. The + test was run 10-20 times, producing a range of values for each measurement. +
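The shape of the skynet computation can be sketched sequentially. This is an illustrative sketch only: the function name and parameters are assumptions, and in the benchmark itself each recursive call would run in its own ToE and hand its partial sum upstream through a channel or future.

```cpp
#include <cassert>
#include <cstdint>

// Sequential sketch of the skynet recursion: each node covers `size` ordinals
// starting at `first` and splits that range into `fanout` children until a
// single ordinal remains. `size` is assumed to be a power of `fanout`.
inline std::uint64_t skynet(std::uint64_t first, std::uint64_t size,
                            std::uint64_t fanout) {
    assert(fanout > 1 && size > 0);
    if (size == 1) return first;  // leaf: return its own ordinal number
    const std::uint64_t child_size = size / fanout;
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < fanout; ++i) {
        // Each child handles a contiguous slice of the parent's range.
        sum += skynet(first + i * child_size, child_size, fanout);
    }
    return sum;  // partial sum sent back to the previous level
}
```

Calling `skynet(0, 100000, 10)` sums the ordinals 0 ... 99999, so the root receives 4999950000.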
Table 1.1. time per actor/erlang process/goroutine (other languages) (average over 100,000)

| Haskell, stack-1.0.4 | Erlang, erts-7.0 | Go, go1.6.1 (GOMAXPROCS == default) | Go, go1.6.1 (GOMAXPROCS == 8) |
|---|---|---|---|
| 0.32 µs | 0.64 µs - 1.21 µs | 1.52 µs - 1.64 µs | 0.70 µs - 0.98 µs |
+ std::thread cannot be tested at this time (C++14)
+ because its API does not allow setting the thread stack size (default on Linux
+ 2MB, on Windows 1MB), so an out-of-memory error is likely. With pthreads the stack
+ size is set to 8kB.
+
Table 1.2. time per thread (average over *10,000* - unable to spawn 100,000 threads)
| pthread |  |
|---|---|
| 14.4 µs - 20.8 µs | 18.8 µs - 28.1 µs |
+ The test utilizes 4 cores with simultaneous multithreading enabled (8 logical
+ CPUs). The fiber stacks are allocated by fixedsize_stack.
+
+ As the benchmark shows, the memory allocation algorithm has a significant effect on + performance in a multithreaded environment. The tests use glibc’s memory allocation + algorithm (based on ptmalloc2) as well as Google’s TCMalloc + (via linkflags="-ltcmalloc").[9] +
+
+ The shared_work scheduling algorithm uses one global queue,
+ containing fibers ready to run, shared between all threads. The work is distributed
+ equally over all threads. In the work_stealing scheduling
+ algorithm, each thread has its own local queue. Fibers that are ready to run
+ are pushed to and popped from the local queue. If the queue runs out of ready
+ fibers, fibers are stolen from the local queues of other participating threads.
+
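The work-stealing idea described above can be modelled with a toy data structure. This is a sketch under stated assumptions, not the library's implementation: the class `stealing_queues` is hypothetical, an `int` stands in for a fiber handle, and the choice that the owner pops the newest entry while a thief takes the oldest is a common convention assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Toy model of per-thread ready queues; `int` stands in for a fiber handle.
class stealing_queues {
public:
    explicit stealing_queues(std::size_t workers) : queues_(workers) {
        assert(workers > 0);
    }

    // A fiber that becomes ready is pushed to its worker's local queue.
    void push(std::size_t worker, int fiber) {
        assert(worker < queues_.size());
        std::lock_guard<std::mutex> lock(queues_[worker].mtx);
        queues_[worker].ready.push_back(fiber);
    }

    // Pop from the local queue first; if it is empty, try to steal from
    // the other workers' queues. Returns false if no fiber is ready anywhere.
    bool next(std::size_t worker, int& out) {
        assert(worker < queues_.size());
        if (take(worker, false, out)) return true;
        for (std::size_t i = 1; i < queues_.size(); ++i) {
            const std::size_t victim = (worker + i) % queues_.size();
            if (take(victim, true, out)) return true;
        }
        return false;
    }

private:
    struct slot {
        std::mutex mtx;
        std::deque<int> ready;
    };

    // Owner takes the newest entry (back); a thief takes the oldest (front).
    bool take(std::size_t idx, bool steal, int& out) {
        std::lock_guard<std::mutex> lock(queues_[idx].mtx);
        std::deque<int>& q = queues_[idx].ready;
        if (q.empty()) return false;
        if (steal) { out = q.front(); q.pop_front(); }
        else       { out = q.back();  q.pop_back();  }
        return true;
    }

    std::vector<slot> queues_;
};
```

A shared-work scheduler would correspond to the degenerate case of a single queue that every worker pushes to and pops from.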
Table 1.3. time per fiber (average over 100,000)
| fiber (4C/8T, work stealing, tcmalloc) | fiber (4C/8T, work stealing) | fiber (4C/8T, work sharing, tcmalloc) | fiber (4C/8T, work sharing) | fiber (1C/1T, round robin, tcmalloc) | fiber (1C/1T, round robin) |
|---|---|---|---|---|---|
| 0.13 µs - 0.26 µs | 0.35 µs - 0.66 µs | 0.62 µs - 0.80 µs | 0.90 µs - 1.11 µs | 0.90 µs - 1.03 µs | 0.91 µs - 1.28 µs |
[9] + Tais B. Ferreira, Rivalino Matias, Autran Macedo, Lucio B. Araujo “An + Experimental Study on Memory Allocators in Multicore and Multithreaded Applications”, + PDCAT ’11 Proceedings of the 2011 12th International Conference on Parallel + and Distributed Computing, Applications and Technologies, pages 92-98 +
+ Internally, a fiber uses an execution_context,
+ which manages a set of registers and a stack. The memory used by the stack
+ is allocated and deallocated via a stack_allocator, which is
+ required to model the stack-allocator
+ concept.
+
+ A stack_allocator can be passed to fiber::fiber() or to fibers::async().
+
+ A stack_allocator must satisfy the stack-allocator
+ concept requirements shown in the following table, in which a is an object of a stack_allocator
+ type, sctx is a stack_context, and size is a std::size_t:
+
| expression | return type | notes |
|---|---|---|
|  |  | creates a stack allocator |
|  |  | creates a stack |
|  |  | deallocates the stack created by |
![]() |
+Important | +
|---|---|
+ The implementation of |
![]() |
+Important | +
|---|---|
+ Calling |
![]() |
+Note | +
|---|---|
+ The memory for the stack is not required to be aligned; alignment takes place + inside execution_context. + |
+ See also Boost.Context
+ stack allocation. In particular, traits_type
+ methods are as described for boost::context::stack_traits.
+
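To make the concept more concrete, here is a minimal sketch of a type offering the three operations named above: construction from a size, `allocate()` and `deallocate(sctx)`. It is an illustration only. `demo_stack_context` is a stand-in reduced to the `sp` and `size` members mentioned in this section, `malloc_stack` is a hypothetical name, and the sketch assumes a downward-growing stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Stand-in for stack_context, reduced to the two members the text mentions.
struct demo_stack_context {
    void*       sp{nullptr};
    std::size_t size{0};
};

// Hypothetical allocator offering the three operations of the concept:
// a(size), a.allocate() and a.deallocate(sctx).
class malloc_stack {
public:
    explicit malloc_stack(std::size_t size) : size_(size) {
        assert(size > 0);
    }

    demo_stack_context allocate() {
        void* base = std::malloc(size_);
        if (!base) throw std::bad_alloc();
        demo_stack_context sctx;
        sctx.size = size_;
        // Assumes a downward-growing stack: sp is the highest address.
        sctx.sp = static_cast<char*>(base) + size_;
        return sctx;
    }

    void deallocate(demo_stack_context& sctx) {
        assert(sctx.sp != nullptr);
        // Recover the base address that malloc returned.
        std::free(static_cast<char*>(sctx.sp) - sctx.size);
        sctx.sp = nullptr;  // resetting the context is a choice of this sketch
        sctx.size = 0;
    }

private:
    std::size_t size_;
};
```

On an architecture where the stack grows upwards, `sp` would instead hold the lowest address of the block.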
+
+protected_fixedsize_stack
++
+
+ Boost.Fiber provides the class protected_fixedsize_stack which
+ models the stack-allocator
+ concept. It appends a guard page at the end of each stack
+ to protect against exceeding the stack. If the guard page is accessed (read
+ or write operation) a segmentation fault/access violation is generated by the
+ operating system.
+
![]() |
+Important | +
|---|---|
+ Using |
![]() |
+Note | +
|---|---|
+ The appended |
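The guard-page idea can be sketched on POSIX systems with `mmap` and `mprotect`. This is a sketch under assumptions, not a description of how the library sets up its guard page: the names `guarded_region`, `map_guarded` and `unmap_guarded` are hypothetical, and a downward-growing stack is assumed, so the protected page sits at the lowest address of the mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

struct guarded_region {
    void*       base;   // start of the mapping; the guard page lives here
    std::size_t total;  // usable bytes plus one guard page
    void*       sp;     // highest address, for a downward-growing stack
};

// Round size up to whole pages, map one extra page and revoke all access
// to it. Returns a region with base == nullptr on failure.
inline guarded_region map_guarded(std::size_t size) {
    assert(size > 0);
    const std::size_t page = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
    const std::size_t pages = (size + page - 1) / page;
    const std::size_t total = (pages + 1) * page;
    void* base = ::mmap(nullptr, total, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return guarded_region{nullptr, 0, nullptr};
    // Any read or write that reaches this page now faults.
    if (::mprotect(base, page, PROT_NONE) != 0) {
        ::munmap(base, total);
        return guarded_region{nullptr, 0, nullptr};
    }
    return guarded_region{base, total, static_cast<char*>(base) + total};
}

inline void unmap_guarded(guarded_region& r) {
    if (r.base) ::munmap(r.base, r.total);
    r = guarded_region{nullptr, 0, nullptr};
}
```

The extra system calls per stack are the price of turning a silent overflow into an immediate fault.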
#include <boost/fiber/protected_fixedsize_stack.hpp>
+
+namespace boost {
+namespace fibers {
+
+struct protected_fixedsize_stack {
+    protected_fixedsize_stack(std::size_t size = traits_type::default_size());
+
+    stack_context allocate();
+
+    void deallocate( stack_context &);
+};
+
+}}
+
+
+allocate()
++
+stack_context allocate(); ++
+
+ traits_type::minimum_size()
+ <= size
+ and traits_type::is_unbounded()
+ || (
+ size <=
+ traits_type::maximum_size()
+ ).
+
+ Allocates memory of at least size
+ bytes and stores a pointer to the stack and its actual size in sctx. Depending on the architecture
+ (the stack grows downwards/upwards) the stored address is the highest/lowest
+ address of the stack.
+
+
+deallocate()
++
+void deallocate( stack_context & sctx); ++
+
+ sctx.sp is valid, traits_type::minimum_size() <= sctx.size and traits_type::is_unbounded() || ( sctx.size <= traits_type::maximum_size() ).
+
+ Deallocates the stack space. +
+
+pooled_fixedsize_stack
++
+
+ Boost.Fiber provides the class pooled_fixedsize_stack which
+ models the stack-allocator
+ concept. In contrast to protected_fixedsize_stack it
+ does not append a guard page at the end of each stack. The memory is managed
+ internally by boost::pool<>.
+
#include <boost/fiber/pooled_fixedsize_stack.hpp>
+
+namespace boost {
+namespace fibers {
+
+struct pooled_fixedsize_stack {
+    pooled_fixedsize_stack(std::size_t stack_size = traits_type::default_size(), std::size_t next_size = 32, std::size_t max_size = 0);
+
+    stack_context allocate();
+
+    void deallocate( stack_context &);
+};
+
+}}
+
+
++
+pooled_fixedsize_stack(std::size_t stack_size, std::size_t next_size, std::size_t max_size); ++
+
+ traits_type::is_unbounded()
+ || (
+ traits_type::maximum_size()
+ >= stack_size) and 0
+ < next_size.
+
+ Allocates memory of at least stack_size
+ bytes and stores a pointer to the stack and its actual size in sctx. Depending on the architecture
+ (the stack grows downwards/upwards) the stored address is the highest/lowest
+ address of the stack. Argument next_size
+ determines the number of stacks to request from the system the first
+ time that *this
+ needs to allocate system memory. The third argument max_size
+ controls how much memory might be allocated for stacks — a value of zero
+ means no upper limit.
+
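The role of `next_size` can be illustrated with a toy pool. This is a simplified sketch, not the behaviour of `boost::pool<>`: the class `stack_pool` is hypothetical, it requests a fixed `next_size` stacks from the system each time it runs dry, and it omits the `max_size` limit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Toy pool: carves blocks of `next_size` stacks out of one malloc'ed chunk
// and recycles returned stacks through a free list. Not thread safe.
class stack_pool {
public:
    stack_pool(std::size_t stack_size, std::size_t next_size)
        : stack_size_(stack_size), next_size_(next_size) {
        assert(stack_size > 0 && next_size > 0);
    }
    stack_pool(const stack_pool&) = delete;
    stack_pool& operator=(const stack_pool&) = delete;
    ~stack_pool() {
        for (void* chunk : chunks_) std::free(chunk);
    }

    // Hand out a free stack, asking the system for a new chunk only
    // when the free list is empty.
    void* allocate() {
        if (free_.empty()) grow();
        void* base = free_.back();
        free_.pop_back();
        return base;
    }

    // Return a stack to the pool; the memory stays owned by the pool.
    void deallocate(void* base) { free_.push_back(base); }

    std::size_t chunks() const { return chunks_.size(); }

private:
    void grow() {
        chunks_.push_back(nullptr);  // reserve the bookkeeping slot first
        void* chunk = std::malloc(stack_size_ * next_size_);
        if (!chunk) { chunks_.pop_back(); throw std::bad_alloc(); }
        chunks_.back() = chunk;
        for (std::size_t i = 0; i < next_size_; ++i)
            free_.push_back(static_cast<char*>(chunk) + i * stack_size_);
    }

    std::size_t stack_size_;
    std::size_t next_size_;
    std::vector<void*> chunks_;  // owned memory, released in the destructor
    std::vector<void*> free_;    // stacks currently available for reuse
};
```

Reusing stacks from the free list avoids a system allocation per fiber, which is the main motivation for a pooled allocator.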
+
+allocate()
++
+stack_context allocate(); ++
+
+ traits_type::is_unbounded()
+ || (
+ traits_type::maximum_size()
+ >= stack_size).
+
+ Allocates memory of at least stack_size
+ bytes and stores a pointer to the stack and its actual size in sctx. Depending on the architecture
+ (the stack grows downwards/upwards) the stored address is the highest/lowest
+ address of the stack.
+
+
+deallocate()
++
+void deallocate( stack_context & sctx); ++
+
+ sctx.sp is valid, traits_type::is_unbounded() || ( traits_type::maximum_size() >= sctx.size).
+
+ Deallocates the stack space. +
![]() |
+Note | +
|---|---|
+ This stack allocator is not thread safe. + |
+
+fixedsize_stack
++
+
+ Boost.Fiber provides the class fixedsize_stack which
+ models the stack-allocator
+ concept. In contrast to protected_fixedsize_stack it
+ does not append a guard page at the end of each stack. The memory is simply
+ managed by std::malloc()
+ and std::free().
+
#include <boost/fiber/fixedsize_stack.hpp>
+
+namespace boost {
+namespace fibers {
+
+struct fixedsize_stack {
+    fixedsize_stack(std::size_t size = traits_type::default_size());
+
+    stack_context allocate();
+
+    void deallocate( stack_context &);
+};
+
+}}
+
+
+allocate()
++
+stack_context allocate(); ++
+
+ traits_type::minimum_size()
+ <= size
+ and traits_type::is_unbounded()
+ || (
+ traits_type::maximum_size()
+ >= size).
+
+ Allocates memory of at least size
+ bytes and stores a pointer to the stack and its actual size in sctx. Depending on the architecture
+ (the stack grows downwards/upwards) the stored address is the highest/lowest
+ address of the stack.
+
+
+deallocate()
++
+void deallocate( stack_context & sctx); ++
+
+ sctx.sp is valid, traits_type::minimum_size() <= sctx.size and traits_type::is_unbounded() || ( traits_type::maximum_size() >= sctx.size).
+
+ Deallocates the stack space. +
+
+segmented_stack
++
+
+ Boost.Fiber supports the use of a segmented_stack,
+ i.e. a stack that grows on demand. The fiber is created with a minimal stack size,
+ which is increased as required. Class segmented_stack models
+ the stack-allocator concept.
+ Unlike protected_fixedsize_stack and
+ fixedsize_stack, its stack is not limited to a fixed size.
+
![]() |
+Note | +
|---|---|
+ Segmented stacks are currently only supported by gcc
+ from version 4.7 and clang
+ from version 3.4 onwards. In order to use
+ a |
#include <boost/fiber/segmented_stack.hpp>
+
+namespace boost {
+namespace fibers {
+
+struct segmented_stack {
+    segmented_stack(std::size_t stack_size = traits_type::default_size());
+
+    stack_context allocate();
+
+    void deallocate( stack_context &);
+};
+
+}}
+
+
+allocate()
++
+stack_context allocate(); ++
+
+ traits_type::minimum_size()
+ <= size
+ and traits_type::is_unbounded()
+ || (
+ traits_type::maximum_size()
+ >= size).
+
+ Allocates memory of at least size
+ bytes and stores a pointer to the stack and its actual size in sctx. Depending on the architecture
+ (the stack grows downwards/upwards) the stored address is the highest/lowest
+ address of the stack.
+
+
+deallocate()
++
+void deallocate( stack_context & sctx); ++
+
+ sctx.sp is valid, traits_type::minimum_size() <= sctx.size and traits_type::is_unbounded() || ( traits_type::maximum_size() >= sctx.size).
+
+ Deallocates the stack space. +
![]() |
+Note | +
|---|---|
+ If the library is compiled for segmented stacks, |
+ Running programs that switch stacks under valgrind causes problems. The b2
+ command-line property valgrind=on lets
+ valgrind treat the memory regions as stack space, which suppresses the errors.
+