The backend implements core and extra atomic operations using gcc asm blocks.
The implementation supports extensions added in ARMv8.1 and ARMv8.3. It supports
both little-endian and big-endian targets.
Currently, the code has not been tested on real hardware, only on a QEMU VM.
The ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition, allows using
ldrex and other load-exclusive instructions without a matching strex in
Section A3.4.5 "Load-Exclusive and Store-Exclusive usage restrictions". In
Section A3.5.3 "Atomicity in the ARM architecture" it states that ldrexd
atomically loads the 64-bit value from suitably aligned memory. This makes
the strexd added in eb50aea437 unnecessary.
The ARM Architecture Reference Manual Armv8, for the Armv8-A architecture profile,
does not explicitly state that ldrexd can be used without a matching strexd,
but it does not prohibit this either in Section E2.10.5 "Load-Exclusive and
Store-Exclusive instruction usage restrictions".
Although we don't need to store anything after the load, we need to issue
strexd to reset the exclusive access mark on the storage address, so we
immediately store the loaded value back.
The ldrexd+strexd technique is described in the ARM Architecture
Reference Manual ARMv8, Section B2.2.1. Although it is described for ARMv8,
the technique should be valid for previous architecture versions as well.
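A minimal sketch of the ldrexd+strexd technique, assuming a gcc-style asm block on
AArch32; the helper name, operands and constraints are illustrative and not the
actual backend code:

    #include <cstdint>

    // Hypothetical helper: atomically load a 64-bit value with ldrexd and
    // immediately store it back with strexd to reset the exclusive access
    // mark, retrying if the store-exclusive fails.
    inline std::uint64_t load64_ldrexd_strexd(volatile std::uint64_t* p)
    {
        std::uint64_t value;
        std::uint32_t failed;
        __asm__ __volatile__
        (
            "1:\n\t"
            "ldrexd %0, %H0, [%2]\n\t"
            "strexd %1, %0, %H0, [%2]\n\t"
            "teq    %1, #0\n\t"
            "bne    1b\n\t"
            : "=&r" (value), "=&r" (failed)
            : "r" (p)
            : "cc", "memory"
        );
        return value;
    }
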
Forced inlining is mostly used to ensure the compiler can treat memory
order arguments as constants. It is also useful for constant propagation
of other arguments. Neither is very useful for the emulated backend, so
we might as well allow the compiler not to inline the functions.
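As an illustration of the trade-off (hypothetical code, not the library internals):
when a call like the one below is inlined with a constant memory order, the
dispatch folds to a single path at compile time; in the emulated backend both
paths end up taking a lock anyway, so the folding gains little.

    #include <boost/atomic.hpp>

    inline void store_with_order(boost::atomic< int >& a, int v, boost::memory_order order)
    {
        // With inlining and a constant `order`, one branch is selected at
        // compile time; without it, the check remains at run time.
        if (order == boost::memory_order_seq_cst)
            a.store(v, boost::memory_order_seq_cst);
        else
            a.store(v, boost::memory_order_release);
    }
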
When the emulated wait function is inlined, the compiler sometimes generates
code that acts as if a wrong value were returned from the wait function: it
simply "forgets" to save the atomic value into an object on the stack and
later uses a bogus value as the "returned" value.
Preventing inlining seems to work around the problem.
Discovered by wait_api notify_one/notify_all test failures for struct_3_bytes.
Oddly enough, the same test for uint32_t did not fail.
mfence is more expensive on most modern CPUs than a lock-prefixed instruction
on a dummy location, while the latter is sufficient to implement sequential
consistency on x86. Some performance test results are available here:
https://shipilev.net/blog/2014/on-the-fence-with-dependencies/
Also, for seq_cst stores in the gcc_atomic backend, use an xchg instead of
the mov+mfence sequence generated by gcc versions older than 10.1.
The machinery to detect mfence presence is left intact in case
we need this instruction in the future.
Closes https://github.com/boostorg/atomic/issues/36.
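A rough sketch of the two seq_cst store sequences being compared, written as gcc
inline asm; illustrative only, not the backend's actual code:

    #include <cstdint>

    // New sequence: xchg with a memory operand has an implicit lock prefix
    // and acts as a full barrier, so no separate fence is needed.
    inline void store_seq_cst_xchg(volatile std::uint32_t* p, std::uint32_t v)
    {
        __asm__ __volatile__ ("xchgl %0, %1" : "+r" (v), "+m" (*p) : : "memory");
    }

    // Old sequence generated by gcc before 10.1: a plain store followed by mfence.
    inline void store_seq_cst_mov_mfence(volatile std::uint32_t* p, std::uint32_t v)
    {
        __asm__ __volatile__ ("movl %1, %0\n\tmfence" : "=m" (*p) : "r" (v) : "memory");
    }
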
The typedefs indicate the atomic object type for an unsigned/signed
integer that is lock-free and preferably has native support for waiting
and notifying operations.
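A hedged usage sketch, assuming the C++20-style typedef names
(atomic_unsigned_lock_free / atomic_signed_lock_free), which the text above
does not spell out:

    #include <boost/atomic.hpp>

    boost::atomic_unsigned_lock_free ready_count(0u);

    void publish()
    {
        ready_count.fetch_add(1u, boost::memory_order_release);
        ready_count.notify_all(); // preferably backed by native wait/notify support
    }
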
The inter-process atomics have ipc_ prefixes: ipc_atomic, ipc_atomic_ref
and ipc_atomic_flag. These types are similar to their unprefixed counterparts
with the following distinctions:
- The operations are provided with an added precondition that is_lock_free()
returns true.
- All operations, including waiting/notifying operations, are address-free,
so the types are suitable for inter-process communication.
- The new has_native_wait_notify() operation and always_has_native_wait_notify
static constant allow testing whether the target platform has native support for
address-free waiting/notifying operations. If it does not, a generic
implementation based on a busy wait is used.
- A new set of capability macros has been added. The macros are named
BOOST_ATOMIC_HAS_NATIVE_<T>_IPC_WAIT_NOTIFY and indicate whether address-free
waiting/notifying operations are supported natively for a given type.
Additionally, to unify the interface and implementation of the different components,
the has_native_wait_notify() operation and always_has_native_wait_notify
static constant were added to non-IPC atomic types as well. Added
BOOST_ATOMIC_HAS_NATIVE_<T>_WAIT_NOTIFY capability macros to indicate
native support for inter-thread waiting/notifying operations.
Also, added is_lock_free() and is_always_lock_free to atomic_flag.
This commit adds implementation, docs and tests.
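A hedged usage sketch of the IPC atomics described above; the header and exact
member signatures are assumed from this description rather than quoted from the
docs:

    #include <boost/atomic.hpp>

    // `flag` is assumed to live in a shared memory segment mapped by both
    // processes; the shared memory setup itself is elided.
    void consumer(boost::ipc_atomic< unsigned int >* flag)
    {
        // Precondition for IPC atomics: the type must be lock-free.
        static_assert(boost::ipc_atomic< unsigned int >::is_always_lock_free,
            "ipc_atomic requires a lock-free type");

        if (!boost::ipc_atomic< unsigned int >::always_has_native_wait_notify)
        {
            // On this target, waiting falls back to a generic busy-wait
            // based implementation.
        }

        // Blocks until another process changes the value away from 0 and
        // returns the newly loaded value.
        unsigned int value = flag->wait(0u);
        (void)value;
    }

    void producer(boost::ipc_atomic< unsigned int >* flag)
    {
        flag->store(1u, boost::memory_order_release);
        flag->notify_one();
    }
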
Moved the public class definitions to the public headers and renamed
the internal implementation headers. This will allow reusing the
implementation headers for inter-process atomics later.
Most platforms that support futexes or similar mechanisms support them
for 32-bit integers, which makes a 32-bit storage preferable for
implementing atomic_flag efficiently. Most architectures also support
32-bit atomic operations natively.
Also, reduced code duplication in instantiating operation backends.
The generic implementation is based on the lock pool. A list of condition
variables (or waiting futexes) is added per lock. Basically, the lock
pool serves as a global hash table, where each lock represents
a bucket and each wait state is an element. Every wait operation
allocates a wait state keyed on the pointer to the atomic object. Notify
operations look up the wait state by the atomic pointer and notify
the condition variable/futex. The corresponding lock needs to be acquired
to protect the wait state list during all wait/notify operations.
Backends not involving the lock pool are going to be added later.
The implementation of the wait operation extends the C++20 definition in that
it returns the newly loaded value instead of void. This allows the caller
to avoid a separate load of the value.
The waiting/notifying operations are not address-free. Address-free variants
will be added later.
Added tests for the new operations and refactored existing tests for atomic
operations. Added docs for the new operations.
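A heavily simplified sketch of the lock pool idea described above, using
std::mutex/std::condition_variable and a single condition variable per bucket
instead of a per-address wait state list; names are illustrative, not the actual
Boost.Atomic internals:

    #include <condition_variable>
    #include <cstdint>
    #include <mutex>

    // One bucket of the pool: a lock plus the waiters hashed to it.
    struct lock_pool_entry
    {
        std::mutex mtx;
        std::condition_variable cond;
    };

    static lock_pool_entry g_lock_pool[64];

    inline lock_pool_entry& bucket_for(const volatile void* addr)
    {
        // Hash the atomic object's address into the pool (a later commit
        // below improves this hash).
        std::uintptr_t h = reinterpret_cast< std::uintptr_t >(addr);
        return g_lock_pool[(h / 16u) % 64u];
    }

    template< typename Atomic, typename Value >
    Value generic_wait(const Atomic& obj, Value old_val)
    {
        lock_pool_entry& entry = bucket_for(&obj);
        std::unique_lock< std::mutex > lock(entry.mtx);
        Value new_val = obj.load();
        while (new_val == old_val)
        {
            entry.cond.wait(lock);
            new_val = obj.load();
        }
        // Return the newly loaded value, extending the C++20 definition.
        return new_val;
    }

    template< typename Atomic >
    void generic_notify_all(const Atomic& obj)
    {
        // The bucket lock must be held to protect the waiter list.
        lock_pool_entry& entry = bucket_for(&obj);
        std::lock_guard< std::mutex > lock(entry.mtx);
        entry.cond.notify_all();
    }
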
bitwise_cast is more lightweight in terms of compile times and is equivalent
to integral_truncate in the case of atomic_ref, since its storage type is always
the same size as the value type.
Due to BOOST_ATOMIC_DETAIL_ALIGNED_VAR_TPL macro expansion, the aligner
data member was made an array, which increased the size of the resulting
buffer_storage. This caused memory corruption with atomic_ref, which
requires the storage type to be of the same size as the value.
To protect against such mistakes in the future, changed
BOOST_ATOMIC_DETAIL_ALIGNED_VAR_TPL and BOOST_ATOMIC_DETAIL_ALIGNED_VAR
definitions to prohibit their direct use with arrays.
Increased the lock pool size to 64 entries and improved pool efficiency:
- Shift off the lower pointer bits that are zero due to object alignment.
- Mix in higher pointer bits to account for the alignment typically imposed by
malloc/new implementations.
- Use bit masking to select a lock from the pool, now that the pool size
is a power of 2.
Also, extracted (u)intptr_t definition to a common header to avoid code
duplication.
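An illustrative sketch of the bucket selection just described; the shift amounts
and mixing are examples, not the exact constants used by the library:

    #include <cstddef>
    #include <cstdint>

    const std::size_t lock_pool_size = 64u; // power of 2

    inline std::size_t select_lock_index(const volatile void* addr)
    {
        std::uintptr_t h = reinterpret_cast< std::uintptr_t >(addr);
        h >>= 4u;     // shift off low bits that are zero due to object alignment
        h ^= h >> 8u; // mix in higher bits, since malloc/new tend to align allocations
        return static_cast< std::size_t >(h & (lock_pool_size - 1u)); // mask instead of modulo
    }
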
This simplifies the code slightly without changing semantics. static_cast was
already used in atomic constructor in order to make it constexpr, and this
commit makes the rest of the code consistent.
The compiler allows applying alignas but later fails to pass arguments
of the aligned types to functions with error C2719. At the same time,
std::max_align_t has an alignment of 8, and the error doesn't show up when
the type is aligned using the union trick. Thus we disable alignas
for MSVC 14.0 in 32-bit mode.
Also, use std::max_align_t on MSVC, when possible.
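A hedged illustration of the union trick mentioned above (not the library's
actual storage type): placing the value in a union with a more strictly aligned
member raises its alignment without using alignas.

    #include <cstddef>

    template< typename T >
    union aligned_storage_union
    {
        T value;
        std::max_align_t aligner; // raises the union's alignment without alignas
    };
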
gcc 4.7 does not support constexpr constructors that initialize one member
of an anonymous union data member of the class. atomic and atomic_flag
no longer have constexpr constructors on this compiler.
gcc older than 8.1 and clang older than 8.0 produce incorrect results of
std::alignment_of for 64-bit types on 32-bit x86. Use boost::alignment_of,
which contains workarounds for these compilers.
gcc 4.8 requires that the argument of alignas is a literal constant and does not
accept a constant expression.
This workaround is temporary, until Boost.Config updates BOOST_NO_CXX11_ALIGNAS:
https://github.com/boostorg/config/pull/324
On 32-bit x86, 64-bit integers have 4-byte alignment, which resulted in
a non-integral type being selected as the storage type in the case of lock-based
atomic_ref. This broke arithmetic and bitwise operations on atomic_ref.
We now select an integral type based on its native alignment, not the storage
alignment we require for atomic operations. This is fine in the case of the
lock-based backend.
Also, extended buffer_storage with support for specifying alignment and removed
aligned_buffer_storage and storage128_t.
Lock-based operations have no reason to require object alignment higher
than alignof(T). This commit implements a special storage type for
lock-based operations, which has the same alignment as value_type.
Also, added tests to verify required_alignment correctness in both
the lock-free and lock-based cases.
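A minimal sketch of the idea, with illustrative names rather than the actual
Boost.Atomic type: a byte buffer whose size and alignment match the value type,
which is sufficient when all operations are protected by a lock.

    template< typename T >
    struct lock_based_storage
    {
        // Same size and alignment as the value type; no stricter alignment
        // is required because the operations take a lock anyway.
        alignas(T) unsigned char bytes[sizeof(T)];
    };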