This is in line with the C++20 change that requires the default constructor
of std::atomic to value-initialize the atomic object.
Also, use is_nothrow_default_constructible to correctly deduce whether the
default constructors of atomics are noexcept.
This allows marking bitwise_cast constexpr when the conditions for using
bit_cast are met:
- both source and target types have the same size, and
- the source type has no padding bits.
The latter is checked using the has_unique_object_representations trait, which
we also implement via compiler intrinsics when it is not available in the
standard library.
This allows atomic constructors to become constexpr for structs and floating
point types with no padding and whose size matches the atomic storage size.
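For illustration, a minimal sketch of this gating, written against the standard
C++20 facilities rather than the library's internal helpers (the function name
bitwise_cast_sketch and the memcpy fallback are illustrative only):

    #include <bit>
    #include <cstring>
    #include <type_traits>

    template< typename To, typename From >
    constexpr To bitwise_cast_sketch(From const& from) noexcept
    {
        static_assert(std::is_trivially_copyable_v< From >, "source must be trivially copyable");
        static_assert(std::is_trivially_copyable_v< To >, "target must be trivially copyable");
        if constexpr (sizeof(To) == sizeof(From) &&
            std::has_unique_object_representations_v< From >)
        {
            // Same size and no padding bits in the source: bit_cast is usable
            // and the conversion is a constant expression.
            return std::bit_cast< To >(from);
        }
        else
        {
            // Fallback path, not usable in constant expressions. Assumes To is
            // default-constructible for the sake of the sketch.
            To to{};
            std::memcpy(&to, &from, sizeof(To) < sizeof(From) ? sizeof(To) : sizeof(From));
            return to;
        }
    }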
Use __builtin_clear_padding and __builtin_zero_non_value_bits that were
introduced in gcc 11 and MSVC 19.27 to clear the padding bits in atomic
types. The intrinsics are used in bitwise_cast and atomic reference
constructors.
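A sketch of how these intrinsics can be invoked (the wrapper name
clear_padding_sketch and the version checks are illustrative; the library hides
this behind its own configuration macros):

    struct with_padding
    {
        char c;  // followed by padding bytes on typical ABIs
        int n;
    };

    inline void clear_padding_sketch(with_padding* p)
    {
    #if defined(__GNUC__) && (__GNUC__ >= 11)
        __builtin_clear_padding(p);        // gcc 11 and later
    #elif defined(_MSC_VER) && (_MSC_VER >= 1927)
        __builtin_zero_non_value_bits(p);  // MSVC 19.27 and later
    #else
        (void)p; // padding left untouched; bitwise comparisons may be unreliable
    #endif
    }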
Also, separated atomic implementation specializations for enums so that
static_cast can be used to convert values to storage. This in turn allows
relaxing the compiler requirements for marking atomic constructors constexpr.
Updated docs and added tests for structs with padding and constexpr
atomic constructors.
This works around spurious test failures on Mac OS as the notifying operation
sometimes fails with ENOENT. Presumably, the OS sometimes invalidates the
internal identification of the stack memory region, which makes __ulock_wake
fail to find the ulock object that other threads are blocked on.
By using dynamically allocated memory we (hopefully) place the object in a
normally mapped memory region that should not be disturbed by the OS. Ideally,
we would use process-shared memory for this test, but that would make it harder
to keep the test portable and runnable in parallel. Dynamic local memory should
do for now.
This test verifies arithmetic and bitwise (logic) operations with
immediate constant arguments, which may affect instruction choice
in the operations. In particular, the test verifies that bug
https://github.com/boostorg/atomic/issues/41 is fixed.
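The checks are roughly of this shape (a sketch, not the actual test code;
test_constant_operands is an illustrative name):

    #include <boost/atomic.hpp>
    #include <cassert>

    void test_constant_operands()
    {
        boost::atomic< unsigned int > a(16u);
        assert(a.fetch_add(1u) == 16u);    // immediate addend, a becomes 17
        assert(a.fetch_and(0xfeu) == 17u); // immediate mask, a becomes 16
        assert(a.load() == 16u);
    }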
This is the second iteration of the backends, which were both tested
on a QEMU VM and did not show any test failures. The essential difference
from the first version is that in AArch64 we now initialize the success flag
in the asm blocks of compare_exchange operations rather than relying on the
compiler to initialize it before it is passed into the asm block as an in-out
parameter. Apparently, this sometimes didn't work for some reason, which made
compare_exchange_strong return an incorrect value and broke the futex-based
mutexes in the lock pool.
The same change was also applied to AArch32, along with minor corrections to
the asm block constraints.
We need to explicitly link with synchronization.lib when the
WaitOnAddress API is enabled at compile time for ARM targets. Since
this library is only available on newer Windows SDKs, we have to perform
a configure check for whether it is available.
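For reference, the dependency the configure check accounts for looks like this
when expressed directly in user code on MSVC-like compilers (illustrative only;
wait_for_change is a made-up helper and the library's build scripts handle the
actual linking):

    #include <windows.h>
    #pragma comment(lib, "synchronization.lib") // WaitOnAddress and friends live here

    void wait_for_change(volatile LONG* addr, LONG undesired)
    {
        // Blocks while *addr == undesired (requires a Windows 8+ SDK and target).
        WaitOnAddress(addr, &undesired, sizeof(LONG), INFINITE);
    }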
The old operations template is replaced with core_operations, which falls
back to core_arch_operations, which falls back to core_operations_emulated.
The core_operations layer is intended for more or less architecture-neutral
backends, like the one based on gcc __atomic* intrinsics. It may fall back to
core_arch_operations where the compiler does not support the intrinsics or where
the architecture-specific backend is more efficient. For example, where gcc does
not implement 128-bit
atomic operations via __atomic* intrinsics, we support them in the
core_arch_operations backend, which uses inline assembler blocks.
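Schematically, the fallback chain can be pictured as follows (a simplified
sketch; the *_sketch names are illustrative and the real templates take more
parameters and select their bases per platform):

    #include <cstddef>

    // Always available: lock-pool based emulation.
    template< std::size_t Size, bool Signed >
    struct core_operations_emulated_sketch
    {
    };

    // Architecture-specific asm backends override what they can implement.
    template< std::size_t Size, bool Signed >
    struct core_arch_operations_sketch :
        public core_operations_emulated_sketch< Size, Signed >
    {
    };

    // Architecture-neutral backends (e.g. __atomic* intrinsics) sit on top.
    template< std::size_t Size, bool Signed >
    struct core_operations_sketch :
        public core_arch_operations_sketch< Size, Signed >
    {
    };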
The old emulated_operations template is largely unchanged and was renamed to
core_operations_emulated for naming consistency. All other operation templates
were also renamed for consistency (e.g. generic_wait_operations ->
wait_operations_generic).
Fence operations have been extracted into a separate set of structures:
fence_operations, fence_arch_operations and fence_operations_emulated. These
are similar to the core operations described above. This structuring also
allows falling back from fence_operations to fence_arch_operations when the
latter is more optimal.
The net result of these changes is that 128-bit and 64-bit atomic operations
should now be consistently supported on all architectures that support them.
Previously, only x86 was supported via local hacks for gcc and clang.
The backend implements core and extra atomic operations using gcc asm blocks.
The implementation supports extensions added in ARMv8.1 and ARMv8.3. It supports
both little and big endian targets.
Currently, the code has not been tested on real hardware. It has been tested
on a QEMU VM.
The previous change to increase the delay didn't help, so we're instead
changing the expectation - the first woken thread is allowed to receive
value3 on wake up.
Occasionally, IPC notify_one test fails on Windows because the first
of the woken threads receives value3 from wait(). This is possible if
the thread lingers in wait() for some reason. Increase the delay
before the second notification slightly to reduce the likelihood
of this happening.
The implementation uses GetTickCount/GetTickCount64 internally, which is a
steady and sufficiently low-precision time source.
We need the clock to have relatively low precision so that wait
tests don't fail spuriously because the blocked threads wake up
too soon, according to more precise clocks.
boost::chrono::system_clock currently has an acceptably low precision,
but it is not a steady clock.
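A sketch of such a clock built on GetTickCount64, using std::chrono types for
brevity (tick_count_clock is an illustrative name; the actual test helper may
differ):

    #include <windows.h>
    #include <chrono>

    struct tick_count_clock
    {
        typedef std::chrono::milliseconds duration;
        typedef duration::rep rep;
        typedef duration::period period;
        typedef std::chrono::time_point< tick_count_clock, duration > time_point;
        static constexpr bool is_steady = true;

        static time_point now() noexcept
        {
            // GetTickCount64 is monotonic with roughly 10-16 ms resolution.
            return time_point(duration(static_cast< rep >(GetTickCount64())));
        }
    };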
Checking for the capability macros is not good enough because ipc_atomic_ref
may not be lock-free even when the macro (and ipc_atomic) indicates lock-free
operation. We now check the is_always_lock_free property to decide whether to
run or skip tests for a given IPC atomic type.
Also, made struct_3_bytes output more informative.
The inter-process atomics have ipc_ prefixes: ipc_atomic, ipc_atomic_ref
and ipc_atomic_flag. These types are similar to their unprefixed counterparts
with the following distinctions:
- The operations are provided with an added precondition that is_lock_free()
returns true.
- All operations, including waiting/notifying operations, are address-free,
so the types are suitable for inter-process communication.
- The new has_native_wait_notify() operation and always_has_native_wait_notify
static constant allow testing whether the target platform has native support for
address-free waiting/notifying operations. If it does not, a generic
implementation based on a busy wait is used.
- A new set of capability macros has been added. The macros are named
BOOST_ATOMIC_HAS_NATIVE_<T>_IPC_WAIT_NOTIFY and indicate whether address-free
waiting/notifying operations are supported natively for a given type.
Additionally, to unify interface and implementation of different components,
the has_native_wait_notify() operation and always_has_native_wait_notify
static constant were added to non-IPC atomic types as well. Added
BOOST_ATOMIC_HAS_NATIVE_<T>_WAIT_NOTIFY capability macros to indicate
native support for inter-thread waiting/notifying operations.
Also, added is_lock_free() and is_always_lock_free to atomic_flag.
This commit adds implementation, docs and tests.
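A usage sketch (shared_data, consumer and producer are illustrative names; the
object is assumed to live in memory mapped into both processes):

    #include <boost/atomic/ipc_atomic.hpp>
    #include <cstdint>

    struct shared_data
    {
        boost::ipc_atomic< std::uint32_t > counter;
    };

    void consumer(shared_data* region)
    {
        std::uint32_t value = region->counter.load(boost::memory_order_acquire);
        while (value == 0u)
        {
            // wait() blocks until the stored value differs from the argument
            // and returns the newly observed value.
            value = region->counter.wait(value, boost::memory_order_acquire);
        }
    }

    void producer(shared_data* region)
    {
        region->counter.fetch_add(1u, boost::memory_order_release);
        region->counter.notify_all();
    }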
The user may define the BOOST_ATOMIC_LOCK_POOL_SIZE_LOG2 macro to specify the
binary logarithm of the size of the internal lock pool. The macro only has an
effect when building Boost.Atomic.
The generic implementation is based on the lock pool. A list of condition
variables (or waiting futexes) is added per lock. Basically, the lock
pool serves as a global hash table, where each lock represents
a bucket and each wait state is an element. Every wait operation
allocates a wait state keyed on the pointer to the atomic object. Notify
operations look up the wait state by the atomic pointer and notify
the condition variable/futex. The corresponding lock needs to be acquired
to protect the wait state list during all wait/notify operations.
Backends not involving the lock pool are going to be added later.
The implementation of wait operation extends the C++20 definition in that
it returns the newly loaded value instead of void. This allows the caller to
avoid loading the value separately.
The waiting/notifying operations are not address-free. Address-free variants
will be added later.
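A greatly simplified sketch of the scheme, with one condition variable per
bucket instead of a per-address wait state list (all names are illustrative):

    #include <condition_variable>
    #include <cstdint>
    #include <mutex>

    struct wait_bucket
    {
        std::mutex lock;
        std::condition_variable cond; // a futex is used instead where available
    };

    static wait_bucket g_lock_pool[64];

    inline wait_bucket& bucket_for(const volatile void* addr) noexcept
    {
        // The address of the atomic object selects the bucket, hash-table style.
        return g_lock_pool[(reinterpret_cast< std::uintptr_t >(addr) >> 4u) & 63u];
    }

    template< typename Atomic >
    typename Atomic::value_type wait_sketch(Atomic const& obj, typename Atomic::value_type old_val)
    {
        wait_bucket& b = bucket_for(&obj);
        std::unique_lock< std::mutex > guard(b.lock);
        typename Atomic::value_type new_val = obj.load();
        while (new_val == old_val)
        {
            b.cond.wait(guard);
            new_val = obj.load();
        }
        return new_val; // extension over C++20: the new value is returned
    }

    template< typename Atomic >
    void notify_all_sketch(Atomic const& obj)
    {
        wait_bucket& b = bucket_for(&obj);
        std::lock_guard< std::mutex > guard(b.lock);
        b.cond.notify_all();
    }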
Added tests for the new operations and refactored existing tests for atomic
operations. Added docs for the new operations.
gcc 4.7 does not support constexpr constructors that initialize one member
of an anonymous union data member of the class. atomic and atomic_flag
no longer have constexpr constructors on this compiler.
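The problematic pattern reduces to roughly the following hypothetical case:

    struct storage_with_union
    {
        union
        {
            unsigned int as_uint;
            float as_float;
        };

        // gcc 4.7 rejects this constexpr constructor because it initializes
        // a member of the anonymous union.
        constexpr explicit storage_with_union(unsigned int v) noexcept : as_uint(v) {}
    };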
On 32-bit x86, 64-bit integers have 4-byte alignment, which resulted in a
non-integral type being selected as the storage type in the case of lock-based
atomic_ref. This broke arithmetic and bitwise operations on atomic_ref.
We now select an integral type based on its native alignment, not the storage
alignment we require for atomic operations. This is fine in the case of the
lock-based backend.
Also, extended buffer_storage with support for specifying alignment and removed
aligned_buffer_storage and storage128_t.
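Conceptually, the extended storage type is a size- and alignment-parameterized
byte buffer along these lines (buffer_storage_sketch is an illustrative name):

    #include <cstddef>

    template< std::size_t Size, std::size_t Alignment >
    struct buffer_storage_sketch
    {
        alignas(Alignment) unsigned char data[Size];
    };

    // For the lock-based backend the alignment can simply be alignof(value_type),
    // e.g. buffer_storage_sketch< sizeof(T), alignof(T) >.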
Lock-based operations have no reason to require object alignment higher
than alignof(T). This commit implements a special storage type for
lock-based operations, which has the same alignment as value_type.
Also, added tests to verify required_alignment correctness both
in lock-free and lock-based cases.
For some unknown reason, int128 bit operation tests started failing
in Travis CI for x86-64. The error does not reproduce locally. Given that
float128 tests were known to fail similarly (though only on 32-bit targets), it
seems all 128-bit operations are broken in clang-5 (possibly caused by something
specific to Travis CI).
This commit re-enables 32-bit tests on clang-5, but disables int128 and
float128 tests on both 32 and 64-bit targets.
This works around MinGW compilation issues[1] and removes the otherwise
unnecessary dependency. Aligned storage is allocated manually on the stack.
[1]: https://github.com/boostorg/align/issues/10
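The manual alignment boils down to over-allocating a byte buffer and rounding
the pointer up, roughly as follows (illustrative, not the exact test code):

    #include <cstddef>
    #include <cstdint>
    #include <new>

    // alignment must be a power of two
    inline void* align_up(void* p, std::size_t alignment) noexcept
    {
        std::uintptr_t v = reinterpret_cast< std::uintptr_t >(p);
        v = (v + (alignment - 1u)) & ~static_cast< std::uintptr_t >(alignment - 1u);
        return reinterpret_cast< void* >(v);
    }

    void use_aligned_stack_storage()
    {
        unsigned char buf[sizeof(long long) + 16u]; // extra room for the adjustment
        long long* p = new (align_up(buf, 16u)) long long(0);
        *p += 1;
    }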
The implementation used to generate 32-bit bts/btr/btc instructions for 8 and
16-bit atomics, which could result in misaligned accesses, access violations
and possibly data corruption. 32 and 64-bit atomics are unaffected.
This commit fixes the operand width for 16-bit atomics. For 8-bit atomics, the
generic implementation based on or/and/xor instructions is used, since there
are no 8-bit bts/btr/btc instructions.