mirror of https://github.com/boostorg/atomic.git synced 2026-02-02 08:22:08 +00:00
Commit Graph

305 Commits

Author SHA1 Message Date
Julie Koubová
9462d89c8a Update core_arch_ops_msvc_arm.hpp
Fix the dependent name lookup
2021-04-01 17:45:48 +02:00
Andrey Semashev
0835f7cdf4 Corrected immediate constant constraints for bitwise ops in aarch64.
Bitwise (logical) instructions in aarch64 use a different encoding for
their immediate constant arguments, which requires a different asm
constraint. Otherwise, an invalid instruction could be generated,
resulting in a compilation error.

Fixes https://github.com/boostorg/atomic/issues/41.
2020-11-19 20:09:49 +03:00
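
A hedged illustration of the constraint difference behind the fix above (not the
library source): on AArch64, gcc's "I" constraint describes add/sub immediates,
while logical instructions such as orr/and/eor take bitmask immediates, selected
by the "L" (64-bit) or "K" (32-bit) constraints. Using an arithmetic constraint
can let the compiler substitute a constant the logical instruction cannot encode.

    // Sketch only: a 64-bit OR with an immediate that must satisfy the
    // bitmask-immediate encoding, hence the "L" constraint rather than "I".
    #include <cstdint>

    std::uint64_t set_low_byte(std::uint64_t v)
    {
        std::uint64_t r;
        __asm__("orr %0, %1, %2" : "=r"(r) : "r"(v), "L"(std::uint64_t(0xffu)));
        return r;
    }
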
Andrey Semashev
e873eb1596 Removed destination registers when they are the same as input in aarch32.
By default, the destination register is the same as the first input register
in most instructions. Removing it has no practical effect other than, perhaps,
encouraging the assembler to generate a shorter encoding for the instruction
in Thumb mode.
2020-11-19 20:03:53 +03:00
Andrey Semashev
4aec6c3425 Added deduction of pointer size from UINTPTR_MAX. 2020-09-06 20:15:29 +03:00
Andrey Semashev
51c84b6d03 Added a missing include in ARM fence operations.
Fixes https://github.com/boostorg/atomic/issues/39.
2020-07-01 23:15:50 +03:00
Andrey Semashev
36827d2b29 Corrected opaque_complement compilation error for AArch32 and AArch64. 2020-06-25 16:04:47 +03:00
Andrey Semashev
da03b93349 Don't check for LSE in common extra ops in AArch64.
The check is already made in the size-specific specializations of the extra
operations, where needed. For 128-bit operations there are no LSE instructions,
which means the common template should still use the more efficient op
operations instead of fetch_op.
2020-06-25 15:52:40 +03:00
Andrey Semashev
46bc34778b Use op instead of fetch_op to implement opaque_op in AArch32 and AArch64.
Operations returning the result of the operation instead of the original
value require fewer registers and are therefore more efficient. The exception
is the AArch64 LSE extension, which adds dedicated atomic instructions that
are presumably more efficient than ll/sc loops.
2020-06-25 15:35:29 +03:00
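
A simplified sketch of the distinction described in the commit above (hypothetical
names, not the actual library templates): an operation that returns the new value
can back the opaque_* operations directly, whereas fetch_op would have to keep the
original value alive in an extra register inside the ll/sc loop.

    #include <cstddef>

    template< typename CoreOps >
    struct extra_ops_sketch
    {
        typedef typename CoreOps::storage_type storage_type;

        // Generic fallback shown here; the AArch32/AArch64 asm backends provide
        // an "add" that returns the new value straight from the ll/sc loop.
        static storage_type add(storage_type volatile& s, storage_type v) noexcept
        {
            return CoreOps::fetch_add(s, v) + v;
        }

        static void opaque_add(storage_type volatile& s, storage_type v) noexcept
        {
            add(s, v); // result discarded; no register needed to hold the old value
        }
    };
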
Andrey Semashev
ac04b0f182 Replaced "+&" output constraints with "+" in the Alpha backend.
The "+&" seems unnecessary since the argument is used for both input and
output, so the "early clobber" flag is meaningless in this case.
2020-06-24 22:00:53 +03:00
Andrey Semashev
53a00d5ca4 Replaced "+&" output constraints with "+" in the ARM backend.
The "+&" seems unnecessary since the argument is used for both input and
output, so the "early clobber" flag is meaningless in this case.
2020-06-24 21:58:22 +03:00
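
A hedged illustration of the constraint change in the two commits above (not the
library code; assumes an ARM target): "+" already marks the operand as
read-modify-write, so adding the early-clobber "&" conveys nothing extra for a
single in-out operand.

    #include <cstdint>

    void increment(std::uint32_t& v)
    {
        // "+r" is sufficient: the register holds the input and receives the output.
        __asm__("add %0, %0, #1" : "+r"(v));
    }
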
Andrey Semashev
9e9d1f4398 Added AArch32 and AArch64 gcc asm-based backends.
This is the second iteration of the backends, which were both tested
on a QEMU VM and did not show any test failures. The essential difference
from the first version is that in AArch64 we now initialize the success flag
in the asm blocks in compare_exchange operations rather than relying on
the compiler to initialize it before passing it into the asm block as an in-out
parameter. Apparently, this sometimes didn't work for some reason, which
made compare_exchange_strong return an incorrect value and broke futex-based
mutexes in the lock pool.

The above change was also applied to AArch32, along with minor corrections
in the asm blocks constraints.
2020-06-24 21:52:52 +03:00
Andrey Semashev
5ec2265754 Removed AArch32 and AArch64 gcc asm-based backends.
The backends will be moved to a separate branch for testing.
2020-06-24 14:41:33 +03:00
Andrey Semashev
fd6c3da53f Disabled AArch32 and AArch64 gcc asm-based backends.
During testing in a VM, occasional fallback_wait_fuzz test failures were
observed when the AArch64 asm-based backend was used to implement futexes in
the lock pool. It is not clear yet what causes the failures, but they
don't appear when __atomic* intrinsics are used. Further investigation is needed.

The AArch32 asm-based backend is practically untested and has much in
common with AArch64, so I'm disabling it as well until at least the problem
with AArch64 is resolved.
2020-06-23 23:45:46 +00:00
Andrey Semashev
4bdf9cd84d Renamed extra ops templates for naming consistency. 2020-06-23 21:53:10 +03:00
Andrey Semashev
f8d2a58af0 Use hash prefix before constants in AArch64 asm blocks. 2020-06-23 20:56:01 +03:00
Andrey Semashev
840e7a17ca Optimized ARM asm blocks.
- Use dedicated registers to return success from compare_exchange methods.
- Pre-initialize success flag outside asm blocks.
- Use "+Q" constraints for memory operands in 64-bit operations. This allows
  to remove "memory" clobber.
- Avoid using lots of conditional instructions in 64-bit compare_exchange
  operations. Simplify success flag derivation.
2020-06-23 20:43:10 +03:00
Andrey Semashev
c59268b513 Added support for big-endian targets in gcc asm-based backend for ARM.
The legacy ARM backend used to assume little endian byte order, which is
significant in 64-bit atomic operations that involve addition or subtraction.
2020-06-23 19:46:31 +03:00
Andrey Semashev
ac32aad65f Added gcc asm-based backend for AArch32.
ARMv8 (AArch32) is significantly different from ARMv7, which warrants
addition of a separate asm-based backend:

- It adds exclusive load/store instructions with acquire/release semantics,
  which obsoletes the use of explicit dmb instructions in most atomic operations.
- It deprecates "it" hints for some instructions, as well as hints covering more
  than one following instruction.
- It does not require elaborate code for switching between Thumb and A32
  modes, as it supports the Thumb-2 extension.
- It always supports instructions for bytes and halfwords.

The old ARM backend is now restricted to ARMv6 and ARMv7.
2020-06-23 19:10:51 +03:00
Andrey Semashev
2b00956d4d Fixed detection of cmpxchg16b on Windows with clang-cl. 2020-06-21 19:31:50 +03:00
Andrey Semashev
06f670d5cd Drop checks for BOOST_NO_ALIGNMENT and BOOST_HAS_INT128 for x86.
We no longer use the alignment attributes (except for alignas, when
available) in order to align the 128-bit storage for atomics. Instead we
rely on type_with_alignment for that. Although it may still use the same
attributes to achieve the required alignment, this is its implementation
detail and may not correspond to BOOST_NO_ALIGNMENT exactly.

The check for BOOST_HAS_INT128 was not really relevant to begin with
because __int128 is not guaranteed to have alignment of 16.

In any case, all current compilers targeting x86 do support alignment of
16, so the checks weren't doing anything.
2020-06-21 19:07:28 +03:00
Andrey Semashev
d42a407bb3 Use intrinsics in gcc sync backend to load and store large objects.
There is no guarantee of atomicity of plain loads and stores of anything
larger than a byte on an arbitrary hardware architecture. However, all
modern architectures seem to guarantee atomicity of loads and stores of
suitably aligned objects at least up to the pointer size, so we use that
as the threshold. For larger objects we have to use intrinsics to
guarantee atomicity.
2020-06-21 19:07:28 +03:00
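
A rough sketch of the technique from the commit above (an assumption about the
approach, not the backend source; uses the gcc __sync intrinsics and, on x86-64,
requires -mcx16): since the __sync family has no plain load, an object wider than
a pointer can be loaded with a compare-and-swap whose expected and new values are
equal, which never changes the stored contents.

    typedef unsigned __int128 storage128_t;

    inline storage128_t load128(storage128_t volatile* p)
    {
        // If *p != 0 the CAS fails and returns the current value; if *p == 0 it
        // stores 0 over 0, which is also a no-op. Either way the load is atomic.
        return __sync_val_compare_and_swap(p, storage128_t(0), storage128_t(0));
    }
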
Andrey Semashev
b02b59fd3a Separated arch-specific core and fence operations to new ops structures.
The old operations template is replaced with core_operations, which falls
back to core_arch_operations, which falls back to core_operations_emulated.

The core_operations layer is intended for more or less architecture-neutral
backends, like the one based on gcc __atomic* intrinsics. It may fall back
to core_arch_operations where it is not supported by the compiler or where
the latter is more optimal. For example, where gcc does not implement 128-bit
atomic operations via __atomic* intrinsics, we support them in the
core_arch_operations backend, which uses inline assembler blocks.

The old emulated_operations template is largely unchanged and was renamed to
core_operations_emulated for naming consistency. All other operation templates
were also renamed for consistency (e.g. generic_wait_operations ->
wait_operations_generic).

Fence operations have been extracted to a separate set of structures:
fence_operations, fence_arch_operations and fence_operations_emulated. These
are similar to the core operations described above. This structuring also
allows falling back from fence_operations to fence_arch_operations when
the latter is more optimal.

The net result of these changes is that 128-bit and 64-bit atomic operations
should now be consistently supported on all architectures that support them.
Previously, only x86 was supported via local hacks for gcc and clang.
2020-06-21 19:07:20 +03:00
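
A hypothetical sketch of the layering described above (the names follow the
commit, but the fallback is shown as plain inheritance purely for illustration;
the real selection is done by the library's dispatch headers):

    #include <cstddef>

    // Always available: implements the operations through the lock pool.
    template< std::size_t Size, bool Signed >
    struct core_operations_emulated
    {
        // ... load/store/fetch_add/compare_exchange guarded by a lock ...
    };

    // Architecture-specific inline-asm backend, used e.g. for 128-bit operations
    // where the compiler offers no (or less optimal) intrinsics.
    template< std::size_t Size, bool Signed >
    struct core_arch_operations : core_operations_emulated< Size, Signed > {};

    // Architecture-neutral backend built on compiler intrinsics (__atomic*, etc.).
    template< std::size_t Size, bool Signed >
    struct core_operations : core_arch_operations< Size, Signed > {};
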
Andrey Semashev
93e6b3a3f6 Fixed gcc asm-based PPC backend when 8 and 16-bit insns are unavailable.
We need to explicitly qualify base_type to call fence functions since the
base class is dependent on a template parameter now.
2020-06-20 00:21:36 +03:00
Andrey Semashev
18d5e470ba Initialize the dummy variable in x86 atomic_thread_fence(seq_cst).
The initialization is not needed for the code, but it is needed to make
tools like valgrind happy. Otherwise, the tools would mark the instructions
as accessing uninitialized data.

Also, changed the dummy variable to a byte. This may allow for a more lax
alignment.
2020-06-19 01:25:30 +03:00
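
A hedged sketch of the fence discussed here and in the related commits further
down (close to, but not claimed to be, the library code): a lock-prefixed
read-modify-write on a dummy byte acts as a full barrier on x86, and initializing
the byte keeps tools such as valgrind from reporting a use of uninitialized data.

    inline void thread_fence_seq_cst_x86()
    {
        unsigned char dummy = 0u; // initialized only to satisfy memory checkers
        __asm__ __volatile__("lock; notb %0" : "+m"(dummy) : : "memory");
    }
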
Andrey Semashev
8b7a92a374 Corrected include guards naming. 2020-06-18 23:54:48 +03:00
Andrey Semashev
651dfd4afb Added gcc asm-based backend for AArch64.
The backend implements core and extra atomic operations using gcc asm blocks.
The implementation supports extensions added in ARMv8.1 and ARMv8.3. It supports
both little and big endian targets.

Currently, the code has not been tested on real hardware. It has been tested
on a QEMU VM.
2020-06-18 12:46:03 +00:00
Andrey Semashev
23aa6d98df Partly revert eb50aea437.
The ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition, allows the use
of ldrex and other load-exclusive instructions without a matching strex in
Section A3.4.5 "Load-Exclusive and Store-Exclusive usage restrictions". And in
Section A3.5.3 "Atomicity in the ARM architecture" it states that ldrexd
atomically loads the 64-bit value from suitably aligned memory. This makes
the strexd added in eb50aea437 unnecessary.

ARM Architecture Reference Manual Armv8, for Armv8-A architecture profile
does not state explicitly that ldrexd can be used without a matching strexd,
but does not prohibit this either in Section E2.10.5 "Load-Exclusive and
Store-Exclusive instruction usage restrictions".
2020-06-14 15:38:42 +03:00
Andrey Semashev
7a4795c161 Added header/footer headers to centrally disable some useless warnings. 2020-06-14 01:34:37 +03:00
Andrey Semashev
b36797be8d Nonessential. 2020-06-12 22:59:46 +03:00
Andrey Semashev
eb50aea437 Added strexd to the 64-bit load asm backend on ARM.
Although we don't need to store anything after the load, we need to issue
strexd to reset the exclusive access mark on the storage address. So we
immediately store the loaded value back.

The technique to use ldrexd+strexd is described in ARM Architecture
Reference Manual ARMv8, Section B2.2.1. Although it is described for ARMv8,
the technique should be valid for previous versions as well.
2020-06-12 22:11:24 +03:00
Andrey Semashev
ec72e215b7 Use more fine-grained capability macro includes and remove unneeded includes. 2020-06-12 19:35:05 +03:00
Andrey Semashev
5bc6d0389d Fixed compilation of asm-based backend for ARM.
Also, improve register allocation slightly for ARM32 and Thumb 2 modes.
2020-06-12 19:13:38 +03:00
Andrey Semashev
c205c7185b Adjusted ARM asm blocks formatting. 2020-06-12 15:27:39 +03:00
Andrey Semashev
629953ffe0 Removed forced inline markup from emulated APIs that don't use memory order.
Forced inline is mostly used to ensure the compiler is able to treat memory
order arguments as constants. It is also useful for constant propagation
on other arguments. This is not very useful for the emulated backend, so
we might as well allow the compiler to not inline the functions.
2020-06-12 03:14:33 +03:00
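
A hedged illustration of the rationale mentioned above (not library code): when a
function like this is force-inlined at a call site with a literal memory order,
constant propagation folds the dispatch to a single intrinsic; the emulated,
lock-based backend gains no comparable benefit, so inlining can be left to the
compiler there.

    #include <atomic>

    inline void store_explicit(unsigned int volatile& s, unsigned int v, std::memory_order order)
    {
        // With the call inlined and `order` a constant, only one branch survives.
        if (order == std::memory_order_seq_cst)
            __atomic_store_n(&s, v, __ATOMIC_SEQ_CST);
        else
            __atomic_store_n(&s, v, __ATOMIC_RELEASE);
    }
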
Andrey Semashev
69c150e178 Added a workaround for broken codegen in MSVC-8 affecting emulated wait.
When the emulated wait function is inlined, the compiler sometimes generates
code that acts as if a wrong value is returned from the wait function. The
compiler simply "forgets" to save the atomic value into an object on the
stack, which makes it later use a bogus value as the "returned" value.

Preventing inlining seems to work around the problem.

Discovered by wait_api notify_one/notify_all test failures for struct_3_bytes.
Oddly enough, the same test for uint32_t did not fail.
2020-06-12 02:51:37 +03:00
Andrey Semashev
65ada4d229 Change to a shorter instruction for seq_cst fences on x86.
Also, use explicitly sized integers for dummy arguments to the fence
instructions.
2020-06-12 00:55:31 +03:00
Andrey Semashev
559eba81af Use dummy atomic instruction instead of mfence for seq_cst fences on x86.
mfence is more expensive on most recent CPUs than a lock-prefixed instruction
on a dummy location, while the latter is sufficient to implement sequential
consistency on x86. Some performance test results are available here:

https://shipilev.net/blog/2014/on-the-fence-with-dependencies/

Also, for seq_cst stores in the gcc_atomic backend, use an xchg instead of
the mov+mfence sequence generated by gcc versions older than 10.1.

The machinery to detect mfence presence is still left intact, in case
we need to use this instruction in the future.

Closes https://github.com/boostorg/atomic/issues/36.
2020-06-11 22:32:01 +03:00
Andrey Semashev
ea70d79920 Fixed capability macros for 80-bit x87 long double types.
Capability macros for 80-bit long double would indicate no lock-free
support even if 128-bit atomic operations were available.
2020-06-11 13:07:46 +03:00
Andrey Semashev
53978fca3d Added a link to the article about Linux ARM atomic functions. 2020-06-11 13:07:45 +03:00
Andrey Semashev
e5e96fbc9a Added atomic_unsigned/signed_lock_free typedefs introduced in C++20.
The typedefs indicate the atomic object type for an unsigned/signed
integer that is lock-free and preferably has native support for waiting
and notifying operations.
2020-06-11 13:07:45 +03:00
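
A brief usage sketch, assuming Boost 1.74 or later where these typedefs and the
waiting/notifying operations are available:

    #include <boost/atomic.hpp>

    int main()
    {
        boost::atomic_unsigned_lock_free counter(0u);
        counter.fetch_add(1u, boost::memory_order_release);
        counter.notify_all(); // wakes threads blocked in counter.wait(old_value)
        return 0;
    }
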
Andrey Semashev
80cfbfd0de Added implementation of inter-process atomics.
The inter-process atomics have ipc_ prefixes: ipc_atomic, ipc_atomic_ref
and ipc_atomic_flag. These types are similar to their unprefixed counterparts
with the following distinctions:

- The operations are provided with an added precondition that is_lock_free()
  returns true.
- All operations, including waiting/notifying operations, are address-free,
  so the types are suitable for inter-process communication.
- The new has_native_wait_notify() operation and always_has_native_wait_notify
  static constant allow testing whether the target platform has native support for
  address-free waiting/notifying operations. If it does not, a generic
  implementation based on a busy wait is used.
- A new set of capability macros was added. The macros are named
  BOOST_ATOMIC_HAS_NATIVE_<T>_IPC_WAIT_NOTIFY and indicate whether address-free
  waiting/notifying operations are supported natively for a given type.

Additionally, to unify the interface and implementation of different components,
the has_native_wait_notify() operation and always_has_native_wait_notify
static constant were added to non-IPC atomic types as well. Added
BOOST_ATOMIC_HAS_NATIVE_<T>_WAIT_NOTIFY capability macros to indicate
native support for inter-thread waiting/notifying operations.

Also, added is_lock_free() and is_always_lock_free to atomic_flag.

This commit adds implementation, docs and tests.
2020-06-11 13:07:16 +03:00
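
A hedged usage sketch of the IPC atomics described above (the shared-memory setup
is illustrative and assumes a POSIX system; the atomic calls follow the interface
listed in the commit):

    #include <boost/atomic.hpp>
    #include <sys/mman.h>
    #include <new>

    int main()
    {
        // Memory that another process could map as well.
        void* mem = ::mmap(nullptr, sizeof(boost::ipc_atomic< unsigned int >),
                           PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED)
            return 1;

        boost::ipc_atomic< unsigned int >* flag = new (mem) boost::ipc_atomic< unsigned int >(0u);

        if (!flag->is_lock_free()) // precondition for using IPC atomics
            return 1;

        flag->store(1u, boost::memory_order_release);
        if (flag->has_native_wait_notify())
            flag->notify_all(); // address-free: may wake waiters in other processes

        return 0;
    }
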
Andrey Semashev
e4f8770665 Reorganized atomic, atomic_ref and atomic_flag implementation.
Moved public class definitions to the public headers and renamed
the internal implementation headers. This will allow the implementation
headers to be reused for inter-process atomics later.
2020-06-09 21:56:03 +03:00
Andrey Semashev
352a954ac1 Corrected BOOST_ATOMIC_FLAG_LOCK_FREE definition. 2020-06-09 21:55:38 +03:00
Andrey Semashev
32c396f4f1 Corrected syntax for integer constants in Alpha asm blocks. 2020-06-07 00:17:38 +03:00
Andrey Semashev
c849b6d877 Use 32-bit storage to implement atomic_flag.
Most platforms that support futexes or similar mechanisms support them
for 32-bit integers, which makes 32-bit storage the preferred choice for
implementing atomic_flag efficiently. Most architectures also support
32-bit atomic operations natively.

Also, reduced code duplication in instantiating operation backends.
2020-06-03 01:48:48 +03:00
Andrey Semashev
b9fadc852a Added Windows backend for waiting/notifying operations.
The backend uses runtime detection of the availability of the Windows API
for futex-like operations (only available since Windows 8).
2020-06-03 01:48:48 +03:00
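
A hedged sketch of the runtime-detection technique (not the actual backend code):
WaitOnAddress and the WakeByAddress* functions only exist on Windows 8 and later,
so they are resolved dynamically instead of being linked against; when the lookup
fails, a generic fallback is used.

    #include <windows.h>

    typedef BOOL (WINAPI* wait_on_address_t)(volatile VOID*, PVOID, SIZE_T, DWORD);

    inline wait_on_address_t find_wait_on_address()
    {
        // On desktop Windows the export lives in kernelbase.dll
        // (via the api-ms-win-core-synch-l1-2-0 API set).
        HMODULE mod = ::GetModuleHandleW(L"kernelbase.dll");
        return mod ? (wait_on_address_t)::GetProcAddress(mod, "WaitOnAddress") : (wait_on_address_t)0;
    }
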
Andrey Semashev
e72ccb02e4 Added support for NetBSD futex variant. 2020-06-03 01:48:48 +03:00
Andrey Semashev
214169b86e Added DragonFly BSD umtx backend for waiting/notifying operations. 2020-06-03 01:48:48 +03:00
Andrey Semashev
b5988af279 Added FreeBSD _umtx_op backend for waiting/notifying operations. 2020-06-03 01:48:48 +03:00
Andrey Semashev
bf182818f4 Added futex-based implementation for waiting/notifying operations. 2020-06-03 01:48:37 +03:00
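
A minimal Linux futex sketch for context (illustrative, not the Boost.Atomic
backend): FUTEX_WAIT blocks while the word at the given address still holds the
expected value, and FUTEX_WAKE wakes a number of waiters on that address.

    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <climits>

    inline long futex_wait(unsigned int* addr, unsigned int expected)
    {
        return ::syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected, nullptr, nullptr, 0u);
    }

    inline long futex_notify_all(unsigned int* addr)
    {
        return ::syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, INT_MAX, nullptr, nullptr, 0u);
    }
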