atomic

mirror of https://github.com/boostorg/atomic.git synced 2026-02-03 08:42:08 +00:00

Author	SHA1	Message	Date
Andrey Semashev	da03b93349	Don't check for LSE in common extra ops in AArch64. The check is already made in the size-specific specializations of the extra operations, where needed. For 128-bit operations there are no LSE instructions, which means the common template should still use the more efficient op operations instead of fetch_op.	2020-06-25 15:52:40 +03:00
Andrey Semashev	46bc34778b	Use op instead of fetch_op to implement opaque_op in AArch32 and AArch64. The operations returning the result of the operation instead of the original value require less registers, so are more efficient. The exception is AArch64 LSE extension, which adds dedicated atomic instructions which are presumably more efficient than ll/sc loops.	2020-06-25 15:35:29 +03:00
Andrey Semashev	ac04b0f182	Replaced "+&" output constraints with "+" in the Alpha backend. The "+&" seems unnecessary since the argument is used for both input and output, so "early clobber" flag is meaningless in this case.	2020-06-24 22:00:53 +03:00
Andrey Semashev	53a00d5ca4	Replaced "+&" output constraints with "+" in the ARM backend. The "+&" seems unnecessary since the argument is used for both input and output, so "early clobber" flag is meaningless in this case.	2020-06-24 21:58:22 +03:00
Andrey Semashev	9e9d1f4398	Added AArch32 and AArch64 gcc asm-based backends. This is the second iteration of the backends, which were both tested on a QEMU VM and did not show any test failures. The essential difference from the first version is that in AArch64 we now initialize the success flag in the asm blocks in compare_exchange operations rather than relying on compiler initializing it before passing into the asm block as an in-out parameter. Apparently, this sometimes didn't work for some reason, which made compare_exchange_strong return incorrect value, which broke futex-based mutexes in the lock pool. The above change was also applied to AArch32, along with minor corrections in the asm blocks constraints.	2020-06-24 21:52:52 +03:00
Andrey Semashev	fd2326cf4d	Added a check in notify_one tests if the first thread wakes up too late. In that case the first thread may receive the value3 instead of value2.	2020-06-24 14:53:27 +03:00
Andrey Semashev	5ec2265754	Removed AArch32 and AArch64 gcc asm-based backends. The backends will be moved to a separate branch for testing.	2020-06-24 14:41:33 +03:00
Andrey Semashev	fd6c3da53f	Disabled AArch32 and AArch64 gcc asm-based backends. During testing in a VM occasional fallback_wait_fuzz test failures were observed when AArch64 asm-based backend was used to implement futexes in the lock pool. It is not clear yet what causes the failures, but they don't appear when __atomic* intrinsics are used. Further investigation needed. AArch32 asm-based backend is practically untested and has very much in common with AArch64, so I'm disabling it as well until at least the problem with AArch64 is resolved.	2020-06-23 23:45:46 +00:00
Andrey Semashev	705d9b8800	Slight optimization of wait state management.	2020-06-23 22:48:07 +00:00
Andrey Semashev	a247342a13	Added support for newer gcc versions and ARMv8 AArch32 in lockfree test.	2020-06-23 22:55:14 +03:00
Andrey Semashev	e5bb7eca93	Corrected check for cmpxchg16b availability in lockfree test.	2020-06-23 21:57:18 +03:00
Andrey Semashev	4bdf9cd84d	Renamed extra ops templates for naming consistency.	2020-06-23 21:53:10 +03:00
Andrey Semashev	f8d2a58af0	Use hash prefix before constants in AArch64 asm blocks.	2020-06-23 20:56:01 +03:00
Andrey Semashev	840e7a17ca	Optimized ARM asm blocks. - Use dedicated registers to return success from compare_exchange methods. - Pre-initialize success flag outside asm blocks. - Use "+Q" constraints for memory operands in 64-bit operations. This allows to remove "memory" clobber. - Avoid using lots of conditional instructions in 64-bit compare_exchange operations. Simplify success flag derivation.	2020-06-23 20:43:10 +03:00
Andrey Semashev	c59268b513	Added support for big-endian targets in gcc asm-based backend for ARM. The legacy ARM backend used to assume little endian byte order, which is significant in 64-bit atomic operations that involve addition or subtraction.	2020-06-23 19:46:31 +03:00
Andrey Semashev	ac32aad65f	Added gcc asm-based backend for AArch32. ARMv8 (AArch32) is significantly different from ARMv7, which warrants addition of a separate asm-based backend: - It adds exclusive load/store instructions with acquire/release semantics, which obsoletes use of explicit dmb instructions in most atomic operations. - It deprecates "it" hints for some instructions and hints for more than one following instruction. - It does not require elaborate code for switching between Thumb and A32 modes as it supports Thumb 2 extension. - It always supports instructions for bytes and halfwords. The old ARM backend is now restricted to ARMv6 and ARMv7.	2020-06-23 19:10:51 +03:00
Andrey Semashev	2b00956d4d	Fixed detection of cmpxchg16b on Windows with clang-cl.	2020-06-21 19:31:50 +03:00
Andrey Semashev	d3a64b0a5b	Use Intel Core 2 as the target instruction set in x86 CI. This enables cmpxchg16b instruction, which should be available on all modern CPUs.	2020-06-21 19:08:42 +03:00
Andrey Semashev	06f670d5cd	Drop checks for BOOST_NO_ALIGNMENT and BOOST_HAS_INT128 for x86. We no longer use the alignment attributes (except for alignas, when available) in order to align the 128-bit storage for atomics. Instead we rely on type_with_alignment for that. Although it may still use the same attributes to achieve the required alignment, this is its implementation detail and may not correspond to BOOST_NO_ALIGNMENT exactly. The check for BOOST_HAS_INT128 was not really relevant to begin with because __int128 is not guaranteed to have alignment of 16. In any case, all current compilers targeting x86 do support alignment of 16, so the checks weren't doing anything.	2020-06-21 19:07:28 +03:00
Andrey Semashev	06dcdf26c6	Added a configure check to test if synchronization.lib exists. We need to explicitly link with synchronization.lib when the WaitOnAddress API is enabled at compile time for ARM targets. Since this library is only available on newer Windows SDKs, we have to perform a configure check for whether it is available.	2020-06-21 19:07:28 +03:00
Andrey Semashev	22de842159	Fixed missing braces in lock_state initializer when alignas is not available.	2020-06-21 19:07:28 +03:00
Andrey Semashev	d42a407bb3	Use intrinsics in gcc sync backend to load and store large objects. There is no guarantee of atomicity of plain loads and stores of anything larger than a byte on an arbitrary hardware architecture. However, all modern architectures seem to guarantee atomicity of loads and stores of suitably aligned objects at least up to a pointer size, so we use that as the threshold. For larger objects we have to use intrinsics to guarantee atomicity.	2020-06-21 19:07:28 +03:00
Andrey Semashev	b02b59fd3a	Separated arch-specific core and fence operations to new ops structures. The old operations template is replaced with core_operations, which falls back to core_arch_operations, which falls back to core_operations_emulated. The core_operations layer is intended for more or less architecture-neutral backends, like the one based on gcc __atomic* intrinsics. It may fall back to core_arch_operations where it is not supported by the compiler or where the latter is more optimal. For example, where gcc does not implement 128-bit atomic operations via __atomic* intrinsics, we support them in the core_arch_operations backend, which uses inline assembler blocks. The old emulated_operations template is largely unchanged and was renamed to core_operations_emulated for naming consistency. All other operation templates were also renamed for consistency (e.g. generic_wait_operations -> wait_operations_generic). Fence operations have been extracted to a separate set of structures: fence_operations, fence_arch_operations and fence_operations_emulated. These are similar to the core operations described above. This structuring also allows to fall back from fence_operations to fence_arch_operations when the latter is more optimal. The net result of these changes is that 128-bit and 64-bit atomic operations should now be consistently supported on all architectures that support them. Previously, only x86 was supported via local hacks for gcc and clang.	2020-06-21 19:07:20 +03:00
Andrey Semashev	93e6b3a3f6	Fixed gcc asm-based PPC backend when 8 and 16-bit insns are unavailable. We need to explicitly qualify base_type to call fence functions since the base class is dependent on a template parameter now.	2020-06-20 00:21:36 +03:00
Andrey Semashev	18d5e470ba	Initialize the dummy variable in x86 atomic_thread_fence(seq_cst). The initialization is not needed for the code, but it is needed to make tools like valgrind happy. Otherwise, the tools would mark the instructions as accessing uninitialized data. Also, changed the dummy variable to a byte. This may allow for a more lax alignment.	2020-06-19 01:25:30 +03:00
Andrey Semashev	8b7a92a374	Corrected include guards naming.	2020-06-18 23:54:48 +03:00
Andrey Semashev	651dfd4afb	Added gcc asm-based backend for AArch64. The backend implements core and extra atomic operations using gcc asm blocks. The implementation supports extensions added in ARMv8.1 and ARMv8.3. It supports both little and big endian targets. Currently, the code has not been tested on real hardware. It has been tested on a QEMU VM.	2020-06-18 12:46:03 +00:00
Andrey Semashev	0cf7964f78	Another workaround for IPC notify_one failures on Windows. The previous change to increase the delay didn't help, so we're instead changing the expectation - the first woken thread is allowed to receive value3 on wake up.	2020-06-14 19:12:33 +03:00
Andrey Semashev	3de4c6c865	Increase delay between notifications in IPC notify_one test. Occasionally, IPC notify_one test fails on Windows because the first of the woken threads receives value3 from wait(). This is possible if the thread lingers in wait() for some reason. Increase the delay before the second notification slightly to reduce the likelihood of this happening.	2020-06-14 18:10:30 +03:00
Andrey Semashev	23aa6d98df	Partly revert `eb50aea437`. ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition allows to use ldrex and other load-exclusive instructions without a matching strex in Section A3.4.5 "Load-Exclusive and Store-Exclusive usage restrictions". And in Section A3.5.3 "Atomicity in the ARM architecture" it states that ldrexd atomically loads the 64-bit value from suitably aligned memory. This makes the strexd added in `eb50aea437` unnecessary. ARM Architecture Reference Manual Armv8, for Armv8-A architecture profile does not state explicitly that ldrexd can be used without a matching strexd, but does not prohibit this either in Section E2.10.5 "Load-Exclusive and Store-Exclusive instruction usage restrictions".	2020-06-14 15:38:42 +03:00
Andrey Semashev	7a4795c161	Added header/footer headers to centrally disable some useless warnings.	2020-06-14 01:34:37 +03:00
Andrey Semashev	b36797be8d	Nonessential.	2020-06-12 22:59:46 +03:00
Andrey Semashev	eb50aea437	Added strexd to the 64-bit load asm backend on ARM. Although we don't need to store anything after the load, we need to issue strexd to reset the exclusive access mark on the storage address. So we immediately store the loaded value back. The technique to use ldrexd+strexd is described in ARM Architecture Reference Manual ARMv8, Section B2.2.1. Although it is described for ARMv8, the technique should be valid for previous versions as well.	2020-06-12 22:11:24 +03:00
Andrey Semashev	ec72e215b7	Use more fine grained capability macros includes and remove unneeded includes.	2020-06-12 19:35:05 +03:00
Andrey Semashev	5bc6d0389d	Fixed compilation of asm-based backend for ARM. Also, improve register allocation slightly for ARM32 and Thumb 2 modes.	2020-06-12 19:13:38 +03:00
Andrey Semashev	c205c7185b	Adjusted ARM asm blocks formatting.	2020-06-12 15:27:39 +03:00
Andrey Semashev	3929919495	Implement a special test_clock for Windows. The implementation uses GetTickCount/GetTickCount64 internally, which is a steady and sufficiently low precision time source. We need the clock to have relatively low precision so that wait tests don't fail spuriously because the blocked threads wake up too soon, according to more precise clocks. boost::chrono::system_clock currently has an acceptably low precision, but it is not a steady clock.	2020-06-12 13:32:32 +03:00
Andrey Semashev	72c87ca51b	Use a lower resolution clock on Windows to reduce spurious test failures.	2020-06-12 03:24:29 +03:00
Andrey Semashev	629953ffe0	Removed forced inline markup from emulated APIs that don't use memory order. Forced inline is mostly used to ensure the compiler is able to treat memory order arguments as constants. It is also useful for constant propagation on other arguments. This is not very useful for the emulated backend, so we might as well allow the compiler to not inline the functions.	2020-06-12 03:14:33 +03:00
Andrey Semashev	69c150e178	Added a workaround for broken codegen in MSVC-8 affecting emulated wait. When the emulated wait function is inlined, the compiler sometimes generates code that acts as if a wrong value is returned from the wait function. The compiler simply "forgets" to save the atomic value into an object on the stack, which makes it later use a bogus value as the "returned" value. Preventing inlining seems to work around the problem. Discovered by wait_api notify_one/notify_all test failures for struct_3_bytes. Oddly enough, the same test for uint32_t did not fail.	2020-06-12 02:51:37 +03:00
Andrey Semashev	1b8ec1700b	Reworked IPC atomic tests to check for the is_always_lockfree property. Checking for the capability macros is not good enough because ipc_atomic_ref can be not lock-free even when the macro (and ipc_atomic) indicates lock-free. We now check the is_always_lockfree property to decide whether to run or skip tests for a given IPC atomic type. Also, made struct_3_bytes output more informative.	2020-06-12 01:58:12 +03:00
Andrey Semashev	58b618d299	Added a basic compile test for fences.	2020-06-12 00:57:59 +03:00
Andrey Semashev	65ada4d229	Change to a shorter instruction for seq_cst fences on x86. Also, use explicitly sized integers for dummy arguments to the fence instructions.	2020-06-12 00:55:31 +03:00
Andrey Semashev	559eba81af	Use dummy atomic instruction instead of mfence for seq_cst fences on x86. mfence is more expensive on most recent CPUs than a lock-prefixed instruction on a dummy location, while the latter is sufficient to implement sequential consistency on x86. Some performance test results are available here: https://shipilev.net/blog/2014/on-the-fence-with-dependencies/ Also, for seq_cst stores in gcc_atomic backend, use an xchg instead of mov+mfence, which are generated by gcc versions older than 10.1. The machinery to detect mfence presence is still left intact just in case if we need to use this instruction in the future. Closes https://github.com/boostorg/atomic/issues/36.	2020-06-11 22:32:01 +03:00
Andrey Semashev	36561406c2	Use api-ms-win-core-synch-l1-2-0.dll to query WaitOnAddress and friends. It was suggested in a comment[1] that the correct dll to use to resolve WaitOnAddress, WakeByAddressSingle and WakeByAddressAll is api-ms-win-core-synch-l1-2-0.dll instead of KernelBase.dll. On some systems KernelBase.dll may not be available and the WaitOnAddress API may be implemented in a different library. Tests have shown that GetModuleHandleW(L"api-ms-win-core-synch-l1-2-0.dll") returns a handle for KernelBase.dll anyway. Also, there exists a presumably older version of this library: api-ms-win-core-synch-l1-1-0.dll. The older version is also "loaded" into the process and also resolves to KernelBase.dll, which suggests that hopefully api-ms-win-core-synch-l1-2-0.dll will stay available and working in the foreseeable future. [1]: https://github.com/microsoft/STL/pull/593#issuecomment-641019129	2020-06-11 13:07:46 +03:00
Andrey Semashev	ea70d79920	Fixed capability macros for 80-bit x87 long double types. Capability macros for 80-bit long double would indicate no lock-free support even if 128-bit atomic operations were available.	2020-06-11 13:07:46 +03:00
Andrey Semashev	53978fca3d	Added a link to the article about Linux ARM atomic functions.	2020-06-11 13:07:45 +03:00
Andrey Semashev	e5e96fbc9a	Added atomic_unsigned/signed_lock_free typedefs introduced in C++20. The typedefs indicate the atomic object type for an unsigned/signed integer that is lock-free and preferably has native support for waiting and notifying operations.	2020-06-11 13:07:45 +03:00
Andrey Semashev	80cfbfd0de	Added implementation of inter-process atomics. The inter-process atomics have ipc_ prefixes: ipc_atomic, ipc_atomic_ref and ipc_atomic_flag. These types are similar to their unprefixed counterparts with the following distinctions: - The operations are provided with an added precondition that is_lock_free() returns true. - All operations, including waiting/notifying operations, are address-free, so the types are suitable for inter-process communication. - The new has_native_wait_notify() operation and always_has_native_wait_notify static constant allow to test if the target platform has native support for address-free waiting/notifying operations. If it does not, a generic implementation is used based on a busy wait. - The new set of capability macros added. The macros are named BOOST_ATOMIC_HAS_NATIVE_<T>_IPC_WAIT_NOTIFY and indicate whether address-free waiting/notifying operations are supported natively for a given type. Additionally, to unify interface and implementation of different components, the has_native_wait_notify() operation and always_has_native_wait_notify static constant were added to non-IPC atomic types as well. Added BOOST_ATOMIC_HAS_NATIVE_<T>_WAIT_NOTIFY capability macros to indicate native support for inter-thread waiting/notifying operations. Also, added is_lock_free() and is_always_lock_free to atomic_flag. This commit adds implementation, docs and tests.	2020-06-11 13:07:16 +03:00
Andrey Semashev	e4f8770665	Reorganized atomic, atomic_ref and atomic_flag implementation. Moved public classes definitions to the public headers and renamed the internal implementation headers. This will allow to reuse the implementation headers for inter-process atomics later.	2020-06-09 21:56:03 +03:00
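
Several of the commit messages above contrast three flavors of atomic read-modify-write operation: fetch_op (returns the value from before the operation), op (returns the result of the operation) and opaque_op (returns nothing). The sketch below only illustrates that difference in return values. It is not the library's implementation: it uses std::atomic from the C++ standard library as a stand-in, and the helper names fetch_add_style, add_style and opaque_add_style are hypothetical, chosen for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// "fetch_op" flavor: returns the value the object held BEFORE the addition.
inline std::uint32_t fetch_add_style(std::atomic<std::uint32_t>& a, std::uint32_t v)
{
    return a.fetch_add(v, std::memory_order_relaxed);
}

// "op" flavor: returns the value the object holds AFTER the addition.
// Here it is derived from fetch_add purely for illustration; an asm-based
// backend can produce this result directly from the ll/sc loop.
inline std::uint32_t add_style(std::atomic<std::uint32_t>& a, std::uint32_t v)
{
    return a.fetch_add(v, std::memory_order_relaxed) + v;
}

// "opaque_op" flavor: the caller does not need any result at all.
inline void opaque_add_style(std::atomic<std::uint32_t>& a, std::uint32_t v)
{
    (void)a.fetch_add(v, std::memory_order_relaxed);
}
```

Because an opaque operation discards its result, a backend is free to build it on whichever of the other two flavors is cheaper for the target, which is the choice the AArch32 and AArch64 commits describe.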


491 Commits