mirror of https://github.com/boostorg/uuid.git synced 2026-01-19 04:42:16 +00:00

737 Commits

Author SHA1 Message Date
Peter Dimov
2ce9519afc Avoid -Wsign-conversion warning in detail/sha1.hpp 2026-01-05 16:54:54 +02:00
Peter Dimov
fd167bba0d Avoid -Wsign-conversion warning in time_generator_v1.hpp 2026-01-05 16:54:54 +02:00
Peter Dimov
a835ddff90 Avoid -Wsign-conversion warning in time_generator_v7.hpp 2026-01-05 16:54:54 +02:00
Peter Dimov
91ffab27d2 Enable stricter warnings (matching Unordered) in test/Jamfile.v2 2026-01-05 16:54:54 +02:00
Peter Dimov
db92124922 Reorder includes 2026-01-05 16:02:11 +02:00
Peter Dimov
0038762216 Merge pull request #186 from Lastique/feature/from_chars_simd
Add SIMD implementation of `from_chars`
2026-01-05 15:53:07 +02:00
Andrey Semashev
0e23b235fc Use load/store helpers from endian.hpp in to/from_chars_x86.hpp.
The load/store helpers use memcpy internally, which is a more correct
way to load and store integers from/to unaligned memory and with
potential type punning. In particular, it should silence UBSAN errors
about unaligned memory accesses in SIMD algorithms.
2026-01-05 14:13:10 +03:00
Andrey Semashev
3698f8df2c Added a missing include. 2026-01-05 14:13:10 +03:00
Andrey Semashev
9dde4978fd Use memcpy/memset/memcmp functions from cstring.hpp in endian.hpp.
This lets integer reads/writes benefit from compiler intrinsics,
when possible.
2026-01-05 14:13:10 +03:00
Andrey Semashev
d358c39a67 Use __builtin_memcpy/memcmp in cstring.hpp.
The builtins are sometimes better optimized than calls to the libc
functions. They also don't need the <cstring> include.

Added unqualified memcpy function that simply calls either the builtin or
the libc function. This function is intended to be a drop-in replacement
for the libc memcpy calls, where constexpr friendliness is not important.
It is still marked as constexpr to allow mentioning it in other constexpr
functions. To avoid early checks of whether its body can be evaluated in the
context of a constant expression, it is defined as a dummy template.

Marked all functions as noexcept.
2026-01-05 14:13:10 +03:00
Andrey Semashev
02574368fc Added GitHub Actions job on Rocketlake ISA. 2026-01-05 14:13:10 +03:00
Andrey Semashev
831a9e6eab Added more tests for from_chars verifying unexpected end of input.
The added tests check unexpected end of input on even and odd character
positions, since these are handled separately in SIMD.
2026-01-05 14:13:10 +03:00
Andrey Semashev
7e50b1aaa7 Added running from_chars tests with SIMD disabled.
Also added to/from_chars tests to CMakeLists.txt.
2026-01-05 14:13:10 +03:00
Andrey Semashev
b7535347ec Allow users to enable 512-bit vectors in to_chars_x86.hpp.
Following from_chars_x86.hpp, allow users to explicitly enable 512-bit
vectors in to_chars by defining BOOST_UUID_TO_FROM_CHARS_X86_USE_ZMM.
This is primarily to allow for experimenting and tuning performance on
newer CPU microarchitectures.
2026-01-05 14:13:10 +03:00
Andrey Semashev
3920cc584c Added x86 SIMD implementation of from_chars.
This adds SSE4.1, AVX2, AVX-512v1 and AVX10.1 implementations of the
from_chars algorithm. The generic implementation is moved to its own
header, and constexpr is relaxed so that it is only enabled when
is_constant_evaluated is supported.

The performance effect on Intel Golden Cove (Core i7-12700K), gcc 13.3,
in millions of successful from_chars() calls per second:

Char     | Generic | SSE4.1          | AVX2            | AVX512v1        | AVX10.1
=========+=========+=================+=================+=================+================
char     |  38.571 | 560.645 (14.5x) | 501.505 (13.0x) | 540.038 (14.0x) | 480.778 (12.5x)
char16_t |  37.998 | 479.308 (12.6x) | 425.728 (11.2x) | 416.379 (11.0x) | 392.326 (10.3x)
char32_t |  50.327 | 391.313 (7.78x) | 359.312 (7.14x) | 349.849 (6.95x) | 333.979 (6.64x)

The AVX2 version is slightly slower than SSE4.1 because on Intel
microarchitectures the VEX-coded vpblendvb instruction is slower than
the legacy SSE4.1 pblendvb. The code contains workarounds for this, which
have slight performance overhead compared to SSE4.1 version, but are still
faster than using vpblendvb. Alternatively, the performance could be
improved by using asm blocks to force using pblendvb in AVX2 code, but this
may potentially cause SSE/AVX transition penalties if the target vector
register happens to have "dirty" upper bits. There's no way to ensure this
doesn't happen, so this is not implemented. AVX512v1 claws back some
performance and uses fewer instructions (i.e. smaller code size).

The AVX10.1 version is slower as it uses vpermi2b instruction from AVX512_VBMI,
which is relatively slow on Intel. It allows both the number of instructions
and the number of vector constants to be reduced even further. The
instruction is faster on AMD Zen 4 and should offer better performance compared
to AVX512v1 code path, although it wasn't tested. This code path is disabled
by default, unless BOOST_UUID_FROM_CHARS_X86_USE_VPERMI2B is defined, which
can be used to test and tune performance on AMD and newer Intel CPU
microarchitectures. Thus, by default, AVX10.1 performance should be roughly
equivalent to AVX512v1, barring compiler (mis)optimizations.

The unsuccessful parsing case depends on where the error happens, as the
generic version may terminate sooner if the error is detected at the
beginning of the input string, while the SIMD version performs roughly
the same amount of work but faster. Here are some examples for 8-bit
character types (for larger types the numbers are more or less comparable):

Error              | Generic  | SSE4.1          | AVX2            | AVX512v1        | AVX10.1
===================+==========+=================+=================+=================+================
EOI at 35 chars    |   43.629 | 356.562 (8.17x) | 326.311 (7.48x) | 322.377 (7.39x) | 308.155 (7.06x)
EOI at 1 char      | 2645.783 | 444.769 (0.17x) | 400.275 (0.15x) | 404.826 (0.15x) | 403.730 (0.15x)
Missing dash at 23 |   73.878 | 514.303 (6.96x) | 474.694 (6.43x) | 507.949 (6.88x) | 474.077 (6.42x)
Missing dash at 8  |  223.921 | 516.641 (2.31x) | 472.737 (2.11x) | 506.242 (2.26x) | 473.718 (2.12x)
Illegal char at 35 |   47.373 | 368.002 (7.77x) | 333.233 (7.03x) | 318.242 (6.72x) | 301.659 (6.37x)
Illegal char at 0  | 1729.087 | 421.511 (0.24x) | 385.217 (0.22x) | 374.047 (0.22x) | 351.944 (0.20x)

The above table is collected with BOOST_UUID_FROM_CHARS_X86_USE_VPERMI2B
defined.

In general, only the very early errors tend to perform worse in the SIMD
version and the majority of cases are still faster.

Besides BOOST_UUID_FROM_CHARS_X86_USE_VPERMI2B, the implementation also has
BOOST_UUID_TO_FROM_CHARS_X86_USE_ZMM control macro, which, if defined, enables
usage of 512-bit registers for converting from 32-bit character types to 8-bit
integers. This code path is also slower than the 256-bit path on Golden Cove,
and therefore is disabled. The macro is provided primarily to allow for tuning
and experimentation with newer CPU microarchitectures, where the 512-bit path
may become beneficial. All of the above performance numbers were produced
without it.
2026-01-05 14:13:10 +03:00
Andrey Semashev
d0c74979a9 Separated Skylake-X level of AVX-512 to a new config macro.
The new BOOST_UUID_USE_AVX512_V1 config macro indicates presence of
AVX-512F, VL, CD, BW and DQ extensions, which are supported by Intel
Skylake-X and similar processors. BOOST_UUID_USE_AVX10_1 is still
retained and indicates support for full AVX10.1 set. For now, it only
adds support for VBMI, but this list may grow in the future as new
extensions are being utilized.
2026-01-05 14:13:10 +03:00
Andrey Semashev
f7718fd7cc Removed load_unaligned_si128 helper function.
This helper was used to simplify support for older CPUs, to select
between _mm_loadu_si128 and _mm_lddqu_si128 intrinsics. That code
has long been removed, and we now always use _mm_loadu_si128 to load
data. Use the intrinsic directly everywhere.
2026-01-05 14:13:10 +03:00
Andrey Semashev
1f7875c97e Added a simd_vector utility to simplify SIMD constants definition.
The simd_vector template is a wrapper around an array of elements that
can automatically be read as a SIMD vector. This reduces the number of
reinterpret_casts in SIMD code that uses constants.
2026-01-05 14:13:10 +03:00
Andrey Semashev
d3b72c2b71 Extracted from_chars_result to a separate header. 2026-01-05 14:13:10 +03:00
Andrey Semashev
79ecd9f563 Suppress 'conditional expression is constant' warning on MSVC. 2026-01-05 14:13:10 +03:00
Peter Dimov
9684559de7 Add Windows jobs to ci.yml using different /arch: values 2026-01-04 20:04:54 +02:00
Peter Dimov
83ab39d277 Update documentation 2026-01-04 04:10:34 +02:00
Peter Dimov
d5cf5c4656 Simplify operator>> 2026-01-04 04:06:55 +02:00
Peter Dimov
b24e19d64e Merge pull request #187 from Lastique/feature/update_iostream_ops
Update iostream operators
2026-01-04 04:02:54 +02:00
Andrey Semashev
211d84bdb1 Use stream character type when calling to_chars in operator<<.
This avoids potential character code conversion in ostream and instead
produces native character type directly in to_chars, which is likely
much faster.
2026-01-03 03:50:38 +03:00
Andrey Semashev
45910f2ace Use from_chars in operator>>.
This removes code duplication with from_chars and allows for reusing
a faster implementation of from_chars in operator>>.

Also, align the input character buffer for more efficient memory
accesses.
2026-01-03 03:49:48 +03:00
Peter Dimov
326e5db863 Merge pull request #185 from Lastique/feature/from_chars_result_op_bool
Add `from_chars_result::operator bool()`
2025-12-29 13:08:27 +02:00
Peter Dimov
6ce513ef20 Merge pull request #184 from Lastique/feature/to_chars_simd
Add SIMD implementation of `to_chars`
2025-12-28 12:09:20 +02:00
Andrey Semashev
454de03dbd Updated docs with the new SIMD macros, added a release note for SIMD to_chars.
Also clarified the meaning of BOOST_UUID_USE_AVX10_1 in the docs, as the previous
wording could be taken to mean that it indicates support for a subset of AVX-512
that is supported in Skylake-X.
2025-12-27 23:53:21 +03:00
Andrey Semashev
ef9c903055 Reorder macos parameters in GitHub Actions CI for better readability in web UI. 2025-12-27 23:53:21 +03:00
Andrey Semashev
e6fe1c45d9 Added GitHub Actions jobs for AVX2-enabled target. 2025-12-27 23:53:21 +03:00
Andrey Semashev
2508c5434e Added running IO and to_chars tests with SIMD disabled. 2025-12-27 23:53:21 +03:00
Andrey Semashev
84afcf6372 Align buffers for to_chars for better performance of SIMD implementation. 2025-12-27 23:53:21 +03:00
Andrey Semashev
839c431152 Added x86 SIMD implementation of to_chars.
Moved the generic to_chars implementation to a separate header and made
to_chars.hpp select the implementation based on the enabled SIMD ISA
extensions. Added an x86 implementation leveraging SSSE3 and later
vector extensions. Added detection of the said extensions to config.hpp.

The performance effect on Intel Golden Cove (Core i7-12700K), gcc 13.3,
in millions of to_chars() calls per second with a 16-byte aligned output buffer:

Char     | Generic | SSE4.1           | AVX2             | AVX-512
=========+=========+==================+==================+=================
char     | 203.190 | 1059.322 (5.21x) | 1053.352 (5.18x) | 1058.089 (5.21x)
char16_t | 184.003 |  848.356 (4.61x) | 1009.489 (5.49x) | 1011.122 (5.50x)
char32_t | 202.425 |  484.801 (2.39x) |  676.338 (3.34x) |  462.770 (2.29x)

The core of the SIMD implementation uses 128-bit vectors; larger vectors
are only used to convert to the target character types. This means that for
1-byte character types all vector implementations are basically the same
(barring the extra ISA flexibility added by AVX) and for 2-byte character
types AVX2 and AVX-512 are basically the same.

For 4-byte character types, AVX-512 showed worse performance than SSE4.1 and
AVX2 on the test system. It isn't clear why that is, but it is possible that
the CPU throttles 512-bit instructions so much that the performance drops
below a 256-bit equivalent. Perhaps there are just not enough 512-bit
instructions for the CPU to power up the full 512-bit pipeline. Therefore,
the AVX-512 code path for 4-byte character types is currently disabled and
the AVX2 path is used instead (which makes AVX2 and AVX-512 versions basically
equivalent). The AVX-512 path can be enabled again if new CPU microarchitectures
appear that will benefit from it.

Higher alignment values of the output buffer were also tested, but they did not
meaningfully improve performance.
2025-12-27 23:52:15 +03:00
Peter Dimov
f797b2617f Improve from_chars, taking advantage of the fact that 0-9, A-F and a-f are consecutive in both ASCII and EBCDIC 2025-12-26 22:18:59 +02:00
Peter Dimov
b945f3ee23 Update documentation 2025-12-26 21:33:11 +02:00
Andrey Semashev
31bc20e30e Added from_chars_result::operator bool(). 2025-12-26 21:58:49 +03:00
Peter Dimov
c8f97785a4 Prefer SIMD/runtime performance over constexpr-ness when __builtin_is_constant_evaluated is not available 2025-12-26 19:27:20 +02:00
Peter Dimov
c963533f73 Make to_chars constexpr 2025-12-25 19:20:58 +02:00
Peter Dimov
e78661b1e2 Suppress msvc-14.1 warnings in test_hash_value_cx 2025-12-25 13:48:11 +02:00
Peter Dimov
963bb373d0 Attempt to fix GCC 5 2025-12-25 13:34:49 +02:00
Peter Dimov
09dd0dd608 Make hash_value constexpr 2025-12-25 12:41:38 +02:00
Peter Dimov
bd765b558c Make uuid::is_nil, swap, and relational operators constexpr 2025-12-25 06:44:07 +02:00
Peter Dimov
5a5797d465 Add more constexpr to class uuid 2025-12-24 17:45:11 +02:00
Peter Dimov
bfee791f47 Update documentation 2025-12-23 19:48:35 +02:00
Peter Dimov
dd1145756e Add VS2026 to .drone.jsonnet 2025-12-23 12:39:04 +02:00
Peter Dimov
da0ce8406a Disable test_string_generator_cx2 under GCC 5 2025-12-23 11:37:33 +02:00
Peter Dimov
7b063dde86 Fix return type of from_chars_is_* functions 2025-12-23 11:29:40 +02:00
Peter Dimov
83d3da399c Add test_string_generator_cx2.cpp 2025-12-23 04:21:49 +02:00
Peter Dimov
a851cf33ca Add test_string_generator_2.cpp 2025-12-23 04:15:37 +02:00