Mirror of https://github.com/boostorg/uuid.git, synced 2026-01-19 04:42:16 +00:00.
3920cc584c734a6a73dafe4e77b92ceeeb3471ef
This adds SSE4.1, AVX2, AVX512v1 and AVX10.1 implementations of the from_chars algorithm. The generic implementation is moved to its own header, and constexpr is relaxed so that it is only enabled when is_constant_evaluated is supported.

The performance effect on Intel Golden Cove (Core i7-12700K) with gcc 13.3, in millions of successful from_chars() calls per second (speedup over the generic version in parentheses):

Char     | Generic | SSE4.1          | AVX2            | AVX512v1        | AVX10.1
=========+=========+=================+=================+=================+================
char     | 38.571  | 560.645 (14.5x) | 501.505 (13.0x) | 540.038 (14.0x) | 480.778 (12.5x)
char16_t | 37.998  | 479.308 (12.6x) | 425.728 (11.2x) | 416.379 (11.0x) | 392.326 (10.3x)
char32_t | 50.327  | 391.313 (7.78x) | 359.312 (7.14x) | 349.849 (6.95x) | 333.979 (6.64x)

The AVX2 version is slightly slower than SSE4.1 because, on Intel microarchitectures, the VEX-coded vpblendvb instruction is slower than the legacy SSE4.1 pblendvb. The code contains workarounds for this, which carry a slight overhead compared to the SSE4.1 version but are still faster than using vpblendvb. Alternatively, performance could be improved by using asm blocks to force pblendvb in the AVX2 code, but this may cause SSE/AVX transition penalties if the target vector register happens to have "dirty" upper bits. There is no way to ensure this does not happen, so this is not implemented.

AVX512v1 claws back some performance and uses fewer instructions (i.e. smaller code size).

The AVX10.1 version is slower because it uses the vpermi2b instruction from AVX512_VBMI, which is relatively slow on Intel. It allows reducing the number of instructions and the number of vector constants even further. The instruction is faster on AMD Zen 4, where this path should outperform the AVX512v1 code path, although this was not tested.
The vpermi2b code path is disabled by default unless BOOST_UUID_FROM_CHARS_X86_USE_VPERMI2B is defined; the macro can be used to test and tune performance on AMD and newer Intel CPU microarchitectures. Thus, by default, AVX10.1 performance should be roughly equivalent to AVX512v1, barring compiler (mis)optimizations.

Performance in the unsuccessful parsing case depends on where the error occurs. The generic version may terminate early if the error is detected near the beginning of the input string, while the SIMD versions perform roughly the same amount of work regardless of the error position, only faster. Here are some examples for 8-bit character types, where EOI means the input ends prematurely (for larger types the numbers are broadly comparable):

Error              | Generic  | SSE4.1          | AVX2            | AVX512v1        | AVX10.1
===================+==========+=================+=================+=================+================
EOI at 35 chars    | 43.629   | 356.562 (8.17x) | 326.311 (7.48x) | 322.377 (7.39x) | 308.155 (7.06x)
EOI at 1 char      | 2645.783 | 444.769 (0.17x) | 400.275 (0.15x) | 404.826 (0.15x) | 403.730 (0.15x)
Missing dash at 23 | 73.878   | 514.303 (6.96x) | 474.694 (6.43x) | 507.949 (6.88x) | 474.077 (6.42x)
Missing dash at 8  | 223.921  | 516.641 (2.31x) | 472.737 (2.11x) | 506.242 (2.26x) | 473.718 (2.12x)
Illegal char at 35 | 47.373   | 368.002 (7.77x) | 333.233 (7.03x) | 318.242 (6.72x) | 301.659 (6.37x)
Illegal char at 0  | 1729.087 | 421.511 (0.24x) | 385.217 (0.22x) | 374.047 (0.22x) | 351.944 (0.20x)

The above table was collected with BOOST_UUID_FROM_CHARS_X86_USE_VPERMI2B defined. In general, only very early errors are slower in the SIMD versions; the majority of cases are still faster.

Besides BOOST_UUID_FROM_CHARS_X86_USE_VPERMI2B, the implementation also has the BOOST_UUID_TO_FROM_CHARS_X86_USE_ZMM control macro, which, if defined, enables the use of 512-bit registers for converting from 32-bit character types to 8-bit integers. This code path is also slower than the 256-bit path on Golden Cove and is therefore disabled by default.
The macro is provided primarily to allow for tuning and experimentation with newer CPU microarchitectures, where the 512-bit path may become beneficial. All of the above performance numbers were produced without it.
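To make the early-exit versus fixed-work distinction and the 32-bit to 8-bit narrowing concrete, here is a minimal portable scalar sketch. It is an illustration under stated assumptions, not the Boost.Uuid generic or SIMD implementation: the names `narrow_saturate`, `hex_value`, `is_dash_position`, `parse_uuid_early_exit` and `parse_uuid_full_work` are hypothetical, and saturating narrowing is one assumed way to keep wide characters outside the 8-bit range from aliasing valid digits. `parse_uuid_early_exit` stops at the first bad position, so its cost depends on where the error is; `parse_uuid_full_work` examines all 36 characters and makes a single decision at the end, which roughly mirrors the behaviour described for the SIMD versions.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper (not Boost.Uuid API): converts any character type to a
// byte, saturating values above 0xFF to 0xFF. Plain truncation would turn,
// for example, U+0130 into 0x30 ('0') and accept it as a digit.
template <class Ch>
inline unsigned char narrow_saturate(Ch c) noexcept
{
    using U = typename std::make_unsigned<Ch>::type;
    const U u = static_cast<U>(c);
    return u > 0xFFu ? static_cast<unsigned char>(0xFF)
                     : static_cast<unsigned char>(u);
}

// Returns 0..15 for an ASCII hex digit, or 0xFF for anything else.
inline unsigned hex_value(unsigned char c) noexcept
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10u;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10u;
    return 0xFFu;
}

// Dashes separate the 8-4-4-4-12 groups of the 36-character text form.
inline bool is_dash_position(std::size_t i) noexcept
{
    return i == 8 || i == 13 || i == 18 || i == 23;
}

// Early-exit style: returns -1 on success, otherwise the index of the first
// offending position (a bad character, or the point where input ran out).
// The cost grows with how far into the string the error is.
template <class Ch>
int parse_uuid_early_exit(const Ch* s, std::size_t n, unsigned char* out) noexcept
{
    std::size_t nib = 0; // index of the next hex digit, 0..31
    for (std::size_t i = 0; i < 36; ++i) {
        if (i >= n) return static_cast<int>(i);
        const unsigned char c = narrow_saturate(s[i]);
        if (is_dash_position(i)) {
            if (c != '-') return static_cast<int>(i);
            continue;
        }
        const unsigned v = hex_value(c);
        if (v > 15) return static_cast<int>(i);
        // Even digits fill the high nibble, odd digits the low nibble.
        out[nib / 2] = static_cast<unsigned char>(
            nib % 2 == 0 ? (v << 4) : (out[nib / 2] | v));
        ++nib;
    }
    return -1;
}

// Full-work style: checks all 36 positions without stopping, folds every
// failure into one flag and decides once at the end. For simplicity, short
// input is rejected up front. `out` may be partially written on failure.
template <class Ch>
bool parse_uuid_full_work(const Ch* s, std::size_t n, unsigned char* out) noexcept
{
    if (n < 36) return false;
    unsigned bad = 0;    // nonzero once any position has failed
    std::size_t nib = 0; // index of the next hex digit, 0..31
    for (std::size_t i = 0; i < 36; ++i) {
        const unsigned char c = narrow_saturate(s[i]);
        if (is_dash_position(i)) {
            bad |= static_cast<unsigned>(c != '-');
        } else {
            const unsigned v = hex_value(c);
            bad |= static_cast<unsigned>(v > 15);
            // Mask so an invalid value cannot clobber the neighbouring nibble.
            const unsigned d = v & 0x0Fu;
            out[nib / 2] = static_cast<unsigned char>(
                nib % 2 == 0 ? (d << 4) : (out[nib / 2] | d));
            ++nib;
        }
    }
    return bad == 0;
}
```

With this structure, an input whose first character is invalid costs `parse_uuid_early_exit` a single iteration but still costs `parse_uuid_full_work` all 36, the same kind of trade-off as the sub-1x entries in the error table.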
Boost.Uuid
Boost.Uuid, part of the Boost C++ Libraries, provides a C++ implementation of Universally Unique Identifiers (UUID) as described in RFC 4122 and RFC 9562.
See the documentation for more information.
License
Distributed under the Boost Software License, Version 1.0.
Properties
- C++11 (since Boost 1.86.0)
- Header-only
More Information
- Ask questions
- Report bugs: Be sure to mention the Boost version, platform and compiler you're using. A small compilable code sample that reproduces the problem is always good as well.
- Submit your patches as pull requests against the develop branch. Note that by submitting patches you agree to license your modifications under the Boost Software License, Version 1.0.
- Discussions about the library are held on the Boost developers mailing list. Be sure to read the discussion policy before posting and add the [uuid] tag at the beginning of the subject line.
Code Example - UUID Generation
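```cpp
// mkuuid.cpp example
#include <boost/uuid.hpp>
#include <iostream>

int main()
{
    // random_generator produces random-based (version 4) UUIDs
    boost::uuids::random_generator gen;

    // Generate one UUID and print it in the canonical 8-4-4-4-12 text form
    std::cout << gen() << std::endl;
}
```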
```
$ clang++ -Wall -Wextra -std=c++11 -O2 mkuuid.cpp -o mkuuid
$ ./mkuuid
2c186eb0-89cf-4a3c-9b97-86db1670d5f4
$ ./mkuuid
a9d3fbb9-0383-4389-a8a8-61f6629f90b6
```
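In both sample runs the third group starts with `4` and the fourth group starts with `9` or `a`, which is how RFC 4122 and RFC 9562 encode a version 4 (random) UUID with the standard variant: the version is the high nibble of byte 6, and the variant occupies the top bits of byte 8. The sketch below reads those two fields back from the canonical text form. The helper names `hex_digit`, `uuid_text_version` and `uuid_text_variant_bits` are hypothetical, only inspect the string, and are not part of Boost.Uuid.

```cpp
#include <string>

// Hypothetical helper (not Boost.Uuid API): value of one ASCII hex digit,
// or -1 if the character is not a hex digit.
inline int hex_digit(char c) noexcept
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Version field: the first hex digit of the third group, at index 14 of the
// canonical 36-character form. Returns -1 for malformed input.
inline int uuid_text_version(const std::string& s)
{
    return s.size() == 36 ? hex_digit(s[14]) : -1;
}

// Variant: the top two bits of the first hex digit of the fourth group
// (index 19). RFC 4122 / RFC 9562 UUIDs have binary 10 here, i.e. 2.
inline int uuid_text_variant_bits(const std::string& s)
{
    if (s.size() != 36) return -1;
    const int d = hex_digit(s[19]);
    return d < 0 ? -1 : (d >> 2);
}
```

Only the length is validated here; a full parser would also check the dashes and every digit.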