2
0
mirror of https://github.com/boostorg/uuid.git synced 2026-01-19 04:42:16 +00:00
Andrey Semashev 3920cc584c Added x86 SIMD implementation of from_chars.
This adds SSE4.1, AVX2, AVX-512v1 and AVX10.1 implementations of the
from_chars algorithm. The generic implementation is moved to its own
header and constexpr is relaxed to only enabled when is_constant_evaluated
is supported.

The performance effect on Intel Golden Cove (Core i7-12700K), gcc 13.3,
in millions of successful from_chars() calls per second:

Char     | Generic | SSE4.1          | AVX2            | AVX512v1        | AVX10.1
=========+=========+=================+=================+=================+================
char     |  38.571 | 560.645 (14.5x) | 501.505 (13.0x) | 540.038 (14.0x) | 480.778 (12.5x)
char16_t |  37.998 | 479.308 (12.6x) | 425.728 (11.2x) | 416.379 (11.0x) | 392.326 (10.3x)
char32_t |  50.327 | 391.313 (7.78x) | 359.312 (7.14x) | 349.849 (6.95x) | 333.979 (6.64x)

The AVX2 version is slightly slower than SSE4.1 because on Intel
microarchitectures the VEX-coded vpblendvb instruction is slower than
the legacy SSE4.1 pblendvb. The code contains workarounds for this, which
have slight performance overhead compared to SSE4.1 version, but are still
faster than using vpblendvb. Alternatively, the performance could be
improved by using asm blocks to force using pblendvb in AVX2 code, but this
may potentially cause SSE/AVX transition penalties if the target vector
register happens to have "dirty" upper bits. There's no way to ensure this
doesn't happen, so this is not implemented. AVX512v1 claws back some
performance and uses less instructions (i.e. smaller code size).

The AVX10.1 version is slower as it uses vpermi2b instruction from AVX512_VBMI,
which is relatively slow on Intel. It allows for reducing the number of
instructions even further and the number of vector constants as well. The
instruction is faster on AMD Zen 4 and should offer better performance compared
to AVX512v1 code path, although it wasn't tested. This code path is disabled
by default, unless BOOST_UUID_FROM_CHARS_X86_USE_VPERMI2B is defined, which
can be used to test and tune performance on AMD and newer Intel CPU
microarchitectures. Thus, by default, AVX10.1 performance should be roughly
equivalent to AVX512v1, barring compiler (mis)optimizations.

The unsuccessful parsing case depends on where the error happens, as the
generic version may terminate sooner if the error is detected at the
beginning of the input string, while the SIMD version performs roughly
the same amount of work but faster. Here are some examples for 8-bit
character types (for larger types the numbers are more or less comparable):

Error              | Generic  | SSE4.1          | AVX2            | AVX512v1        | AVX10.1
===================+==========+=================+=================+=================+================
EOI at 35 chars    |   43.629 | 356.562 (8.17x) | 326.311 (7.48x) | 322.377 (7.39x) | 308.155 (7.06x)
EOI at 1 char      | 2645.783 | 444.769 (0.17x) | 400.275 (0.15x) | 404.826 (0.15x) | 403.730 (0.15x)
Missing dash at 23 |   73.878 | 514.303 (6.96x) | 474.694 (6.43x) | 507.949 (6.88x) | 474.077 (6.42x)
Missing dash at 8  |  223.921 | 516.641 (2.31x) | 472.737 (2.11x) | 506.242 (2.26x) | 473.718 (2.12x)
Illegal char at 35 |   47.373 | 368.002 (7.77x) | 333.233 (7.03x) | 318.242 (6.72x) | 301.659 (6.37x)
Illegal char at 0  | 1729.087 | 421.511 (0.24x) | 385.217 (0.22x) | 374.047 (0.22x) | 351.944 (0.20x)

The above table is collected with BOOST_UUID_FROM_CHARS_X86_USE_VPERMI2B
defined.

In general, only the very early errors tend to perform worse in the SIMD
version and the majority of cases are still faster.

Besides BOOST_UUID_FROM_CHARS_X86_USE_VPERMI2B, the implementation also has
BOOST_UUID_TO_FROM_CHARS_X86_USE_ZMM control macro, which, if defined, enables
usage of 512-bit registers for convertting from 32-bit character types to 8-bit
integers. This code path is also slower than the 256-bit path on Golden Cove,
and therefore is disabled. The macro is provided primarily to allow for tuning
and experimentation with newer CPU microarchitectures, where the 512-bit path
may become beneficial. All of the above performance numbers were produced
without it.
2026-01-05 14:13:10 +03:00
2024-04-25 22:50:04 +03:00
2024-04-20 23:08:51 +03:00
2025-12-23 12:39:04 +02:00
2024-08-23 19:43:03 +03:00
2024-04-24 20:29:41 +03:00
2018-04-29 09:57:56 -04:00

Boost.Uuid

Boost.Uuid, part of Boost C++ Libraries, provides a C++ implementation of Universally Unique Identifiers (UUID) as described in RFC 4122 and RFC 9562.

See the documentation for more information.

License

Distributed under the Boost Software License, Version 1.0.

Properties

  • C++11 (since Boost 1.86.0)
  • Header-only

Current Status

Branch Github Actions Appveyor Dependencies Documentation Test Matrix
master Build Status Build status Deps Documentation Enter the Matrix
develop Build Status Build status Deps Documentation Enter the Matrix

More Information

  • Ask questions
  • Report bugs: Be sure to mention Boost version, platform and compiler you're using. A small compilable code sample to reproduce the problem is always good as well.
  • Submit your patches as pull requests against the develop branch. Note that by submitting patches you agree to license your modifications under the Boost Software License, Version 1.0.
  • Discussions about the library are held on the Boost developers mailing list. Be sure to read the discussion policy before posting and add the [uuid] tag at the beginning of the subject line.

Code Example - UUID Generation

//  mkuuid.cpp example

#include <boost/uuid.hpp>
#include <iostream>

int main()
{
    boost::uuids::random_generator gen;
    std::cout << gen() << std::endl;
}
$ clang++ -Wall -Wextra -std=c++11 -O2 mkuuid.cpp -o mkuuid
$ ./mkuuid
2c186eb0-89cf-4a3c-9b97-86db1670d5f4
$ ./mkuuid
a9d3fbb9-0383-4389-a8a8-61f6629f90b6
Description
Mirrored via gitea-mirror
Readme BSL-1.0 1.4 MiB
Languages
C++ 98.3%
CMake 1.1%
Shell 0.3%
Batchfile 0.2%