Described in #1011 by @aleksazr as point 2:
this optimization avoids copying the accumulators
when there is not enough data to compute them anyway.
It seems effective on MSVC.
Such a scenario implies streaming with very little data received per invocation.
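
A minimal sketch of the idea, not the actual xxHash code: the names
(sketch_state, sketch_update, SKETCH_STRIPE), the 32-byte block size and
the toy mixing step are all hypothetical. When the new input cannot
complete a block, the update only appends to the internal buffer and
returns before the accumulators are ever copied into locals.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_STRIPE 32  /* hypothetical block size, in bytes */

typedef struct {
    uint64_t acc[4];                   /* persistent accumulators */
    unsigned char buf[SKETCH_STRIPE];  /* bytes waiting for a full block */
    size_t buffered;                   /* always < SKETCH_STRIPE between calls */
    unsigned long acc_copies;          /* instrumentation for this sketch only */
} sketch_state;

static void sketch_init(sketch_state *s) { memset(s, 0, sizeof *s); }

/* Placeholder mixing step; NOT the real xxHash round. */
static void sketch_consume(uint64_t acc[4], const unsigned char *block)
{
    for (int i = 0; i < 4; i++) {
        uint64_t lane;
        memcpy(&lane, block + 8 * i, sizeof lane);
        acc[i] = (acc[i] + lane) * 0x9E3779B97F4A7C15ULL;
    }
}

static void sketch_update(sketch_state *s, const void *input, size_t len)
{
    const unsigned char *p = (const unsigned char *)input;

    /* Fast path: too little data to complete a block.
     * Buffer it and return without touching the accumulators. */
    if (len < SKETCH_STRIPE - s->buffered) {
        memcpy(s->buf + s->buffered, p, len);
        s->buffered += len;
        return;
    }

    /* Slow path: only now copy the accumulators into locals. */
    uint64_t acc[4];
    memcpy(acc, s->acc, sizeof acc);
    s->acc_copies++;

    if (s->buffered > 0) {  /* complete the pending block first */
        size_t const fill = SKETCH_STRIPE - s->buffered;
        memcpy(s->buf + s->buffered, p, fill);
        sketch_consume(acc, s->buf);
        p += fill;
        len -= fill;
        s->buffered = 0;
    }
    while (len >= SKETCH_STRIPE) {  /* consume full blocks directly */
        sketch_consume(acc, p);
        p += SKETCH_STRIPE;
        len -= SKETCH_STRIPE;
    }
    memcpy(s->buf, p, len);  /* keep the tail for the next call */
    s->buffered = len;
    memcpy(s->acc, acc, sizeof acc);
}
```

Feeding this one byte at a time takes the fast path 31 times out of 32,
which is the streaming pattern the commit targets.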
We've observed the NEON implementation perform 25% to 40% better than the SVE one.
aarch64 builds will now prefer the NEON variant over the SVE one.
The SVE version can still be used by explicitly defining XXH_SVE.
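
The preference order could be expressed roughly as below. This is a
sketch under assumptions, not the selection logic in xxhash.h: the
SKETCH_* names are hypothetical, and only __aarch64__, __ARM_NEON and
__ARM_FEATURE_SVE are real compiler-provided macros.

```c
#define SKETCH_SCALAR 0
#define SKETCH_NEON   1
#define SKETCH_SVE    2

/* An explicit user choice (e.g. -DSKETCH_VECTOR=SKETCH_SVE) always wins. */
#ifndef SKETCH_VECTOR
#  if defined(__aarch64__) || defined(__ARM_NEON)
     /* NEON is preferred even when __ARM_FEATURE_SVE is also defined. */
#    define SKETCH_VECTOR SKETCH_NEON
#  elif defined(__ARM_FEATURE_SVE)
#    define SKETCH_VECTOR SKETCH_SVE
#  else
#    define SKETCH_VECTOR SKETCH_SCALAR
#  endif
#endif

/* Report which code path was selected at compile time. */
static const char *sketch_vector_name(void)
{
#if SKETCH_VECTOR == SKETCH_SVE
    return "sve";
#elif SKETCH_VECTOR == SKETCH_NEON
    return "neon";
#else
    return "scalar";
#endif
}
```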
When XXH_FORCE_MEMORY_ACCESS==1, which is the default on supported
compilers, contrary to what the README states, a strict aliasing violation
occurs in XXH_read64, resulting in miscompilation on GCC (but not Clang)
in some oddly specific circumstances.
The following code reproduces the problem on x86_64 GCC 14.2.1 when
compiled with -O3:

    #define XXH_INLINE_ALL
    #include <inttypes.h>
    #include <stdio.h>
    #include <xxhash.h>

    int main() {
        // it seems this has to be exactly 24 bytes.
        union {
            char x[24];
            // force 8-byte alignment without making
            // aliasable with uint64_t.
            void *y[3];
        } data = {.x = "garblegarblegarblegarble"};
        uint64_t hash = XXH64(&data, sizeof(data), 0);
        printf("%016" PRIx64 "\n", hash);
        return 0;
    }

A bogus -Wuninitialized warning is produced if enabled, and the
resulting program outputs an incorrect hash.
While this definitely looks like a compiler bug in some ways, I see no
reason to assume aligned(1) alone should exempt the type from aliasing
restrictions regardless, and adding may_alias does fix the problem.
The separate strict aliasing bug when XXH_FORCE_ALIGN_CHECK==1,
discussed in #383, still remains.
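
For illustration, an unaligned 64-bit read that carries both attributes
might look like the sketch below. The names sketch_unalign64 and
sketch_read64 are hypothetical and this is not the exact typedef used in
xxhash.h; the attributes are GCC/Clang extensions, so a memcpy fallback
is assumed for other compilers.

```c
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__)
/* aligned(1) permits reads at any address; may_alias tells the compiler
 * that accesses through this type may alias objects of any other type. */
typedef uint64_t sketch_unalign64 __attribute__((__aligned__(1), __may_alias__));

static uint64_t sketch_read64(const void *ptr)
{
    return *(const sketch_unalign64 *)ptr;
}
#else
/* Portable fallback: memcpy is always alias-safe. */
static uint64_t sketch_read64(const void *ptr)
{
    uint64_t v;
    memcpy(&v, ptr, sizeof v);
    return v;
}
#endif
```

Both variants should return the same value for any readable address,
including odd offsets into a char buffer.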
Apparently, UBSan uses the prototype of the inner inlined function
instead of that of the outer shell function.
Let's see if one level of indirection is enough to fix this.
1. Display the mode in use, as follows:
   "loongarch64 + lasx" -> LoongArch64 platform with the LoongArch Advanced SIMD Extension (LASX)
   "loongarch64 + lsx" -> LoongArch64 platform with the LoongArch SIMD Extension (LSX)
   "loongarch64" -> LoongArch64 platform, scalar implementation
2. Align the defines in xxhash.h
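
A rough sketch of how the three strings from point 1 could be chosen.
The function names are hypothetical, and the predefined macros
(__loongarch64, __loongarch_sx, __loongarch_asx) are assumed to be the
ones the compiler sets for LoongArch64, LSX and LASX respectively; this
is not the actual display code.

```c
#include <stddef.h>

/* Map the detected features to the displayed mode string.
 * Returns NULL when the target is not LoongArch64. */
static const char *sketch_mode_name(int is_loongarch64, int has_lsx, int has_lasx)
{
    if (!is_loongarch64) return NULL;
    if (has_lasx) return "loongarch64 + lasx";  /* widest extension wins */
    if (has_lsx)  return "loongarch64 + lsx";
    return "loongarch64";                       /* scalar implementation */
}

/* Feed the compile-time feature macros into the mapping above. */
static const char *sketch_current_mode(void)
{
    int la64 = 0, lsx = 0, lasx = 0;
#if defined(__loongarch64)
    la64 = 1;
#endif
#if defined(__loongarch_sx)
    lsx = 1;
#endif
#if defined(__loongarch_asx)
    lasx = 1;
#endif
    return sketch_mode_name(la64, lsx, lasx);
}
```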
They suspect that some tests on unsigned values might wrap around.
They should not, but let's write these branches differently to also cover these cases.
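
As a generic illustration of this kind of rewrite (the branches actually
changed are not shown here, and the function names are hypothetical), a
bounds test phrased as an addition can wrap around, while the equivalent
subtraction cannot as long as used <= cap holds:

```c
#include <stddef.h>
#include <stdint.h>

/* Wrap-prone form: used + len can overflow size_t and wrap to a small
 * value, making the test pass when it should fail. */
static int sketch_fits_wrapping(size_t used, size_t len, size_t cap)
{
    return used + len <= cap;
}

/* Wrap-free form: assumes the invariant used <= cap, so cap - used
 * never wraps and len is compared against the true remaining space. */
static int sketch_fits_safe(size_t used, size_t len, size_t cap)
{
    return len <= cap - used;
}
```

With used = 8, cap = 32 and len = SIZE_MAX, the first form wraps and
reports that the data fits, while the second correctly rejects it.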
- Fix the state structs to use unsigned char, update names
- Extract the algorithm steps into inline subroutines
- Fix a theoretical integer overflow bug
- Reroll XXH64 on 32-bit (it is going to run like crap anyway)
GCC does not support the Clang builtins, but it does have the
type-generic, type-checking __builtin_stdc_rotate_left.
The only advantage of this is that the builtin evaluates its arguments
only once.
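
A sketch of such a selection, assuming a compiler that provides
__has_builtin; SKETCH_ROTL32 is a hypothetical name, not the macro used
in xxhash.h. The plain-C fallback evaluates both arguments twice and is
only valid for rotation counts from 1 to 31.

```c
#include <stdint.h>

#if defined(__has_builtin)
#  if __has_builtin(__builtin_rotateleft32)
     /* Clang builtin */
#    define SKETCH_ROTL32(x, r) __builtin_rotateleft32((x), (r))
#  elif __has_builtin(__builtin_stdc_rotate_left)
     /* GCC type-generic builtin: rejects non-unsigned operands */
#    define SKETCH_ROTL32(x, r) __builtin_stdc_rotate_left((x), (r))
#  endif
#endif

#ifndef SKETCH_ROTL32
   /* Portable fallback: x and r are each evaluated twice here, so
    * arguments with side effects must be avoided; r must be in 1..31. */
#  define SKETCH_ROTL32(x, r) (((x) << (r)) | ((x) >> (32 - (r))))
#endif

/* Example use: rotate a 32-bit accumulator left by 13 bits. */
static uint32_t sketch_rotate13(uint32_t acc)
{
    return SKETCH_ROTL32(acc, 13);
}
```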