- Introduce both s_imageCache and s_krnlCache on all platforms, even if unused (will be reused later to unify platforms handling)
- This means that what userland images that used to be unsorted are now sorted
The name was a bit misleading as it could be mistaken to mean "The cache contains the address" and not as "has an image with this start address". ie: that it could be mistaken to do GetImageForAddress( startAddress ) != nullptr.
[`/DELAYLOAD`](https://learn.microsoft.com/en-us/cpp/build/reference/delayload-delay-load-import?view=msvc-170) only works with exported functions, not variables (as it patches thunks after the first load).
So instead of using exporting a pointer for `___tracy_RtlWalkFrameChain`, make it a function that will call said pointer. This only adds a `mov`+`jmp` instructions in optimized builds to the call, which is negligible compared to the actual cost of RtlWalkFrameChain. This also does not affect the reported callstack since it will use a `jmp` without touching the stack, so `___tracy_RtlWalkFrameChain` won't appear in the capture stacktrace.
- Add a tool "tracy-edit" that allows loading a tracy capture, patching symbols and recompress the result
- Add offline symbol resolvers for linux (using addr2line) and windows (using dbghelper)
There might have been new modules loaded by another thread between the `SymInitialize` and `EnumProcessModules` calls.
Since we register the enumerated modules into the cache, we need to make sure that symbols for this module are loaded.
The only way to do that is to call `SymLoadModuleEx`, just like we do when finding new modules after `InitCallstack`.
DecodeCallstackPtrFast() may be called outside the symbol processing thread,
for example in the crash handler. Using the less-capable dladdr functionality
doesn't have a big impact here. Callstack decoding in this context is used to
remove the uninteresting top part of the callstack, so that the callstack ends
at the crashing function, and not in the crash handler. Even if this
functionality would be impacted by this change, the damage done is close to
none.
The other alternative is to use locking each time a libbacktrace is to be
used, which does not seem to be worthy to do, considering that the problem
only occurs in a very rare code path.
NB everything was working when it was first implemented, because back then the
callstack decoding was still performed on the main thread, and not on a
separate, dedicated one.