openmethod/doc/performance.adoc
Jean-Louis Leroy 5e0fa8ee4b inception
2025-03-08 15:31:25 -05:00


## Performance
Open-methods are almost as fast as ordinary virtual member functions when
compiled with optimization.
clang compiles the following code:
[source,c++]
----
include::{examplesdir}/hello_world.cpp[tag=call_poke_via_ref]
----
...to this on the x64 architecture (variable names have been shortened for
readability):
[source,asm]
----
mov rax, qword ptr [rsi]
mov rdx, qword ptr [rip + hash_mult]
imul rdx, qword ptr [rax - 8]
movzx ecx, byte ptr [rip + hash_shift]
shr rdx, cl
mov rax, qword ptr [rip + vptrs]
mov rax, qword ptr [rax + 8*rdx]
mov rcx, qword ptr [rip + poke::slots_strides]
mov rax, qword ptr [rax + 8*rcx]
jmp rax
----
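Read as C++, the listing above amounts to roughly the following. This is a minimal sketch, not the library's code: `dispatch_data`, `type_key`, `bucket`, `build_tables` and the hand-written v-tables are hypothetical names, and the key is assumed to be the address of the object's `std::type_info` (which is what the load from `[rax - 8]` appears to fetch from the native v-table).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <typeinfo>
#include <vector>

struct Animal { virtual ~Animal() = default; };
struct Dog : Animal {};
struct Cat : Animal {};

// An open-method v-table: an array of function pointers indexed by slot.
using slot_fn = int (*)(Animal&);
using vtable = const slot_fn*;

int poke_dog(Animal&) { return 1; }
int poke_cat(Animal&) { return 2; }
const slot_fn dog_vtbl[] = {poke_dog};
const slot_fn cat_vtbl[] = {poke_cat};

// Hypothetical counterpart of the globals named in the listing.
struct dispatch_data {
    std::uint64_t hash_mult = 0;  // multiplier (the imul operand)
    unsigned hash_shift = 0;      // shift count (the shr operand)
    std::vector<vtable> vptrs;    // bucket -> open-method v-table
    std::size_t poke_slot = 0;    // slot of `poke` in each v-table
};

// Assumption: the hash key is the address of the type_info object.
std::uint64_t type_key(const std::type_info& ti) {
    return static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(&ti));
}

// Multiply, then keep only the top bits of the product.
std::size_t bucket(const dispatch_data& d, std::uint64_t key) {
    return static_cast<std::size_t>((key * d.hash_mult) >> d.hash_shift);
}

// Try multipliers until every registered class lands in its own bucket.
dispatch_data build_tables() {
    struct entry { std::uint64_t key; vtable vt; };
    const entry classes[] = {{type_key(typeid(Dog)), dog_vtbl},
                             {type_key(typeid(Cat)), cat_vtbl}};
    const unsigned bits = 4;  // 16 buckets
    const vtable empty = nullptr;
    dispatch_data d;
    d.hash_shift = 64 - bits;
    std::uint64_t mult = 0x9E3779B97F4A7C15ull;
    for (int attempt = 0; attempt < 1000; ++attempt) {
        d.hash_mult = mult | 1;  // keep the multiplier odd
        d.vptrs.assign(std::size_t{1} << bits, empty);
        bool collision = false;
        for (const entry& e : classes) {
            vtable& cell = d.vptrs[bucket(d, e.key)];
            if (cell != nullptr) { collision = true; break; }
            cell = e.vt;
        }
        if (!collision) return d;
        // Next pseudo-random candidate (a simple linear congruential step).
        mult = mult * 6364136223846793005ull + 1442695040888963407ull;
    }
    assert(false && "no collision-free multiplier found");
    return d;
}

// The dispatch itself: hash lookup, slot lookup, indirect call.
int call_poke(const dispatch_data& d, Animal& a) {
    vtable vt = d.vptrs[bucket(d, type_key(typeid(a)))];
    return vt[d.poke_slot](a);
}
```

With tables from `build_tables()`, `call_poke(d, dog)` reaches `poke_dog` and `call_poke(d, cat)` reaches `poke_cat`. All the searching happens when the tables are built, so the call path is left with only a multiplication, a shift and two indexed loads.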
llvm-mca estimates a throughput of 4 cycles per dispatch. By comparison, calling
a native virtual function takes one cycle. However, the difference is amortized
by the time spent passing the arguments, executing the body of the function,
and returning from it.
Microbenchmarks suggest that dispatching an open-method with a single virtual
argument is between 30% and 50% slower than calling the equivalent virtual
function, with an empty body and no other arguments.
However, `call_poke` does two things: it constructs a `virtual_ptr<Animal>` from
an `Animal&`; and then it calls the method. The construction of the
`virtual_ptr` is the costly part, as it involves a hash table lookup. Once that
price has been paid, the `virtual_ptr` can be used multiple times. It is passed
to the overrider, which can make further method calls through it. It can be
stored in variables in place of plain pointers.
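To make the "pay once, reuse many times" point concrete, here is a small sketch under stated assumptions. `fat_ptr` is a hypothetical two-word stand-in for `virtual_ptr<Animal>`, `find_vtable` stands in for the library's hash table (a `std::unordered_map` is used purely for brevity), and the `lookups` counter exists only to make the cost visible.

```cpp
#include <cassert>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

struct Animal { virtual ~Animal() = default; };
struct Dog : Animal {};

using slot_fn = int (*)(Animal&);
using vtable = const slot_fn*;

int poke_dog(Animal&) { return 1; }
const slot_fn dog_vtbl[] = {poke_dog};

int lookups = 0;  // number of hash table lookups, for illustration only

// Stand-in for the hash table: dynamic type -> open-method v-table.
vtable find_vtable(const Animal& a) {
    static const std::unordered_map<std::type_index, vtable> table = {
        {std::type_index(typeid(Dog)), dog_vtbl}};
    ++lookups;
    return table.at(std::type_index(typeid(a)));
}

// Hypothetical analogue of virtual_ptr<Animal>: v-table pointer plus object.
struct fat_ptr {
    vtable vptr;
    Animal* obj;
    // The costly step: one hash table lookup at construction.
    explicit fat_ptr(Animal& a) : vptr(find_vtable(a)), obj(&a) {}
};

// Calling through the fat pointer needs no lookup, only an indexed load
// and an indirect call.
int poke(fat_ptr p) { return p.vptr[0](*p.obj); }
```

Constructing one `fat_ptr` from a `Dog` and calling `poke` through it three times leaves `lookups` at 1: every call after construction goes straight to the cached v-table.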
Let's look at another example: an AST for an arithmetic calculator:
[source,c++]
----
include::{examplesdir}/ast.cpp[tag=ast]
----
The `Negate` overrider compiles to:
[source,asm]
----
mov rdi, qword ptr [rsi + 8]
mov rsi, qword ptr [rsi + 16]
mov rax, qword ptr [rip + value::slots_strides]
call qword ptr [rdi + 8*rax]
neg eax
pop rcx
----
The first two instructions read the `virtual_ptr` from `this` - placing its
content in registers `rdi` and `rsi`.
The next two instructions are the method call proper. According to llvm-mca,
they take one cycle - the same as a native virtual function call.
When we create the `Plus` and `Negate` nodes, we call the conversion
constructors of `virtual_ptr<Node>`, which incur the cost of hash table lookups.
However, in this example, we know the exact types of the objects. In that case,
we can use `final_virtual_ptr` to construct the `virtual_ptr` using a single
instruction. For example:
[source,c++]
----
include::{examplesdir}/ast.cpp[tag=final,indent=0]
----
...compiles to:
```asm
;; construct Literal
lea rax, [rip + vtable for Literal+16]
mov qword ptr [rsp], rax
mov dword ptr [rsp+8], 1
;; construct Negate
mov rax, qword ptr [rip+static_vptr<Literal>] ; address of openmethod v-table
lea rcx, [rip+vtable for Negate+16] ; address of native v-table
mov qword ptr [rsp+16], rcx ; set native v-table
mov qword ptr [rsp+24], rax ; set openmethod v-table
mov rax, rsp ; address of 'one'
mov qword ptr [rsp+32], rax ; set vptr object pointer to 'one'
```
`final_virtual_ptr` does not require its argument to have a polymorphic type.
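The idea can be sketched in plain C++ as follows. This is an illustration only, not the library's implementation: `fat_ptr` and `final_fat_ptr` are hypothetical names, and the variable template `static_vptr` merely mimics the symbol of the same name in the listing above. Note that `Node` deliberately has no virtual functions here: because the v-table is selected from the static type, nothing has to be read from the object.

```cpp
#include <cassert>

// A non-polymorphic hierarchy: no virtual functions, no native v-table.
struct Node {};
struct Literal : Node {
    int v;
    explicit Literal(int v) : v(v) {}
};

using slot_fn = int (*)(const Node&);
using vtable = const slot_fn*;

int value_literal(const Node& n) {
    return static_cast<const Literal&>(n).v;
}
const slot_fn literal_vtbl[] = {value_literal};

// One open-method v-table pointer per class, selected at compile time.
template<class T> vtable static_vptr = nullptr;
template<> vtable static_vptr<Literal> = literal_vtbl;

// Hypothetical analogue of virtual_ptr<Node>.
struct fat_ptr {
    vtable vptr;
    const Node* obj;
};

// Hypothetical analogue of final_virtual_ptr: the caller vouches that the
// dynamic type of `obj` is exactly T, so no hash table lookup is needed.
template<class T>
fat_ptr final_fat_ptr(const T& obj) {
    return {static_vptr<T>, &obj};
}

int value(fat_ptr p) { return p.vptr[0](*p.obj); }
```

`final_fat_ptr(one)` for a `Literal one(1)` performs a load of `static_vptr<Literal>` and a pointer copy, after which `value` dispatches to `value_literal`. The price of skipping the lookup is that the caller is responsible for `T` being the object's exact type.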