openmethod/doc/modules/ROOT/pages/performance.adoc
2025-11-29 21:47:09 -05:00


[#performance]
Open-methods can be as fast as ordinary virtual member functions when
compiled with optimization.
First, let's examine the code generated by clang for an ordinary virtual
function call:
[source,c++]
----
void call_virtual_function(const Node& node, std::ostream& os) {
    node.postfix(os);
}
----
Clang compiles this function to the following assembly on the x64 architecture:
[source,asm]
----
mov rax, qword ptr [rdi]
mov rax, qword ptr [rax + 24]
jmp rax # TAILCALL
----
llvm-mca estimates this code has a throughput of 1 cycle per dispatch.
Let's look at a method call now:
[source,c++]
----
void call_via_ref(const Node& node, std::ostream& os) {
    postfix(node, os);
}
----
This compiles to (variable names are shortened for readability):
[source,asm]
----
mov rax, rdi
mov rcx, qword ptr [rdi]
mov rdi, qword ptr [rip + mult]
imul rdi, qword ptr [rcx - 8]
movzx ecx, byte ptr [rip + shift]
shr rdi, cl
mov rdx, rsi
mov rcx, qword ptr [rip + vptr_vector_vptrs]
mov rdi, qword ptr [rcx + 8*rdi]
mov rcx, qword ptr [rip + postfix::fn+88]
mov rcx, qword ptr [rdi + 8*rcx]
mov rsi, rax
jmp rcx # TAILCALL
----
This is quite a few more instructions. On closer examination, though, many of
them are memory reads that are independent of one another, so they can execute
in parallel. For example, the first three instructions can execute simultaneously.
llvm-mca estimates a throughput of 4 cycles per dispatch. In a real program,
however, the difference is amortized by the time spent passing the arguments,
executing the body of the function, and returning from it.
Micro- and RDTSC-based benchmarks suggest that dispatching an open-method with
a single virtual argument _via a reference_ is between 30% and 50% slower
than calling the equivalent virtual function, when the body is empty and there
are no other arguments. In most real programs, the overhead would be unnoticeable.
*However*, `call_via_ref` does two things: it constructs a `virtual_ptr<Node>`
from a `const Node&`, then it calls the method.
The construction of the `virtual_ptr` is the costly part. It performs a lookup
in a perfect hash table, indexed by pointers to `std::type_info`, to find the
correct vtable. Then it stores a pointer to it in the `virtual_ptr` object,
along with a pointer to the object.footnote:[Go interfaces and Rust trait
objects implement dynamic dispatch the same way: a pointer to the object is
paired with a pointer to a method table.]
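
The lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `HashParams`, `key_of`, `slot`, `find_perfect_hash` and the `Shape` classes are hypothetical names, and the random search for a collision-free multiplier is only one plausible way to build such a table. Only the final computation, `(mult * key) >> shift`, mirrors the `imul` and `shr` instructions in the listing above.

[source,c++]
----
#include <cstddef>
#include <cstdint>
#include <optional>
#include <random>
#include <typeinfo>
#include <vector>

// Hypothetical example classes; any polymorphic hierarchy would do.
struct Shape { virtual ~Shape() = default; };
struct Circle : Shape {};
struct Square : Shape {};

// Hypothetical parameter block, named after `mult` and `shift` in the listing.
struct HashParams {
    std::uint64_t mult;  // odd multiplier
    unsigned shift;      // right shift applied after the multiplication
    std::size_t size;    // number of slots in the table: 2^(64 - shift)
};

// The key is the address of the std::type_info object.
inline std::uintptr_t key_of(const std::type_info& info) {
    return reinterpret_cast<std::uintptr_t>(&info);
}

// One multiplication and one shift, as in the generated code.
inline std::size_t slot(const HashParams& h, std::uintptr_t key) {
    return static_cast<std::size_t>(
        (h.mult * static_cast<std::uint64_t>(key)) >> h.shift);
}

// Assumed construction strategy: try random multipliers, growing the table
// until every key lands in a distinct slot. Returns nullopt on failure.
inline std::optional<HashParams> find_perfect_hash(
    const std::vector<std::uintptr_t>& keys) {
    std::mt19937_64 rng(42);  // fixed seed keeps the sketch deterministic
    for (unsigned bits = 1; bits <= 16; ++bits) {
        const std::size_t size = std::size_t(1) << bits;
        if (size < keys.size()) continue;  // too small to hold all keys
        for (int attempt = 0; attempt < 1000; ++attempt) {
            HashParams h{rng() | 1, 64 - bits, size};
            std::vector<bool> used(size, false);
            bool collision = false;
            for (std::uintptr_t key : keys) {
                const std::size_t s = slot(h, key);
                if (used[s]) { collision = true; break; }
                used[s] = true;
            }
            if (!collision) return h;
        }
    }
    return std::nullopt;
}
----

Once suitable parameters are known, a lookup costs one multiplication, one shift and one indexed load. A table built this way would store a vtable pointer in each occupied slot, which is the value a `virtual_ptr` keeps next to the object pointer.
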
If we already have a `virtual_ptr`:
[source,c++]
----
void call_via_virtual_ptr(virtual_ptr<const Node> node, std::ostream& os) {
    postfix(node, os);
}
----
A method call compiles to:
[source,asm]
----
mov rax, qword ptr [rip + postfix::fn+88]
mov rax, qword ptr [rdi + 8*rax]
jmp rax # TAILCALL
----
`virtual_ptr` arguments are passed through the method call, to the overrider,
which can use them to make further method calls.
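
To illustrate why a call through an existing `virtual_ptr` is so short, and what passing it through to the overrider buys, here is a minimal sketch of the "fat pointer" idea. It rests on assumptions: `FatPtr`, `Overrider`, `to_text`, `to_text_slot` and the hand-written tables are hypothetical stand-ins, not the library's types or generated code.

[source,c++]
----
#include <cstddef>
#include <string>

struct Node;
struct FatPtr;

// An overrider receives the fat pointer itself, not a bare object pointer.
using Overrider = std::string (*)(FatPtr);

// Hypothetical stand-in for virtual_ptr: object pointer plus table pointer.
struct FatPtr {
    const Node* obj;
    const Overrider* vtbl;
};

struct Node {};
struct Number : Node {
    int value;
    explicit Number(int v) : value(v) {}
};
struct Negate : Node {
    FatPtr child;  // the child is stored as a fat pointer as well
    explicit Negate(FatPtr c) : child(c) {}
};

// Slot index of the hypothetical `to_text` method, read from a global at
// call time, much like the load from `postfix::fn+88` in the listing.
std::size_t to_text_slot = 0;

// Dispatch: load the slot index, load the function pointer, call it.
std::string to_text(FatPtr p) { return p.vtbl[to_text_slot](p); }

std::string number_to_text(FatPtr p) {
    return std::to_string(static_cast<const Number*>(p.obj)->value);
}

// The overrider makes a further method call without any new table lookup,
// because the child's table pointer is already at hand.
std::string negate_to_text(FatPtr p) {
    return "-(" + to_text(static_cast<const Negate*>(p.obj)->child) + ")";
}

// Hand-written per-class tables; a real library would generate these.
const Overrider number_vtbl[] = {number_to_text};
const Overrider negate_vtbl[] = {negate_to_text};
----

In this sketch the table pointer is chosen once, when the `FatPtr` is created. Every later call, including the nested one inside `negate_to_text`, is just two loads and an indirect call.
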
Code that incorporates open-methods in its design should use
`virtual_ptr`{empty}s in place of plain pointers or references as much as
possible. Here is the Node example, rewritten to use `virtual_ptr`{empty}s
throughout:
[source,c++]
----
include::{examplesdir}/ast_virtual_ptr.cpp[tag=content]
----