[#performance]
Open-methods can be as fast as ordinary virtual member functions when
compiled with optimization.

First, let's examine the code generated by clang for an ordinary virtual
function call:

[source,c++]
----
void call_virtual_function(const Node& node, std::ostream& os) {
    node.postfix(os);
}
----

Clang compiles this function to the following assembly on the x64 architecture:

[source,asm]
----
mov rax, qword ptr [rdi]
mov rax, qword ptr [rax + 24]
jmp rax # TAILCALL
----

llvm-mca estimates this code has a throughput of 1 cycle per dispatch.

Let's look at a method call now:

[source,c++]
----
void call_via_ref(const Node& node, std::ostream& os) {
    postfix(node, os);
}
----

This compiles to (variable names are shortened for readability):

[source,asm]
----
mov rax, rdi
mov rcx, qword ptr [rdi]
mov rdi, qword ptr [rip + mult]
imul rdi, qword ptr [rcx - 8]
movzx ecx, byte ptr [rip + shift]
shr rdi, cl
mov rdx, rsi
mov rcx, qword ptr [rip + vptr_vector_vptrs]
mov rdi, qword ptr [rcx + 8*rdi]
mov rcx, qword ptr [rip + postfix::fn+88]
mov rcx, qword ptr [rdi + 8*rcx]
mov rsi, rax
jmp rcx # TAILCALL
----

This is quite a few more instructions. On closer examination, however, many of
them are memory reads that are independent of one another, so they can execute
in parallel. For example, the first three instructions can execute
simultaneously.

llvm-mca estimates a throughput of 4 cycles per dispatch. However, the
difference is amortized by the time spent passing the arguments and returning
from the function, and, of course, by executing the body of the function.

Micro- and RDTSC-based benchmarks suggest that dispatching an open-method with
a single virtual argument _via a reference_ is between 30% and 50% slower than
calling the equivalent virtual function, with an empty body and no other
arguments. In most real programs, the overhead would be unnoticeable.

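For readers who want to reproduce this kind of comparison, here is a minimal
sketch of a timing harness based on `std::chrono`. It is not the benchmark
used to obtain the numbers above, and a serious measurement would also need to
defeat inlining and control for frequency scaling; the name `avg_ns_per_call`
is introduced here purely for illustration.

[source,c++]
----
#include <chrono>
#include <cstddef>

// Sketch only: average cost, in nanoseconds, of calling fn repeatedly.
template <typename Fn>
double avg_ns_per_call(Fn&& fn, std::size_t iterations = 10'000'000) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i != iterations; ++i) {
        fn(); // e.g. call_virtual_function(node, os) or call_via_ref(node, os)
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count()
        / static_cast<double>(iterations);
}
----
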
*However*, `call_via_ref` does two things: it constructs a `virtual_ptr<Node>`
from a `const Node&`, then it calls the method.

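In other words, it behaves roughly like the following hand-written equivalent.
This is only a sketch; `call_via_ref_explicit` and the local variable `vp` are
names introduced here for illustration:

[source,c++]
----
// Roughly what call_via_ref does, spelled out as two steps (a sketch).
void call_via_ref_explicit(const Node& node, std::ostream& os) {
    virtual_ptr<const Node> vp(node); // costly: find the vtable via the hash table
    postfix(vp, os);                  // cheap: dispatch through the vtable
}
----
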
The construction of the `virtual_ptr` is the costly part. It performs a lookup
in a perfect hash table, indexed by pointers to `std::type_info`, to find the
correct vtable. Then it stores a pointer to it in the `virtual_ptr` object,
along with a pointer to the object.footnote:[This is how Go and Rust implement
dynamic dispatch.]

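The assembly listed above hints at what this lookup costs. In rough terms, and
using hypothetical names that loosely mirror the symbols in that listing
(`mult`, `shift`, `vptr_vector`) rather than the library's actual internals,
the construction has to do something like this:

[source,c++]
----
#include <cstdint>
#include <typeinfo>

// Hypothetical globals mirroring the symbols in the assembly listing; the
// library's real data structures may be organized differently.
extern std::uintptr_t mult;                 // multiplier of the perfect hash
extern std::uint8_t shift;                  // shift count of the perfect hash
extern const std::uintptr_t** vptr_vector;  // one vtable pointer per class

// Roughly what constructing a virtual_ptr from a plain reference must do.
template <class T>
const std::uintptr_t* find_vptr(const T& object) {
    auto key = reinterpret_cast<std::uintptr_t>(&typeid(object)); // type_info address
    auto index = (key * mult) >> shift;                           // perfect hash
    return vptr_vector[index];                                    // the class's vtable
}
----
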
If we already have a `virtual_ptr`:

[source,c++]
----
void call_via_virtual_ptr(virtual_ptr<const Node> node, std::ostream& os) {
    postfix(node, os);
}
----

A method call compiles to:

[source,asm]
----
mov rax, qword ptr [rip + postfix::fn+88]
mov rax, qword ptr [rdi + 8*rax]
jmp rax # TAILCALL
----

`virtual_ptr` arguments are passed through the method call to the overrider,
which can use them to make further method calls.

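For instance, an overrider for a binary `Plus` node could chain further
`postfix` calls on the `virtual_ptr`{empty}s it already holds, without any new
hash-table lookups. The following is only a sketch: it assumes the
`BOOST_OPENMETHOD_OVERRIDE` macro and a hypothetical `Plus` class that stores
its children as `virtual_ptr<const Node>` members named `left` and `right`,
which may differ from the actual example.

[source,c++]
----
// Sketch of an overrider that reuses its virtual_ptr argument for further calls.
BOOST_OPENMETHOD_OVERRIDE(
    postfix, (virtual_ptr<const Plus> node, std::ostream& os), void) {
    postfix(node->left, os);   // dispatch on an existing virtual_ptr: no lookup
    os << ' ';
    postfix(node->right, os);  // likewise
    os << " +";
}
----
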
Code that incorporates open-methods in its design should use
`virtual_ptr`{empty}s in place of plain pointers or references as much as
possible. Here is the Node example, rewritten to use `virtual_ptr`{empty}s
throughout:

[source,c++]
----
include::{examplesdir}/ast_virtual_ptr.cpp[tag=content]
----