mirror of
https://github.com/boostorg/openmethod.git
synced 2026-01-27 07:02:11 +00:00
103 lines
3.3 KiB
Plaintext
103 lines
3.3 KiB
Plaintext
|
|
## Performance
|
|
|
|
Open-methods are almost as fast as ordinary virtual member functions when
|
|
compiled with optimization.
|
|
|
|
clang compiles the following code:
|
|
|
|
[source,c++]
|
|
----
|
|
include::{examplesdir}/hello_world.cpp[tag=call_poke_via_ref]
|
|
----
|
|
|
|
...to this on the x64 architecture (variable names have been shortened for
|
|
readability):
|
|
|
|
[source,asm]
|
|
----
|
|
mov rax, qword ptr [rsi]
|
|
mov rdx, qword ptr [rip + hash_mult]
|
|
imul rdx, qword ptr [rax - 8]
|
|
movzx ecx, byte ptr [rip + hash_shift]
|
|
shr rdx, cl
|
|
mov rax, qword ptr [rip + vptrs]
|
|
mov rax, qword ptr [rax + 8*rdx]
|
|
mov rcx, qword ptr [rip + poke::slots_strides]
|
|
mov rax, qword ptr [rax + 8*rcx]
|
|
jmp rax
|
|
----
|
|
|
|
llvm-mca estimates a throughput of 4 cycles per dispatch. Comparatively, calling
|
|
a native virtual functions takes one cycle. However, the difference is amortized
|
|
by the time spent passing the arguments and returning from the function; plus,
|
|
of course, executing the body of the function.
|
|
|
|
Micro benchmarks suggest that dispatching an open-methods with a single virtual
|
|
argument is between 30% and 50% slower than calling the equivalent virtual
|
|
function, with an empty body and no other arguments.
|
|
|
|
However, `call_poke` does two things: it constructs a `virtual_ptr<Animal>` from
|
|
an `Animal&`; and then it calls the method. The construction of the
|
|
`virtual_ptr` is the costly part, as it involves a hash table lookup. Once that
|
|
price has been paid, the `virtual_ptr` can be used multiple times. It is passed
|
|
to the overrider, which can make further method calls through it. It can be
|
|
stored in variables in place of plain pointers.
|
|
|
|
Let's look at another example: an AST for an arithmetic calculator:
|
|
|
|
[source,c++]
|
|
----
|
|
include::{examplesdir}/ast.cpp[tag=ast]
|
|
----
|
|
|
|
The `Negate` overrider compiles to:
|
|
|
|
[source,asm]
|
|
----
|
|
mov rdi, qword ptr [rsi + 8]
|
|
mov rsi, qword ptr [rsi + 16]
|
|
|
|
mov rax, qword ptr [rip + value::slots_strides]
|
|
call qword ptr [rdi + 8*rax]
|
|
|
|
neg eax
|
|
pop rcx
|
|
----
|
|
|
|
The first two instructions read the `virtual_ptr` from `this` - placing its
|
|
content in registers `rdi` and `rsi`.
|
|
|
|
The next two instructions are the method call proper. According to llvm-mca,
|
|
they take one cycle - the same as a native virtual function call.
|
|
|
|
When we create the `Plus` and `Negate` nodes, we call the conversion
|
|
constructors of `virtual_ptr<Node>`, which occur the cost of hash table lookups.
|
|
However, in this example, we know the exact types of the objects. In that case,
|
|
we can use `final_virtual_ptr` to construct the `virtual_ptr` using a single
|
|
instruction. For example:
|
|
|
|
[source,c++]
|
|
----
|
|
include::{examplesdir}/ast.cpp[tag=final,indent=0]
|
|
----
|
|
|
|
...compiles to:
|
|
|
|
```asm
|
|
;; construct Literal
|
|
lea rax, [rip + vtable for Literal+16]
|
|
mov qword ptr [rsp], rax
|
|
mov dword ptr [rsp+8], 1
|
|
|
|
;; construct Negate
|
|
mov rax, qword ptr [rip+static_vptr<Literal>] ; address of openmethod v-table
|
|
lea rcx, [rip+vtable for Negate+16] ; address of native v-table
|
|
mov qword ptr [rsp+16], rcx ; set native v-table
|
|
mov qword ptr [rsp+24], rax ; set openmethod v-table
|
|
mov rax, rsp ; address of 'one'
|
|
mov qword ptr [rsp+32], rax ; set vptr object pointer to 'one'
|
|
```
|
|
|
|
`final_virtual_ptr` does not require its argument to have a polymorphic type.
|