Virtualization need-to-knows

Let's start this blogpost with a baseline. Although this might be applicable to languages other than C++. In fact this might be applicable to all object oriented languages, we will only cover the specifics in C++. Even in C++ compilers actually have the freedom to implement virtualization as they see fit because C++ doesn't specify an implementation. Generally speaking though every compiler implements virtualization with the help of a "vtable". A simple lookup table to find the function address of the virtual function.

Note: These implementations tend to get very complex once they start supporting multiple inheritance and virtual inheritance.

In its most simple form we have a base class with some virtual functions in there and we have a derived class that implements/overwrites these functions. The "vtable" helps the program to point at the right function. This mostly happens at runtime which has some consequences. This article will show you some ways to avoid this runtime overhead with giving the compiler enough information at compile time to devirtualize these functions. These optimizations might seem as micro-optimizations, but it's generally good to be aware of those things and in the core gameloop or in critical places these can help performance a great deal. I also will link to some articles I found scattered around the internet since the resources on this are minimal.

class A 
     virtual void foo();
class B : A 
    void foo() override;

B* b = new B();
Introduction into de-virtualization

I will show some easy tricks for us as developers and I will also go into some more experimental techniques to make sure we either discard as much polymorphic calls as possible or make sure that the compiler optimizes them into direct calls.

keyword final

Compilers have come a very long way and might sometimes look like black magic, but there isn't any magic involved, just a lot of very clever people. So, if the compiler CAN determine that at runtime the method you are trying to invoke will always be from the same class(meaning it can't come from any subclass). It will optimize the the program method directly. That's a nice and abstract way of saying the keyword final can help the compiler in determining that there is no "child" classes or "child" methods. You can use the keyword final on a class level but also on a method level. This can be seen here and the example is also directly taken from there:

struct A 
     virtual void f();
struct B final : A 
void f(B *b)
Pointer optimization

There are also other way of achieving this optimization by directly telling the compiler the exact function the program needs to use. This is a more direct way and might come to bite you in the butt another day. But if you feel confident you can also do this with pointers:

Stack objects are optimized from the get-go

if you have a stackobject there is no need to do that, since the compiler already knows it's no "child" class/method and this will be determined at compile time:;
doom3 uses a combination of the above

This example came from here and might not work on every compiler out there but most compilers should be able to see the direct link to that stackobject.

idCommonLocal    commonLocal;           
idCommon *       common = &commonLocal;

Note: Here specifically, there are also other way by having other include paths for every OS, you could also do something similar like UE4 with typedefs, but that's beside the point for this article.

Experimental idea: trading vtable metod calls in favor of branching and a bit of memory

Now let's go into a more experimental approaches I could think off. I found this quite interesting and tested it out some time ago, with some success. Let's start with the warnings right at the beginning, branches are also somewhat expensive and this will only work in certain scenario's. The scenario I will talk about is for ECS systems. Every system inherits from a BaseSystem class and has an Update, FixedUpdate, Render, etc. All these functions can be overwritten and when they are not they fall back to their standard base class method which is an empty function. This however is something the compiler has to do at runtime and this basically costs:

  1. get the vtable in memory
  2. apply the offset to get that certain function in the vtable
  3. jump to the actual method that is empty.

Note: In for-loops when iterating over the same types of system the compiler might try to optimize and ofset the this-pointer in specific cases.

In general though this is more costly then the branching in a simple if-loop. Some downsides worth noting is maintainability and memory cost.

Some last tips

In the constructor or destructor we are sure which object it is and the compiler can use the provided vtable directly. I would also like to note that you should test these things and not trust your intuition. Different compilers might result in different results. Try to come up with minimum viable tests and measure timings or look at the disassembly result. Here is a good way to do test different compilers/minimum viable tests. There are also a lot of compiler settings that can help de-virtualize, but also inlining and using templates in certain ways. Those all have their own advantages and disadvantages and I might write about them another time.

That's it for now, I will be updating other parts of this site and hopefully write some other blogposts in the near future. I hope you had a nice read and learned a thing or two.