Little Machines Working Together (Part 2)

In the previous post in this series I talked about our API registry that lets parts of our code register interfaces that other parts of the code can query for and call. The main advantage of doing this through a dynamic registry instead of through static linking is that the different modules can be written completely independently of one another, live in separate DLLs, and be loaded and unloaded dynamically. In this part I’ll show how we create a plugin from these parts.

Classes in C

In addition to APIs our headers also contain definitions of objects with state. For example we define an allocator as (simplified here for brevity):

struct tm_allocator_i {
    struct tm_allocator_o *inst;
    void *(*realloc)(struct tm_allocator_o *inst, void *ptr, uint64_t size);
};

Here, struct tm_allocator_o *inst is an opaque pointer to the state data of an allocator. In the implementation of the allocator we cast the data to whatever structure the allocator uses internally.

The tm_allocator_i struct contains everything a client needs in order to use the allocator — the state and the functions that operate on that state, bundled up in the same struct. Calling the allocator through its API looks something like:

void *my_buffer = allocator->realloc(allocator->inst, NULL, 1024*1024);

Note that different allocator implementations can be completely unrelated. They can live in different DLLs, have completely separate code and state, etc.

Unlike the APIs that I discussed in the previous posts, objects are not registered with the API registry. Instead there are API functions for creating and destroying them. So there might be a function in an API somewhere for creating a heap allocator:

struct tm_heap_i {
    struct tm_allocator_i *(*create_static_heap)(void *buffer, uint64_t size);
    void (*destroy_static_heap)(struct tm_allocator_i *heap);
};
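To make the opaque-pointer pattern concrete, here is a sketch of what an implementation behind this API could look like. The state struct name `tm_static_heap_o` and the trivial bump-allocation strategy are my assumptions for illustration, not the engine's actual heap implementation:

```c
#include <stdint.h>
#include <stddef.h>

struct tm_allocator_o;

struct tm_allocator_i {
    struct tm_allocator_o *inst;
    void *(*realloc)(struct tm_allocator_o *inst, void *ptr, uint64_t size);
};

// Hypothetical internal state for a static heap: a simple bump allocator
// carved out of the user-supplied buffer.
struct tm_static_heap_o {
    struct tm_allocator_i interface; // handed back to the caller
    uint8_t *buffer;
    uint64_t size;
    uint64_t used;
};

static void *static_heap_realloc(struct tm_allocator_o *inst, void *ptr, uint64_t size)
{
    // Cast the opaque pointer back to the implementation's own state struct.
    struct tm_static_heap_o *heap = (struct tm_static_heap_o *)inst;
    (void)ptr; // a real heap would support free/grow; a bump allocator doesn't
    if (!size || heap->used + size > heap->size)
        return NULL;
    void *res = heap->buffer + heap->used;
    heap->used += size; // note: no alignment handling in this sketch
    return res;
}

static struct tm_allocator_i *create_static_heap(void *buffer, uint64_t size)
{
    // Place the state struct at the start of the buffer itself, so no other
    // allocator is needed to bootstrap this one.
    struct tm_static_heap_o *heap = (struct tm_static_heap_o *)buffer;
    heap->buffer = (uint8_t *)buffer;
    heap->size = size;
    heap->used = sizeof(*heap);
    heap->interface.inst = (struct tm_allocator_o *)heap;
    heap->interface.realloc = static_heap_realloc;
    return &heap->interface;
}

static void destroy_static_heap(struct tm_allocator_i *heap)
{
    (void)heap; // nothing to do -- all memory lives in the caller's buffer
}
```

A client calls `a->realloc(a->inst, NULL, 1024)` without ever seeing `tm_static_heap_o`.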

As another example, a piece of C code implementing a stdlib allocator could look something like this:

static void *stdlib_realloc(struct tm_allocator_o *inst, void *ptr, uint64_t size)
{
    void *new_ptr = NULL;
    if (size)
        new_ptr = realloc(ptr, size);
    else
        free(ptr);
    return new_ptr;
}

static struct tm_allocator_i stdlib_allocator = {
    .inst = NULL,
    .realloc = stdlib_realloc,
};

static struct tm_allocator_i *get_stdlib_allocator(void)
{
    return &stdlib_allocator;
}

In this case, get_stdlib_allocator() would be a part of some API exposed through the registry. The user would call this function to get access to the standard allocator. Since the stdlib allocator doesn’t use an explicit state, we don’t need the inst parameter and can leave it at NULL.

This approach with state and function pointers in a struct is a kind of “C with classes” approach. You might wonder, if we want classes, why don’t we just use C++? We would get some help from the syntax, better type safety and some performance improvements, such as the ability to use inline functions.

First, as already stated in the previous post, we want a standard ABI, which we can only get with C, and we like the simplicity of C APIs.

Second, these objects don’t behave like ordinary C++ classes. They use purely abstract interfaces and complete implementation hiding. So they are more like what you would get if you consistently used the PIMPL idiom in C++. We like this, because it gives us a lot of physical and logical decoupling. But it adds a bit of verbosity and a performance cost (no inlined functions, for example). Note though that these costs are more a consequence of the design (abstract/PIMPL) than the language choice (C vs C++).


One difference between our approach and standard C++ PIMPL is that we don’t share vtables between objects. Instead, it’s as if each object has a little vtable of its own. With shared vtables, the API would look more like:

struct tm_allocator_obj {
    struct tm_allocator_vtable *vtable;
    struct tm_allocator_o *state;
};

struct tm_allocator_vtable {
    void *(*realloc)(struct tm_allocator_obj *this, void *ptr, uint64_t size);
};

Shared vtables are essentially a memory optimization. Instead of having the method pointers in every object, we just store them once and put a pointer to that in the object. The cost is an extra level of indirection which makes things a bit more verbose (unless we have syntactic sugar for using the vtable, as we do in C++).
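The difference in calling convention is just that extra hop through the vtable pointer. A minimal sketch, with a malloc-backed implementation I made up for illustration (the parameter is named `this_` here to keep the snippet valid C++ as well):

```c
#include <stdint.h>
#include <stdlib.h>

struct tm_allocator_o;
struct tm_allocator_obj;

struct tm_allocator_vtable {
    void *(*realloc)(struct tm_allocator_obj *this_, void *ptr, uint64_t size);
};

struct tm_allocator_obj {
    struct tm_allocator_vtable *vtable;
    struct tm_allocator_o *state;
};

static void *malloc_backed_realloc(struct tm_allocator_obj *this_, void *ptr, uint64_t size)
{
    (void)this_; // this implementation needs no per-object state
    return size ? realloc(ptr, size) : (free(ptr), NULL);
}

// One shared vtable...
static struct tm_allocator_vtable malloc_vtable = {malloc_backed_realloc};

// ...pointed to by any number of objects.
static struct tm_allocator_obj a = {&malloc_vtable, NULL};
static struct tm_allocator_obj b = {&malloc_vtable, NULL};

static void *alloc_example(struct tm_allocator_obj *obj, uint64_t size)
{
    // Per-object interface: obj->realloc(obj->inst, ...)
    // Shared vtable:        obj->vtable->realloc(obj, ...) -- one extra hop
    return obj->vtable->realloc(obj, NULL, size);
}
```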

The shared vtable optimization makes sense when you have lots and lots of objects with the same API — then all those function pointers start to add up. However, if you have lots and lots of things I would argue that the abstract PIMPL design doesn’t make sense anyway. In that case, you don’t want to address your objects individually, you want to process things in bulk. Implementing a “particle” as an “object” will be inefficient no matter how you do it. A particle should just be some data in a SIMD vector or on the GPU.

Our view is that anything that we have lots and lots of should be managed behind the APIs, in raw buffers. The objects that we manage at the API layer are high-level objects that we have relatively few of, which makes it worth it to pay the extra cost of abstraction. For example, in the particle case, the API level would expose a particle manager, but that manager would keep track of individual particles behind the scene — storing them in CPU or GPU buffers as necessary.

Since we have relatively few of these high-level objects and memory is plentiful these days, we don’t really need the vtable optimization and can go for the simpler approach of storing the function pointers directly in each object.

This is very different from the “everything is an object” philosophy of object-oriented design, which is why we talk about it as “data-driven design”. With our approach we pay the cost of abstraction where it makes sense (for high-level objects) and avoid it otherwise.

Forward declarations

In our system, modules make use of other modules through forward declaration. For example, the compression API that I talked about in the last post probably needs to use an allocator to allocate the buffer for the decompressed data (I conveniently skipped that in the previous post). With this, the API might look something like this:

struct tm_allocator_i;

struct tm_piper_compression_i {
    uint8_t *(*compress)(struct tm_allocator_i *a, const uint8_t *raw, uint64_t raw_size);
    uint8_t *(*decompress)(struct tm_allocator_i *a, const uint8_t *compressed, uint64_t raw_size);
};

Since the allocator interface is only passed as a pointer, we don’t need to include the allocator.h header in our module header. In fact, our header doesn’t need to include any other headers at all. This is nice, because it keeps the compile times down, as I talked about in a previous post. And more importantly, it keeps the complexity down. You can look at a single header file and most of what you need to know in order to understand and use the module is right there in front of you.
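The split can be sketched in a single file, with comments marking where the header ends and the implementation file begins. The memcpy "compression" and the `demo_realloc` allocator are stand-ins of my own, just enough to show the allocator being used through a type that the header only forward-declares:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// --- piper_compression.h (sketch): a forward declaration is all we need,
// because the allocator is only ever passed as a pointer.
struct tm_allocator_i;

struct tm_piper_compression_i {
    uint8_t *(*compress)(struct tm_allocator_i *a, const uint8_t *raw, uint64_t raw_size);
    uint8_t *(*decompress)(struct tm_allocator_i *a, const uint8_t *compressed, uint64_t raw_size);
};

// --- piper_compression.c (sketch): only the implementation file needs the
// full allocator definition, normally pulled in via #include "allocator.h".
struct tm_allocator_o;
struct tm_allocator_i {
    struct tm_allocator_o *inst;
    void *(*realloc)(struct tm_allocator_o *inst, void *ptr, uint64_t size);
};

// Stand-in "compression" (a plain copy) -- enough to exercise the allocator.
static uint8_t *copy_compress(struct tm_allocator_i *a, const uint8_t *raw, uint64_t raw_size)
{
    uint8_t *out = (uint8_t *)a->realloc(a->inst, NULL, raw_size);
    if (out)
        memcpy(out, raw, raw_size);
    return out;
}

static struct tm_piper_compression_i piper_compression = {
    .compress = copy_compress,
    .decompress = copy_compress,
};

// A stdlib-backed allocator instance for trying the interface out.
static void *demo_realloc(struct tm_allocator_o *inst, void *ptr, uint64_t size)
{
    (void)inst;
    return size ? realloc(ptr, size) : (free(ptr), NULL);
}
static struct tm_allocator_i demo_allocator = {NULL, demo_realloc};
```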

We went down this design path because it seemed to be the best way to build a flexible, modular, decoupled system. As an unexpected benefit, I find that I really enjoy programming in this style. The clear and very strict separation between interfaces and implementations makes it easy to focus on a small part of the system without being distracted by the whole, systems can be easily mocked for testing purposes, etc.

Plugging it in

The organization of our header files and the API registry provide most of the foundation for the plugin system. We only need to add a few snippets of code for loading and unloading shared libraries.

A plugin in The Machinery is just a DLL (or whatever shared library representation is used on the platform) that exposes two functions for loading and unloading the plugin:

void load_xxx(struct tm_api_registry_i *reg, bool reload);
void unload_xxx(struct tm_api_registry_i *reg, bool reload);

Here xxx is the name of the plugin. So for our piper_compression plugin, the functions would be called load_piper_compression() and unload_piper_compression().

Having these names be unique for each plugin allows us to link the plugins either statically or dynamically, depending on the kind of executable we are building.

The load_xxx() function uses the API registry to query for any APIs that the plugin needs and then registers the APIs provided by the plugin with the registry. The unload_xxx() function removes the plugin’s modules from the registry.
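A sketch of what such a load/unload pair might look like. The `add`/`remove`/`get` signatures on the registry are my assumptions here (the real `tm_api_registry_i` was covered in the previous post and may differ), and the mock table-backed registry exists only to make the example self-contained:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Assumed shape of the registry interface -- illustrative only.
struct tm_api_registry_i {
    void (*add)(const char *name, void *api);
    void (*remove)(void *api);
    void *(*get)(const char *name);
};

struct tm_allocator_i;

struct tm_piper_compression_i {
    uint8_t *(*compress)(struct tm_allocator_i *a, const uint8_t *raw, uint64_t raw_size);
    uint8_t *(*decompress)(struct tm_allocator_i *a, const uint8_t *compressed, uint64_t raw_size);
};

static struct tm_piper_compression_i piper_compression_api; // pointers filled in elsewhere

// The plugin's entry points: query what it needs, register what it provides.
void load_piper_compression(struct tm_api_registry_i *reg, bool reload)
{
    (void)reload;
    reg->add("tm_piper_compression_i", &piper_compression_api);
}

void unload_piper_compression(struct tm_api_registry_i *reg, bool reload)
{
    (void)reload;
    reg->remove(&piper_compression_api);
}

// Minimal mock registry backed by a fixed-size table, for illustration only.
#define MAX_APIS 32
static struct { const char *name; void *api; } apis[MAX_APIS];

static void mock_add(const char *name, void *api) {
    for (int i = 0; i < MAX_APIS; ++i)
        if (!apis[i].api) { apis[i].name = name; apis[i].api = api; return; }
}
static void mock_remove(void *api) {
    for (int i = 0; i < MAX_APIS; ++i)
        if (apis[i].api == api) { apis[i].name = NULL; apis[i].api = NULL; return; }
}
static void *mock_get(const char *name) {
    for (int i = 0; i < MAX_APIS; ++i)
        if (apis[i].api && strcmp(apis[i].name, name) == 0) return apis[i].api;
    return NULL;
}
static struct tm_api_registry_i mock_registry = {mock_add, mock_remove, mock_get};
```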

The plugin loader simply loads a plugin DLL, gets the address to the load_xxx() function through GetProcAddress() or dlsym() and then calls that function to register the plugin.
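The loader itself can be sketched in a few lines. This version uses the POSIX `dlopen()`/`dlsym()` calls (on Windows it would be `LoadLibrary()`/`GetProcAddress()` instead); the function names and the 128-byte symbol buffer are my choices, not the engine's:

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

struct tm_api_registry_i;

typedef void (*tm_plugin_load_f)(struct tm_api_registry_i *reg, bool reload);

// Builds the entry point name, e.g. "piper_compression" -> "load_piper_compression".
static void plugin_load_symbol(char *out, size_t out_size, const char *plugin_name)
{
    snprintf(out, out_size, "load_%s", plugin_name);
}

// Open the shared library, look up load_xxx(), call it to register the plugin.
static bool load_plugin(const char *path, const char *plugin_name, struct tm_api_registry_i *reg)
{
    void *dll = dlopen(path, RTLD_NOW);
    if (!dll)
        return false;
    char symbol[128];
    plugin_load_symbol(symbol, sizeof(symbol), plugin_name);
    tm_plugin_load_f load = (tm_plugin_load_f)dlsym(dll, symbol);
    if (!load) {
        dlclose(dll);
        return false;
    }
    load(reg, false);
    return true;
}
```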

The reload flag in the API is used for hot-reloading. We don’t require plugins to be hot-reloadable, but we want to give them the option to be. It is up to each plugin to decide whether it makes sense for that plugin to support hot-reloading or not. When hot-reloading of a plugin is enabled, the plugin system will automatically detect changes to the DLL and hot-reload it. So as a developer, you can just recompile the DLL in Visual Studio, or whatever tool you are using, and the plugin will hot-reload and you will see the effects of your changes immediately.

To hot-reload a plugin the manager first calls load_xxx() on the new DLL with the reload flag set and then it calls unload_xxx() on the old DLL with the reload flag set. This gives the plugin a chance to transfer state between the old and new instance of the DLL. Again, it is up to the plugin author to decide how ambitious she wants to be with preserving state. An FTP plugin might for example decide to cancel all ongoing transfers if the plugin is reloaded, or, it could be more ambitious and keep-alive the existing transfers by transferring the data and the responsibility for maintaining them to the new DLL instance.

Users of an API can ask the API registry to be notified whenever that API is loaded or unloaded. At that point they can store the new API pointer and go through any other reload protocol required by the API (restarting FTP transfers if they are auto-cancelled by the reload).

Windows woes

Unfortunately, implementing a hot-reload DLL system on Windows is not trivial, because of Windows file locking. Windows will lock the DLL when you load it — preventing you from overwriting it when you recompile. In addition, the Visual Studio Debugger will lock the PDB when debugging DLLs and keep them locked indefinitely (even if you unload the DLL). Locked PDBs can’t be overwritten either, which prevents you from recompiling. Here are some possible ways of dealing with it:

  • Copy the DLL and the PDB to a temporary directory before loading them. Unfortunately, this doesn’t work if the DLL depends on something in its current directory (such as another DLL).
  • Copy the DLL to a random file name in the same directory before loading it. Unfortunately, this doesn’t work for the PDB, because the absolute path name of the PDB is hardcoded in the DLL during compile, so if we copy the PDB, the debugger will still load the original.
  • Move the old PDB to a random file name as a pre-compile step. This works, because we can move the file even though it is locked and Visual Studio can still use it. Unfortunately, this triggers a rebuild every time we compile the project, because the build system no longer sees a PDB there.
  • Generate a random name for the PDB for each compile through a /pdb flag with a %random% macro in the name. This works pretty well, except the compiler output is not super neat. Instead of nicely matching piper.dll and piper.pdb files, it will have piper.dll and piper.39429834.pdb.
  • Write code to explicitly force Visual Studio to release its lock of the PDB file.

Our options are a bit limited by the fact that we want both the old and the new DLL to be loaded at the same time, in order to do the state transfer. So we can’t unload the old DLL (and thus unlock it) before loading the new one. If we did the state transfer some other way — for example by serializing the state out to a buffer — that restriction would be lifted. But we don’t want to force plugin developers to do state transfer through serialization. (With our system they can still do that if they want to, of course.)

Currently we use the following approach:

  • Copy the DLL to a random name in the same directory before loading it. This prevents file locks on the DLLs.
  • Move the PDB to a random name as a pre-compile step. This preserves the debugger’s use of the PDB and allows us to build a new one.
  • Copy the moved PDB back to the original PDB name as another pre-compile step. This prevents Visual Studio from rebuilding the DLL every time because the PDB is missing.
  • Clean up old random DLLs and PDBs when the executable is launched.
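The first two steps are just file shuffling. A portable sketch of the copy-to-random-name part (file names and the `rand()`-based suffix are illustrative; the real loader would use the Win32 file APIs and track the copies for cleanup at the next launch):

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stddef.h>

// Byte-for-byte file copy, standing in for CopyFile() on Windows.
static bool copy_file(const char *from, const char *to)
{
    FILE *in = fopen(from, "rb");
    if (!in)
        return false;
    FILE *out = fopen(to, "wb");
    if (!out) {
        fclose(in);
        return false;
    }
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
        fwrite(buf, 1, n, out);
    fclose(in);
    fclose(out);
    return true;
}

// "piper.dll" -> "piper.39429.dll"-style random name in the *same* directory,
// so any DLLs the plugin depends on can still be found next to it.
static void random_copy_name(char *out, size_t out_size, const char *base, const char *ext)
{
    snprintf(out, out_size, "%s.%d.%s", base, rand() % 100000, ext);
}
```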

As an example of the hot-reload mechanism, our unit test system supports hot-reload of plugins. If you run the unit tests with the -r flag, they will run in hot reload mode. The unit tester registers a callback with the API registry, and whenever a TM_UNIT_TEST_API_NAME API is reloaded, it reruns the unit tests for that module.

by Niklas Gray