Little Machines Working Together (Part 1)

May 16, 2017

What we are building at Our Machinery is not so much a single monolithic thing as a collection of little things that can be configured and assembled in various ways in order to create many wondrous things (or so we hope!). This is why we refer to the project as The Machinery — it’s a lot of little machines, all working together.

A key part of this architecture is how all these little things connect and co-operate. We want it to be really easy to plug different components in and out, even hot-reloading them. In this post I’ll show how that system is implemented.

Finding friends

Consider a lonely little component, living all by itself, in its own DLL file. It only knows how to do one thing, but it does it really well. Say for example that it is really good at middle-out compression. Somehow, this component needs to tell the rest of the system that it exists. It also needs to expose an API for other components to make use of its services.

Let’s tackle the second part first. The most traditional way of exposing an API is through a header file with some type definitions, constants and global functions. For example, zlib exposes:

int deflate(z_streamp strm, int flush);

Since we want our system to support dynamic linking we can’t straight up static link with a library of global functions like deflate(). Instead, we need to work with function pointers, something like:

deflate_f *deflate = (deflate_f *)GetProcAddress(zlib_dll, "deflate");

Dealing with a lot of individual function pointers like this quickly becomes messy. So to keep things organized, we put all the function pointers that belong to a specific API in a struct.

This is what the struct might look like for our hypothetical compression component (for simplicity, let’s assume it compresses whole buffers instead of using a stream-based protocol):

struct tm_piper_compression_i {
    void (*compress)(uint8_t *compressed, const uint8_t *raw, uint64_t raw_size);
    void (*decompress)(uint8_t *raw, const uint8_t *compressed, uint64_t raw_size);
};

Putting the function pointers in a struct like this is good for multiple reasons:

It clearly identifies functions that belong together and are a part of the same API.
It makes it easier to pass the API around. We only need to pass around a struct tm_piper_compression_i * instead of a bunch of individual functions. Whoever has a pointer to the struct can use the whole API.
It allows us to use shorter function names. If we had to come up with function names guaranteed to be globally unique throughout the application they would either have to be really long (tm_piper_compression_compress), or use some kind of abbreviation scheme (tm_pc_compress). Neither option is super appealing. The struct becomes a C way of getting something like a C++ namespace for the API.
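
To make this concrete, here is a minimal sketch of how a component might fill in such a struct and how a client might use it. The "compression" is just a byte copy standing in for a real algorithm, and the names copy_compress, copy_decompress, piper_api and round_trip are made up for illustration; they are not part of The Machinery.

```c
#include <stdint.h>
#include <string.h>

struct tm_piper_compression_i {
    void (*compress)(uint8_t *compressed, const uint8_t *raw, uint64_t raw_size);
    void (*decompress)(uint8_t *raw, const uint8_t *compressed, uint64_t raw_size);
};

// Placeholder implementation: copies the bytes unchanged instead of compressing.
static void copy_compress(uint8_t *compressed, const uint8_t *raw, uint64_t raw_size)
{
    memcpy(compressed, raw, (size_t)raw_size);
}

static void copy_decompress(uint8_t *raw, const uint8_t *compressed, uint64_t raw_size)
{
    memcpy(raw, compressed, (size_t)raw_size);
}

// The component exposes its whole API as one struct of function pointers.
static struct tm_piper_compression_i piper_api = {
    .compress = copy_compress,
    .decompress = copy_decompress,
};

// A client only needs the struct pointer; it never refers to the function
// names above. Returns 1 if the data survives a compress/decompress round trip.
static int round_trip(const struct tm_piper_compression_i *pc, const uint8_t *data, uint64_t size)
{
    uint8_t packed[256], unpacked[256];
    if (size > sizeof(packed))
        return 0;
    pc->compress(packed, data, size);
    pc->decompress(unpacked, packed, size);
    return memcmp(unpacked, data, (size_t)size) == 0;
}
```

Note that the short names compress and decompress only exist as struct members, so they can’t clash with anything else in the application. The implementing functions are static and invisible outside their own file.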

Speaking of that elephant in the room — how come we use C and not C++ for the APIs?

First off, C has a standard ABI (in practice) and C++ doesn’t. What this means is that in order to be able to link C++ object files they have to be compiled by the same compiler, with the same version of that compiler and using the same compile flags. This is a huge pain in the ass. It means that if I want to make a plugin that you can use, I can’t just make one DLL and give it to you — I have to make a DLL for each compiler you might use, for each version and each configuration. That’s pretty much a no-go right there for a flexible and easy-to-use plugin system.

There are a bunch of other reasons why we prefer C for the APIs and would probably continue to use it even if the efforts to standardize a C++ ABI came to fruition:

Simplicity: C is a much simpler language than C++. Using C makes our APIs simpler and easier to understand. Note that by simple I don’t mean the same thing as easy to use. C++ APIs can be very easy to use, easier than C in many cases. But learning C++ and truly understanding it is a much harder task than learning C. Being easy to understand is more important than being easy to use.
Smaller design space: C++ is a huge multi-paradigm language. That means APIs can be written in a lot of different “styles”. Just to mention one aspect, operations can be written either as class methods or as free functions. This “expressional power” may seem like a good thing, but the drawback is that it tends to fragment the code base and create confusion, with different styles being used in different parts of a project, unless you are very careful with reining the designs in.
More decoupling: C++ tends to create solutions with lots of couplings between objects. Objects are composed out of other objects, etc. This doesn’t fit well with our vision of a lot of independent machines co-operating.

Note that we only enforce C in the interface layer. Implementations can be written in either C or C++. In interfaces, simplicity and consistency are important, because each interface will be used by a lot of people. The implementations contain the bulk of the code, so here expressional power can be useful to reduce the code size. And an implementation only needs to be understood by the people working on that piece of code, so simplicity isn’t as important there (though it still matters). That’s why we think it sometimes makes sense to use C++ for the implementations.

Of course, C vs C++ can be discussed endlessly. No time for that if we actually want to build something, so let’s get back to the first problem. How can a component tell other parts of the system that it exists?

For this, we have a central component called the api_registry. It simply keeps track of all the loaded components/APIs in the system:

struct tm_api_registry_i
{
    void (*add)(const char *name, void *interf);
    void (*remove)(void *interf);
    void *(*first)(const char *name);
    void *(*next)(void *prev);
};

As you can see, each component is identified by a unique const char *name. The name is defined in the header file together with the API:

#define TM_PIPER_COMPRESSION_API_NAME "tm_piper_compression_i"

struct tm_piper_compression_i {
    void (*compress)(uint8_t *compressed, const uint8_t *raw, uint64_t raw_size);
    void (*decompress)(uint8_t *raw, const uint8_t *compressed, uint64_t raw_size);
};

Thus, the header contains all the information a client needs to get the API from the registry and start using it. It would look something like this:

struct tm_piper_compression_i *pc = (struct tm_piper_compression_i *)api_registry->first(TM_PIPER_COMPRESSION_API_NAME);

pc->compress(...);

There is typically other stuff in the header too, such as documentation, which I’m omitting here for brevity.
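
To give a feel for what sits behind the registry interface, here is a minimal sketch of one way add(), remove(), first() and next() could be implemented. It is not the actual implementation: the fixed-size table, the entries array and the registry_* function names are assumptions made for illustration. It also assumes that each interface pointer is registered at most once and that the name strings outlive the registration.

```c
#include <stddef.h>
#include <string.h>

struct tm_api_registry_i {
    void (*add)(const char *name, void *interf);
    void (*remove)(void *interf);
    void *(*first)(const char *name);
    void *(*next)(void *prev);
};

// Assumption: a small fixed-size table is enough for a sketch.
enum { MAX_APIS = 64 };

struct entry {
    const char *name; // Not copied; assumed to outlive the registration.
    void *interf;
};

static struct entry entries[MAX_APIS];
static int num_entries;

static void registry_add(const char *name, void *interf)
{
    if (num_entries < MAX_APIS)
        entries[num_entries++] = (struct entry){name, interf};
}

static void registry_remove(void *interf)
{
    for (int i = 0; i < num_entries; ++i) {
        if (entries[i].interf == interf) {
            // Shift later entries down so registration order is preserved.
            memmove(&entries[i], &entries[i + 1],
                    (size_t)(num_entries - i - 1) * sizeof(struct entry));
            --num_entries;
            return;
        }
    }
}

// Returns the first interface registered under `name` at index >= start, or NULL.
static void *find_from(const char *name, int start)
{
    for (int i = start; i < num_entries; ++i) {
        if (strcmp(entries[i].name, name) == 0)
            return entries[i].interf;
    }
    return NULL;
}

static void *registry_first(const char *name)
{
    return find_from(name, 0);
}

// Looks up `prev` to learn its name, then continues the search after it.
static void *registry_next(void *prev)
{
    for (int i = 0; i < num_entries; ++i) {
        if (entries[i].interf == prev)
            return find_from(entries[i].name, i + 1);
    }
    return NULL;
}

static struct tm_api_registry_i registry = {
    .add = registry_add,
    .remove = registry_remove,
    .first = registry_first,
    .next = registry_next,
};
static struct tm_api_registry_i *api_registry = &registry;
```

A linear scan like this is fine for a handful of APIs. A real registry would likely want a hash table and some thought about thread safety, neither of which is shown here.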

As you can see from the first() and next() functions in the API registry, there can be multiple implementations of the same interface. For example, for unit tests we define a struct tm_unit_test_i API. To run unit tests we query the API registry for all TM_UNIT_TEST_API_NAME modules, run the unit tests on them and then display the result in some interesting way.

Note that the API registry itself doesn’t provide any mechanism to distinguish between different modules that implement the same API. If we need that functionality, we need to add it to the API itself. For example, the unit test API has a const char *name() function that returns the name of the test and can be used to run tests selectively.
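
As a rough sketch of that pattern, the code below walks all implementations of the unit test API with first() and next() and uses name() to run tests selectively. Only name() comes from the description above. The run() function, the two sample tests, run_tests() and the tiny hard-coded stand-in for the registry are all assumptions made to keep the example self-contained.

```c
#include <stddef.h>
#include <string.h>

#define TM_UNIT_TEST_API_NAME "tm_unit_test_i"

// Hypothetical shape of the unit test API; only name() is described in the text.
struct tm_unit_test_i {
    const char *(*name)(void);
    int (*run)(void); // Assumed: returns 1 on success, 0 on failure.
};

static const char *math_name(void) { return "math"; }
static int math_run(void) { return 1 + 1 == 2; }
static const char *string_name(void) { return "string"; }
static int string_run(void) { return strlen("abc") == 3; }

static struct tm_unit_test_i math_test = {math_name, math_run};
static struct tm_unit_test_i string_test = {string_name, string_run};

// Stand-in for the API registry: a fixed list of registered test modules.
static struct tm_unit_test_i *registered[] = {&math_test, &string_test};
enum { NUM_REGISTERED = 2 };

static void *registry_first(const char *name)
{
    return strcmp(name, TM_UNIT_TEST_API_NAME) == 0 ? registered[0] : NULL;
}

static void *registry_next(void *prev)
{
    for (int i = 0; i + 1 < NUM_REGISTERED; ++i) {
        if (registered[i] == prev)
            return registered[i + 1];
    }
    return NULL;
}

// Runs every registered test whose name matches `filter` (NULL runs them all).
// Returns the number of tests that ran and passed.
static int run_tests(const char *filter)
{
    int passed = 0;
    for (struct tm_unit_test_i *t = registry_first(TM_UNIT_TEST_API_NAME); t; t = registry_next(t)) {
        if (filter && strcmp(t->name(), filter) != 0)
            continue;
        passed += t->run();
    }
    return passed;
}
```

The registry stays generic here: it only hands back opaque pointers, and everything that tells one test module from another lives in the test API itself.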

In my next post, I’ll show how to build a plugin system out of these parts.


by Niklas Gray