Compiling The Machinery with Emscripten: Part 1

At Our Machinery, the last Friday of every month is Hack Day. This means that you can work on whatever you like and try wild ideas without any expectation of being productive, useful, or sane.

For my Hack Days, I like to pick projects that let me explore something new and have some sort of tangible end goal. It’s just more fun when you have something to show off in the end. That’s what steered me towards game development in the first place.

For the last two Hack Days, I’ve been working on compiling The Machinery with Emscripten so that it can run in a web browser. I had never used Emscripten before, so it definitely fits the bill of trying something new. As for goals: on the first day, I wanted to get our command-line tools to run and print some output in the JavaScript console. On the second day, I wanted to tackle Simple Draw, a small drawing program built with our UI, and see if I could get it to run in the browser:

Simple Draw


In this post, I’ll go through the first part — getting a basic command-line program up and running. In the next post, we’ll work on the UI.

Setting up Emscripten

The first step was to install and set up Emscripten on my computer (an M1 MacBook). This was very straightforward. I just followed the Getting Started instructions from the Emscripten web site.

After downloading, you use Emscripten by running source emsdk_env.sh in a command shell. This sets up the PATH environment variable so that you have access to the Emscripten compiler commands (emcc, emmake, etc.). I verified the setup by compiling a simple Hello World program.

When you compile a program using Emscripten, you end up with three files: a .wasm file that contains the compiled code in WASM format, a .js file that loads the WASM code and interfaces with it, and a .html file for running the program in the browser. However, if you just open this HTML file, you will be greeted by this disappointing progress spinner:

Something went wrong…


This happens because, for security reasons, most browsers don’t let you load WASM from the file system. If you open the browser’s JavaScript developer console, you will see a detailed error message about this. (It’s a good idea to always have the console open when you’re working with Emscripten.)

To get past the security checks, you have to load the file from a real web server. You could upload it to a remote web server somewhere, but for faster iteration, you’ll want to run a local web server. There are many ways of doing this. I usually use Python’s http.server module, because I already have Python installed and it’s just a one-line command:

$ python3 -m http.server
Serving HTTP on :: port 8000 (http://[::]:8000/) ...

This starts a local web server on port 8000, serving the files from the directory where the command is run. We can go to http://localhost:8000/ in the web browser and click on the .html file to run it. This is what we will see:

Hello, world!


The layout here is a bit confusing, but at the top is a small Canvas where our program can draw stuff (we’ll look into that in the next part), and below it is a big Textarea where anything we print to stdout appears. The Fullscreen button didn’t work for me in any of the browsers I tried. (I’m not sure if I just happened to pull a bad version of Emscripten or if something else is going on. Again, this was just a single Hack Day, so there wasn’t much time to investigate.)

If you’re working with graphics, the small canvas in this view is pretty annoying. Luckily, it’s easy to change: you can open the HTML file and change the size of the canvas, or you can create a copy of the HTML file with a different layout. The advantage of a copy is that it won’t be overwritten the next time you compile.

Build system

So I got some code to compile. The next step was to get some of The Machinery’s code to compile. For that, I needed to get the Emscripten build integrated into our build system.

Our build system has two main parts. We use Premake to generate build files for make, Visual Studio, etc., and then we have a custom in-house tool called tmbuild that downloads dependencies (including Premake), generates projects (using Premake), builds, and runs unit tests.

For Premake we need to add Emscripten as a new platform. The platforms in Premake don’t really mean anything. They’re just tags to group settings under. We can add a new platform like this:

filter { "system:macosx" }
    platforms { "MacOSX-x64", "MacOSX-ARM", "web" }

This specifies that when Premake is running on OS X, in addition to the two target platforms MacOSX-x64 and MacOSX-ARM, there will be an additional one, called web. This is our Emscripten target platform.

With the platform defined, we can proceed to configure it:

filter { "platforms:web" }
    defines { "TM_OS_WEB", "TM_OS_POSIX", "TM_NO_MAIN_FIBER" }
    targetextension ".html"

This sets up some defines that we can check for in the code. It also says that we want to produce HTML files as output. (I’m only showing some of the settings here; we also have settings for warnings, etc.)
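As a small illustration of how such defines might be consumed, here is a sketch. The helper `target_platform_name()` is hypothetical and not from The Machinery. `__EMSCRIPTEN__` is the macro that the Emscripten compiler predefines, which makes a cheap sanity check possible: the `web` platform defines should only ever show up in an emcc build.

```c
// TM_OS_WEB and TM_OS_POSIX are set by the Premake "web" platform filter.
// __EMSCRIPTEN__ is predefined by emcc, so this catches a misconfigured build.
#if defined(TM_OS_WEB) && !defined(__EMSCRIPTEN__)
#error "TM_OS_WEB is defined, but this is not an Emscripten build"
#endif

// Hypothetical helper: report which platform this translation unit targets.
// TM_OS_WEB is tested first because web builds also define TM_OS_POSIX.
static const char *target_platform_name(void)
{
#if defined(TM_OS_WEB)
    return "web";
#elif defined(TM_OS_POSIX)
    return "posix";
#else
    return "other";
#endif
}
```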

Core functionality in The Machinery (OS access, math, arrays, etc.) is provided by the foundation library. To get the command-line tools running, we need to compile this library, as well as the tools themselves.

By default, Premake compiles all the projects listed in the premake5.lua file for every platform. So instead of turning web compilation on for the projects we want, we have to turn it off for the projects we don’t want. We can do that by adding this line to the project definition:

removeplatforms { "web" }

This prevents the project from being compiled for the web platform.

With Premake fixed, the next step is to add support for web builds to tmbuild. tmbuild already had basic support for cross-platform builds; it just needed an option to activate it, so I added --platform web.

I also needed to tell tmbuild how to build web projects, which was just a single line of code:

tm_temp_allocator_api->printf(ta, "emmake make %s config=%s_%s -j %u", project, clower, platform_name, num_threads);
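For readers new to emmake: it runs the given command (here, make) with environment variables such as CC pointing at Emscripten’s compilers, so the Premake-generated makefiles build with emcc. The sketch below reproduces the same format string with plain `snprintf`, outside tmbuild. The function name is hypothetical, and it assumes that `clower` in the tmbuild line holds the lowercased configuration name, matching the `config=debug_web` style of configuration names in Premake-generated makefiles.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical stand-in for the tmbuild line above, using snprintf instead of
// tm_temp_allocator_api->printf. Returns the length of the full command, as
// snprintf does (the output is truncated if `size` is too small).
static int format_web_build_command(char *buf, size_t size, const char *project,
    const char *config_lower, const char *platform_name, unsigned num_threads)
{
    // emmake wraps make so that the generated makefiles compile with emcc.
    return snprintf(buf, size, "emmake make %s config=%s_%s -j %u",
        project, config_lower, platform_name, num_threads);
}
```

With project "foundation", configuration "debug", platform "web", and 8 threads, this produces `emmake make foundation config=debug_web -j 8`.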

Now we can build for Emscripten with tmbuild --platform web. Somewhat predictably, this gives us a bunch of build errors.

Build errors

Emscripten is a bit pickier than our regular Clang builds. Kind of weird, because Emscripten is based on Clang and we’re using the same build flags. The main things I noticed were:

  • Emscripten seems to use stronger analysis of unused variables. It found some unused variables that didn’t trigger errors before.

  • Emscripten warns about void f() declarations. In C (before C23), void f() does not declare a function without arguments; it declares a function that takes an unspecified number of arguments. A function without arguments has to be written as void f(void). This is an easy mistake to make, especially since C++ does it differently: in C++, void f() is a function without arguments.
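A minimal before/after sketch of the second fix. The function name `frame_rate` is made up for illustration.

```c
// Before: `static int frame_rate();` declares frame_rate without a prototype
// (in C before C23), so a call such as frame_rate(42) is not diagnosed.

// After: `(void)` makes this a prototype for a function with no parameters,
// so passing any arguments is a compile-time error.
static int frame_rate(void);

static int frame_rate(void)
{
    return 60;
}
```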

Fixing these things was pretty straightforward and I’m actually glad that Emscripten generated these warnings. I’d like to see if I can enable them for our regular builds too.

A more problematic bunch of errors came from Emscripten using 32-bit pointers. Going into this port, I already knew that Emscripten only supported 4 GB of addressable memory, but I figured that wouldn’t be a big deal, since our small test programs wouldn’t need much (though we might have to rethink some of our Virtual Memory Tricks). However, I didn’t realize that this also meant that Emscripten was using 32-bit pointers. (It would be possible to have one without the other. For example, many 64-bit systems do not actually support a full 64-bit address range.)

This creates some issues because The Machinery doesn’t have a 32-bit code path. By now, even cheap phones have 64-bit CPUs, and we didn’t think maintaining a 32-bit build was worth it (or a big-endian build, for that matter).

Luckily, we’re not that dependent on pointer sizes and so far, this has mainly shown up in two ways:

  1. We explicitly pad structs in The Machinery to ensure all padding bytes are zero-initialized. Changing the pointer size changes the amount of padding that’s needed and this generates a lot of warnings.
  2. Our hash table implementation only works with 64-bit keys. Using 32-bit pointers as hash keys generates an error.

For (1), I simply turned off the padding warning for web builds. This is not the correct fix, mind you, because the whole reason we do explicit padding is to ensure the padding bytes get zeroed by designated initializers so that we can do memcmp() and hashing with predictable results. But hey, this is a Hack Day project and there is only so much I can get done in one day.
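For illustration, here is one way a proper fix could look. This is a hypothetical sketch, not The Machinery’s code: the struct and the `UINTPTR_MAX` check are assumptions. With 32-bit pointers, the `ptr` member below shrinks to 4 bytes, and without the extra explicit padding, the compiler would insert 4 bytes of hidden tail padding to keep the 8-byte alignment. That is the kind of padding the warning flags and that a designated initializer isn’t guaranteed to zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical struct written so that it has no compiler-inserted padding
// on either 64-bit or 32-bit (e.g. wasm32) targets.
typedef struct example_t
{
    uint64_t id;
    void *ptr;
#if UINTPTR_MAX == 0xFFFFFFFFu
    // With 32-bit pointers, pad `ptr` back out to 8 bytes so the struct keeps
    // the same 24-byte layout and no implicit padding appears.
    char padding_ptr[4];
#endif
    uint32_t flags;
    char padding[4];
} example_t;

// Size equals the sum of the members on both targets: no hidden padding.
static_assert(sizeof(example_t) == 24, "example_t has implicit padding");

// Because every padding byte is an explicit member, designated initializers
// zero it, so structs with equal field values also compare equal bytewise.
static int example_equal(const example_t *a, const example_t *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

Conditional members like this get unwieldy across a large codebase, which helps explain why disabling the warning was the pragmatic Hack Day choice.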

Handling (2) is trickier. I spent quite some time looking for an Emscripten flag that would force pointers to be 64 bits wide (while still only using the lower 32 bits). It seems to me that a flag like that would be really helpful for porting 64-bit projects. It also seems like something that would be pretty easy to implement in the compiler (not that I would know; I haven’t dived into the Clang codebase). But my hopes were dashed: there is no such flag, and I’d have to do the hard work myself.

I could have implemented support for 32-bit keys in our hash table implementation. That would probably be the right thing to do, but hey, this is a Hack Day, gotta make something happen in 8 hours. So instead, I used a union to force the key types to 64 bits.

So code like this:

struct TM_HASH_T(void *, uint64_t) ptr_to_bytes;

bytes = tm_hash_get(&mt->ptr_to_bytes, ptr);

Had to be rewritten to:

typedef union
{
    void *ptr;
    uint64_t u64;
} void_ptr_t;
struct TM_HASH_T(void_ptr_t, uint64_t) ptr_to_bytes;

bytes = tm_hash_get(&mt->ptr_to_bytes, (void_ptr_t){ptr});

Pretty tedious — but luckily there weren’t too many instances of this in the codebase.
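One caveat with the `(void_ptr_t){ptr}` literal: it initializes only the `ptr` member, and C does not guarantee that the remaining 4 bytes of `u64` are zeroed when pointers are 32 bits wide. If those bytes ever held garbage, equal pointers could hash to different keys. A defensive variant writes the full 64-bit member instead. This is a sketch; the `void_ptr()` helper is a hypothetical name, not part of The Machinery.

```c
#include <assert.h>
#include <stdint.h>

typedef union
{
    void *ptr;
    uint64_t u64;
} void_ptr_t;

// The union must fill exactly one 64-bit key slot on 32- and 64-bit targets.
static_assert(sizeof(void_ptr_t) == sizeof(uint64_t), "hash key must be 64 bits");

// Hypothetical helper: build a key by writing the whole u64 member, so the
// upper 32 bits are zero when pointers are only 32 bits wide. Reading .ptr
// back on a 32-bit target relies on little-endian layout, which WebAssembly uses.
static inline void_ptr_t void_ptr(void *p)
{
    void_ptr_t key = { .u64 = (uint64_t)(uintptr_t)p };
    return key;
}
```

Call sites would then read `bytes = tm_hash_get(&mt->ptr_to_bytes, void_ptr(ptr));`.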

os.h

The final thing I needed to do was to implement the OS backend code for accessing files, managing threads, etc.

In The Machinery, almost all OS-specific code (except for rendering and windowing) is exposed through a single header: os.h. It has a platform-independent interface that gets implemented by platform-specific implementation files: os.win32.c, os.linux.c, os.osx.c. To support a new platform, we need to add an implementation file for that platform. Luckily, Emscripten supports mostly the same interfaces as Linux, so I was able to just copy os.linux.c to os.web.c.
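To make the structure concrete, here is a heavily reduced sketch of a platform-independent interface with a POSIX-backed implementation. The names `os_api_sketch`, `file_size`, and `now_ns` are hypothetical; the real os.h is much larger and isn’t shown in this post. The point is that calls like `stat()` and `clock_gettime()` are also provided by Emscripten’s POSIX compatibility layer (file access goes through its virtual file system), which is why a Linux implementation makes a reasonable starting point for the web.

```c
// Needed for clock_gettime() under strict C modes; must precede all includes.
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <sys/stat.h>
#include <time.h>

// "os.h" part: a platform-independent table of function pointers.
struct os_api_sketch
{
    // Returns the size of the file at `path` in bytes, or -1 on failure.
    int64_t (*file_size)(const char *path);
    // Returns a monotonic timestamp in nanoseconds.
    uint64_t (*now_ns)(void);
};

// "os.linux.c" / "os.web.c" part: one POSIX implementation of the table.
static int64_t posix_file_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (int64_t)st.st_size;
}

static uint64_t posix_now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

static const struct os_api_sketch os_api = {
    .file_size = posix_file_size,
    .now_ns = posix_now_ns,
};
```

Callers only ever see `os_api`, so a new platform means a new implementation file rather than changes throughout the codebase.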

Result

And there we go. Our foundation library now compiles and so do all of our command-line tools: tmbuild, docgen, hash, etc.

Let’s have a look:

hash.exe help output in the web browser.


Here, I’ve hardcoded hash to run with --help to print the help text to stdout.

Not terribly exciting, but pretty cool. We have a basic part of The Machinery running in the web browser. In the next part, we’ll spice it up with some interactivity.

by Niklas Gray