Localization in The Machinery’s UI

One thing we never really addressed back when we were making the Bitsquid engine was localization of the tools. We supported localization of the games made in the engine, of course, but the engine tools themselves were only available in English. As both the number of tools and the number of frameworks used for the tools (Winforms, WPF, HTML, …) grew, the localization task became more daunting. When you’re writing an engine, you’re never short of stuff to do, and there always seemed to be something more pressing than addressing localization.

With The Machinery, we didn’t want to make the same sharp distinction between “tools” and “games” as we envision lots of use cases where the creative experience is taken all the way to the end user. So it was important to get this right from the start.

The scope of the problem

“Localization” in the context of game development can mean many different things. Localization of strings in the UI is probably what first comes to mind, but language can also be found in dialogue audio files and text baked into textures. Finally, there can be legal and cultural issues. For example, localizing a game for Germany may require removing blood and gore. Plot or animation changes might also be necessary.

In the Bitsquid engine, all these things were handled in the same generalized localization framework. I.e., everything was a “resource” (string resource, audio resource, texture resource, script resource) and we could localize these resources for different targets.

Treating everything the same has the advantage of being conceptually simple, but it also has disadvantages. Data-oriented design tells us to “look at the data” and the data for these different cases looks completely different. The UI text typically only amounts to a couple of KB for an entire game, whereas the audio dialogue can easily end up in the GB range. Solutions that work well in one case might not work well for the other.

For example, we can easily keep the UI text for all the supported languages permanently in memory and provide instant switching between different languages. For dialogue sounds, on the other hand, switching language might require unloading and loading megabytes of data — a significantly slower and more complicated procedure. If we force everything in under the same “localization abstraction”, we might force the UI text localization to become as slow and complicated as the other kinds of localization, even though that is completely unnecessary.

So in The Machinery, we treat these as separate problems and do not try to solve them all using the same mechanism. In this article, I will only look at the specific problem of localizing UI text.

Basic approach

The Machinery uses an immediate mode GUI (IMGUI), where the UI is built in code rather than in a UI editor. The advantages and disadvantages of this could be the topic of a whole other post. What it means for localization, though, is that since all the UI text lives in C code, we need a way of localizing strings in C code.

I.e. we need a function that, given some key that identifies a string in the UI, returns a translation of that string in the user’s currently selected interface language:

localize(key) → text

In The Machinery, this function is actually implemented as a macro TM_LOCALIZE(key), so I will use capital letters to refer to it from now on.

I’ll go into details on how TM_LOCALIZE is implemented later, but for now, just think of it as having some kind of table that provides the interface text for each key in all the supported languages and using the key to do a lookup in that table:

Key             English   French        Swedish
key_file_menu   File      Fichier       Arkiv
key_save        Save      Enregistrer   Spara

So far, I’ve been deliberately vague about what the key that we use to index into this table actually is. One common option is to use an integer identifier for the key and have a big header file that enumerates all the possible key values. Something like this:

#define TM_STRING_ID__FILE_MENU 0x0001
#define TM_STRING_ID__SAVE 0x0002
...

However, I find that this approach has a number of problems:

  • It is a little bit painful to go in and edit this header file for every string in the UI, and the header can get pretty large.

  • The Machinery is built as a collection of DLLs. To use this approach we would have to make sure that the IDs did not collide across DLLs, for example by giving each DLL a unique numerical prefix for its IDs. But then we also need a mechanism to ensure that these prefixes never collide.

  • Sequential numbering like this tends to cause a lot of merge conflicts in version control tools. Any programmer who wants to add new IDs needs to put them at the end of the list where they will conflict with new IDs from other programmers as soon as they merge.

  • IDs that are no longer used will still remain in the index as “ghosts” or “holes” unless you reuse them for new strings, but then you have the problem that the IDs have different meanings depending on the version of the software.

  • Because it is a little bit painful to create new IDs, there is a temptation to keep the ID even if you change the UI text. For example, suppose you have a tool in your UI for moving objects around called Relocate. So you create a key ID_RELOCATE and associate it with the string "Relocate". Later, you decide it is simpler and better to just call the tool "Move". It is tempting to do this by just changing the text in the translation table and leaving the identifier as it is, but then you end up with the somewhat confusing situation where ID_RELOCATE actually refers to the string "Move" rather than "Relocate".

What can we do instead? Whenever we need unique identifiers but sequential enumeration is problematic, hashing is a good approach. So instead of:

TM_LOCALIZE(TM_STRING_ID__FILE_MENU)

We could do:

TM_LOCALIZE(hash("TM_STRING_ID__FILE_MENU"))

This gets rid of several of the pain points. We don’t need the big header file enumerating all the IDs. There is practically no risk of collision (if we hash into a big enough key space). And since there is no big list anymore, there are no ghosts or holes either.
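As a concrete illustration, here is what such a string hash could look like. This sketch uses 64-bit FNV-1a, which is one common choice; the hash The Machinery actually uses may well differ, and hash_string is a hypothetical name:

```c
#include <stdint.h>

// 64-bit FNV-1a -- one common choice of string hash. This is just an
// illustration; the hash The Machinery actually uses may differ.
static uint64_t hash_string(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (; *s; ++s) {
        h ^= (uint8_t)*s;
        h *= 0x100000001b3ULL;
    }
    return h;
}
```

With a 64-bit key space, an accidental collision among the few thousand strings of a typical UI is vanishingly unlikely.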

Isn’t hashing expensive compared to using a sequential ID? True, hashing has an extra cost, but in this case it doesn’t really matter. In an IMGUI you should only be drawing the things that end up on the screen (you can always do a quick rect intersection test to determine if a control is visible). If we assume that the hash is only computed for things that are actually drawn, its cost will be dwarfed by the cost of drawing the glyphs on the screen.

We can make the macro look a bit nicer too. We don’t really need the TM_STRING_ID__ prefix anymore, since these are no longer global defines that need to be unique but just ordinary strings, and we can bake the call to hash() into the TM_LOCALIZE macro itself, leaving us with:

TM_LOCALIZE("FILE_MENU")

There is still something I don’t like about this though: every time we want to add a new string to the UI we have to come up with a new identifier name for it, and naming is one of the two hard things in Computer Science. It is easy enough for things like the File menu, but what about a string like "Number of rendered triangles: %d". You end up with something ridiculous like:

TM_LOCALIZE("NUMBER_OF_RENDERED_TRIANGLES_WITH_INTEGER_FORMATTER")

There is also a bad disconnect here. To know what the NUMBER_OF_RENDERED_TRIANGLES_WITH_INTEGER_FORMATTER actually resolves to, which is pretty important when you try to write your sprintf() call, you have to go and look in a completely different file — the file that contains the translation tables. It would be much better if I could just see the actual text of the string at the place where it is being used.

This leads us to the galaxy brain solution: instead of coming up with a suitable ID and hashing that to get a unique number, we just hash the string we’re translating itself and use that as the key. Thus, we end up with:

TM_LOCALIZE("File")
TM_LOCALIZE("Save")
sprintf(buf, TM_LOCALIZE("Number of rendered triangles: %d"), ntri);

Much better.

Note that if the string changes the key will change too, so the old translations no longer work. But that is a good thing. If the string has changed it probably needs a new translation. For example, if we changed to using a double for ntri and printing it with %g we would want to be sure that all translations made this change too, or we would run into trouble.

There is one problem with using the string itself as the key and that is homonyms — words with the same spelling but with different meanings. For example, File has at least eight different meanings and each of those meanings might have different translations. In fact, even words that have a single unambiguous meaning may have multiple translations. For example, the Swedish translation for the English word sock could be either socka (thick sock) or strumpa (thin sock).

We handle homonyms with an optional context parameter for the TM_LOCALIZE macro:

TM_LOCALIZE_WITH_CONTEXT("File", "menubar")

This specifies that we want the translation of File in the context of the menubar which disambiguates it from other potential uses of File. We use a hashed string to identify the context, to avoid the problems with numerical IDs listed above.

Context only needs to be used where there is a possibility that the same string will be used multiple times for different purposes. Things like "Number of rendered triangles: %d" do not need context.
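One possible way to fold the context into the key is to chain the hashes: hash the context string first and use the result as the seed when hashing the text itself. This is a hypothetical sketch, not necessarily how The Machinery combines the two:

```c
#include <stdint.h>

// Chain two FNV-1a passes: the context hash seeds the text hash, so
// ("File", "menubar") and ("File", "") produce different keys.
// (Hypothetical sketch -- the actual combination may differ.)
static uint64_t fnv1a(const char *s, uint64_t h)
{
    for (; *s; ++s) {
        h ^= (uint8_t)*s;
        h *= 0x100000001b3ULL;
    }
    return h;
}

static uint64_t localize_key(const char *s, const char *context)
{
    const uint64_t fnv_offset_basis = 0xcbf29ce484222325ULL;
    return fnv1a(s, fnv1a(context, fnv_offset_basis));
}
```

Since hashing the empty string leaves the seed unchanged, the no-context case costs nothing extra.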

You could perhaps argue that this approach is worse than using sequential IDs since there is a risk that we forget to add context where it’s needed. With IDs, you are more aware of reusing a string that has been used previously, whereas, with this approach, reuse can happen “accidentally”.

I don’t think this is a big drawback though, as we can always go in and add context later as needed to disambiguate.

Implementing the lookup

Let’s look at the implementation of the TM_LOCALIZE macro:

struct tm_localizer_i
{
    tm_localizer_o *inst;
    const char *(*localize)(tm_localizer_o *inst, const char *s,
        const char *context);
};

struct tm_localizer_api
{
    // Current default localizer.
    tm_localizer_i **def;
};

#define TM_LOCALIZE_WITH_CONTEXT(s, ctx) ((*tm_localizer_api->def)-> \
    localize((*tm_localizer_api->def)->inst, s, ctx))
#define TM_LOCALIZE(s) TM_LOCALIZE_WITH_CONTEXT(s, "")

This defines an abstract interface for a localizer, tm_localizer_i, with a single function localize() that, given a string s and a context, returns the translation of that string in the current interface language. Then it defines an API tm_localizer_api for getting and setting the current default localizer. Note that, to keep things simple, we just use a variable for this rather than getter and setter functions.

Finally, the TM_LOCALIZE() macro calls on the current default localizer to localize the supplied string with an empty context.

We use this indirection through an abstract interface rather than a hard-coded localization function so that end users can implement completely custom localizers if they want to. For example, you could implement a Google Translate based localizer (probably not a good idea).
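To give a feel for the interface, here is a minimal sketch of a custom localizer: an “identity” localizer that returns the input string unchanged. The struct definitions mirror the interface above (tm_localizer_o is the localizer’s opaque state); identity_localize and identity_localizer are illustrative names:

```c
typedef struct tm_localizer_o tm_localizer_o;

typedef struct tm_localizer_i {
    tm_localizer_o *inst;
    const char *(*localize)(tm_localizer_o *inst, const char *s,
        const char *context);
} tm_localizer_i;

// An "identity" localizer that ignores its state and context and
// returns the input string unchanged.
static const char *identity_localize(tm_localizer_o *inst, const char *s,
    const char *context)
{
    (void)inst;
    (void)context;
    return s;
}

static tm_localizer_i identity_localizer = {
    .inst = 0,
    .localize = identity_localize,
};
```

Installing it would then just be a matter of pointing the default localizer at it, e.g. `*tm_localizer_api->def = &identity_localizer;`.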

Our default localizer uses localizer tables defined in the C code:

{
    { .english = "File", .swedish = "Arkiv" },
    { .english = "Save", .swedish = "Spara" },
}

to create hashmaps from English words to their translations in different languages. Localizer tables can be defined in any DLL and registered in a central registry for use by the localizer. Thus, if you wanted to localize The Machinery to Xhosa or any other language that we don’t currently support, you could do so by adding a plugin DLL with translator tables.
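As a sketch of what the default localizer does with such a table (using illustrative struct and field names, and a linear scan in place of the hashmaps the real implementation builds):

```c
#include <stddef.h>
#include <string.h>

// Illustrative table entry -- the field names match the table above,
// but this is not the actual API. The real localizer builds hashmaps
// from the registered tables; a linear scan is enough to show the idea.
typedef struct table_entry_t {
    const char *english;
    const char *swedish;
} table_entry_t;

static const table_entry_t table[] = {
    { .english = "File", .swedish = "Arkiv" },
    { .english = "Save", .swedish = "Spara" },
};

static const char *lookup_swedish(const char *english)
{
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); ++i) {
        if (strcmp(table[i].english, english) == 0)
            return table[i].swedish;
    }
    // Fall back to the untranslated string if there is no entry.
    return english;
}
```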

Things that can go wrong and ways to fix them

There are two main ways that localization can fail:

  1. You have localized a string with TM_LOCALIZE() in the source code, but forgot to provide a translation for it in the translation table.

  2. You have a string in the UI that was supposed to be localized, but you forgot to wrap it with TM_LOCALIZE().

To handle the first problem, we provide a tool localize.exe that scans the source code for TM_LOCALIZE macros and also parses the localization tables. If it finds TM_LOCALIZE macros that don’t have any corresponding translation table entries, or vice versa, translation table entries that are unused, it prints out error messages so that the problem can be fixed.

Here is a sample run:

> bin/Debug/localize
plugins/editor_views/localizer_table.inl:
Unused:

        "Test Sample Content",

the_machinery/localizer_table.inl:

Missing:

        { .english = "Created: %s", .swedish = "" },
        { .english = "Deleted: %s", .swedish = "" },

Since localize parses the source code, it depends on TM_LOCALIZE being used with static strings only. I.e. you can’t dynamically compose a string and then localize it.

// This doesn't work, localize.exe can't figure out the strings that
// need to be localized.
s = TM_LOCALIZE(sprintf(b, "%s job!", score > 10 ? "Great" : "Good"));

// Instead do this
s = score > 10 ? TM_LOCALIZE("Great job!") : TM_LOCALIZE("Good job!");

In general, it is better to localize full sentences anyway, rather than to compose sentences out of individually localized pieces, because the former approach can better handle situations where word order differs between languages, etc.

To diagnose the second problem — when the programmer has forgotten to add the TM_LOCALIZE() macro — we support setting the user interface language to Gibberish.

When the language is set to Gibberish, our localize() implementation algorithmically transforms the input string into nonsense text. E.g. File → Hgryp, Save → Xcdan. If you set the interface language to Gibberish and still see proper English text somewhere in the UI, you know that part of the UI is missing a TM_LOCALIZE() call.

The Gibberish translation serves another purpose too. We intentionally make the Gibberish strings ~40% longer than their English counterparts to make sure the UI layout doesn’t break down if some strings end up longer after translation.
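As an illustration, a Gibberish transform could be as simple as seeding a pseudo-random generator with a hash of the input and emitting ~40% more letters than the original. This is a hypothetical sketch, not the actual implementation (a real one would also need to preserve format specifiers like %d):

```c
#include <stdint.h>
#include <string.h>

// Deterministically turn a string into nonsense, ~40% longer than the
// input. Hypothetical sketch: a real implementation would also need to
// preserve format specifiers like %d and produce nicer-looking words.
static void gibberish(const char *s, char *out, size_t out_size)
{
    if (!out_size)
        return;

    // Hash the input so the same string always maps to the same nonsense.
    uint64_t h = 0xcbf29ce484222325ULL;
    for (const char *p = s; *p; ++p) {
        h ^= (uint8_t)*p;
        h *= 0x100000001b3ULL;
    }

    const size_t len = strlen(s);
    const size_t n = len + (len * 2) / 5; // ~40% longer
    size_t i = 0;
    for (; i < n && i + 1 < out_size; ++i) {
        // Step a simple linear congruential generator for each letter.
        h = h * 6364136223846793005ULL + 1442695040888963407ULL;
        out[i] = (char)('a' + (h >> 33) % 26);
    }
    out[i] = 0;
}
```

Because the output depends only on the input string, the UI stays stable from frame to frame even though the text is nonsense.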

To type or not to type

The Gibberish approach to finding missing TM_LOCALIZE() macros is not perfect. It requires us to inspect the entire UI — all menus, dialogs, etc. — to find any strings that are lacking translations.

There is a different approach you could take — to define a separate type for strings that have been localized:

struct tm_localized_string_t {
    char *s;
};

tm_localized_string_t localize(const char *s);

bool button(tm_localized_string_t text);

This way, the UI button() function would only accept localized strings and the localize() function which is called by TM_LOCALIZE() would turn ordinary strings into localized strings. A missed TM_LOCALIZE() call would result in a compile error.

The drawback of this approach is that it requires a little bit more type juggling. For example, say you want to line break a localized string text. You can no longer get the next word of the string with:

char *w = strchr(text, ' ') + 1;

Instead, you need to do:

tm_localized_string_t w = {.s = strchr(text.s, ' ') + 1};

This trade-off between type safety and type juggling is not unique to localized strings. I’ve found that it shows up all the time in API design, forcing you to choose between a familiar generic type that is easier to understand and manipulate and a system-specific one that provides better type safety.

For example, suppose you have a sound system that identifies each playing sound with an ID that can be used to stop or otherwise manipulate the sound. Which is better:

// ID is a plain integer
uint64_t play(sound_resource_t *res);
void stop(uint64_t id);

// ID is a custom type
typedef struct playing_sound_id_t {
    uint64_t id;
} playing_sound_id_t;
playing_sound_id_t play(sound_resource_t *res);
void stop(playing_sound_id_t id);

Is the extra type safety of making sure that the user doesn’t pass any old uint64_t to the stop() function worth the complexity of adding another type?

I’m still a bit on the fence on this issue. Sometimes I think that having the extra types makes the interface clearer and plays an important role in preventing programmer errors. Sometimes I think it just complicates things and “hides” from the user what is actually going on. Instead of just dealing with the uint64_t that we know and love, the user now has to handle a plethora of mysterious types: playing_sound_id_t, playing_particle_effect_id_t, undo_scope_id_t, object_id_t, entity_id_t, etc. But I don’t know. What do you think?

by Niklas Gray