Referencing Objects: Names vs GUIDs
One question that I keep coming back to is whether references between objects are best represented as names or GUIDs.
Here is the situation: You have created some sort of data model for representing objects in memory/on disk. Now you need the ability for objects to refer to other objects. I.e., an object needs to talk about another object. Some examples:
- A material object may point to a texture object and say “I want to use this as my diffuse map”.
- An animation object may point to a model object and say “I want to rotate this model around its z-axis”.
How can we accomplish this?
Here are two options:
- Names: Each object is referred to by its name. The name is a string assigned to the object by the user and the user can change this string at will (rename the object).
- GUIDs: Each object is referred to by a globally unique identifier (GUID). The GUID is assigned to the object on creation and never changes. It is guaranteed to only represent this particular object and no other.
Names are resolved in some kind of context (typically the children of the current object). Thus, to refer to an object that is “far away” from us, we might have to use a sequence of names to navigate the object tree, e.g., `../../player/head/left_eye`. Much like a path in a file system, this sequence of names provides a path from one object in our object tree to another. Note that in this post I will sometimes, somewhat sloppily, talk about the name of an object when I actually mean the full path to the object.
You might protest that there are other ways of representing references too. For example, an in-memory representation could just use a pointer. A disk representation could use a file offset. Combinations are possible too — for example (filename + offset) to represent an object inside a file. However, it is easy to become confused when considering the myriad of possibilities, so let’s put all of that aside for the moment. In this post, I’m going to focus on the difference between names and GUIDs and in the end we will see how the discussion applies to the other possibilities.
Side note: There is another interesting option apart from names and GUIDs and that is to refer to an object by the hash of its content. With this approach, the same content is always referred to by the same unique identifier (its hash) and if you change the content all the references have to be updated. If you start to think about it, most of git falls out as the result of this single design decision.
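To make the hash-based option concrete, here is a minimal Python sketch of content addressing. The `store`, `content_id` and `put` names are illustrative only. SHA-1 is used because git uses it, although git also hashes a short type-and-length header before the content, which this sketch omits.

```python
import hashlib

# Content-addressed store: the key for a blob is derived from the blob itself.
store: dict = {}

def content_id(content: bytes) -> str:
    """Identify content by its SHA-1 digest."""
    return hashlib.sha1(content).hexdigest()

def put(content: bytes) -> str:
    """Store content and return the reference other objects should hold."""
    cid = content_id(content)
    store[cid] = content
    return cid

texture_v1 = put(b"red pixels")
texture_v2 = put(b"blue pixels")   # "editing" the texture produces a new reference

# Same content always yields the same reference...
assert put(b"red pixels") == texture_v1
# ...and changed content yields a different one, so anything that pointed at
# texture_v1 must be updated to texture_v2 to see the edit.
assert texture_v1 != texture_v2
```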
Names and GUIDs both have their pros and cons, making it hard to say that one is strictly better than the other:
Names | IDs |
---|---|
Fragile: if objects are renamed, moved or deleted, references will break | Unreadable: references look like random numbers, which makes them hard to debug |
Cumbersome: coming up with meaningful names for everything is a chore | |
Expensive: names have to be matched against the object tree to find the objects | |
Each of these points can be argued back and forth endlessly. Can’t we auto-assign names to make them easier to come up with? But how readable are names really if most things are just named `box_723`? Can’t we make a tool that looks up a readable name from a GUID? Can’t we also make a tool that automatically patches references when an object is renamed? Etc., etc.
Again, it’s easy to get stuck in the nitty-gritty details of this and miss the bigger picture. To make things clearer, let’s take a step back and ask ourselves:
What is the fundamental difference between names and GUIDs?
Think about it for a bit. Here’s my answer:
A GUID specifies an object identity, but a name specifies an object’s role.
The GUID `90e2294e-9daf-45f0-b75b-01fb85bb6dc8` always refers to one specific object: the one single object in the universe with that GUID. The path `head/left_eye` refers to whatever object is currently acting as the character’s left eye. It does not always have to be the same object. Maybe the character loses her eye at some point and it gets replaced with a glass eye. Maybe we can spawn multiple instances of the character in different configurations with different kinds of eyes: flesh eyes, robot eyes, anime eyes, etc. Regardless of the setup, `head/left_eye` will refer to the character’s left eye.
In contrast, if we used a GUID to refer to the left eye and the eye got replaced, the GUID would still refer to the old eye we lost. And a single GUID couldn’t be used to refer to different eyes in different character setups.
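The eye-replacement scenario can be sketched in a few lines of Python. Everything here (`Obj`, `resolve`, and the `registry` dict standing in for an object database keyed by GUID) is a hypothetical illustration rather than any particular engine’s data model; paths are resolved against direct children only.

```python
import uuid

class Obj:
    def __init__(self, name: str):
        self.name = name
        self.guid = uuid.uuid4()          # assigned on creation, never changes
        self.children: dict = {}

    def add(self, child: "Obj") -> "Obj":
        self.children[child.name] = child
        return child

def resolve(root: Obj, path: str):
    """Follow a '/'-separated path of child names; None if any step is missing."""
    node = root
    for part in path.split("/"):
        node = node.children.get(part)
        if node is None:
            return None
    return node

registry: dict = {}                        # GUID -> object

def register(obj: Obj) -> Obj:
    registry[obj.guid] = obj
    return obj

character = Obj("character")
head = character.add(Obj("head"))
flesh_eye = register(head.add(Obj("left_eye")))

ref_by_name = "head/left_eye"              # refers to a role
ref_by_guid = flesh_eye.guid               # refers to an identity

# She loses the eye, and a glass eye is put in the same socket.
del head.children["left_eye"]
glass_eye = register(head.add(Obj("left_eye")))

print(resolve(character, ref_by_name) is glass_eye)   # True: the name follows the role
print(registry[ref_by_guid] is flesh_eye)             # True: the GUID still means the lost eye
```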
| | Name | GUID | Hash |
|---|---|---|---|
| References objects by | Their role | Their identity | Their content |
The pointers and offsets that I talked about at the beginning of the post are similar to GUIDs, since they reference objects by identity. A pointer always points to the same object. In fact, you could see a pointer as a deserialized version of a GUID: a way of uniquely referencing an object in memory. Offsets, too, uniquely identify objects. (But offsets are not permanent, so references must be updated each time a file is saved.)
A name allows for “late binding” of references.
To get from a name to an actual object, we need to resolve the name at some point. This involves matching the path against the object tree and finding the corresponding object. In contrast to a GUID, which always points to the same object, a name might resolve to different things at different points in time, or in different contexts. The reference isn’t bound to a particular target until the name is resolved.
When does this happen? You decide that when you design the system, based on your performance and flexibility requirements. For example, you can decide to resolve all references once and only once, when the object is spawned. This is faster, because you only need to look references up once, but it also means that if the eye is removed and replaced, the reference won’t be updated to point to the new eye. So it’s less flexible.
The other option is to resolve the reference every single frame. This can handle objects being removed and/or replaced, but it also means having to pay the performance cost of resolving the reference every single frame.
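The two binding strategies can be contrasted in a small Python sketch. `SpawnBoundRef` and `PerFrameRef` are hypothetical names for illustration; a real engine would also have to decide what a stale spawn-bound reference should do once its target is destroyed.

```python
class Node:
    def __init__(self, name: str, children=()):
        self.name = name
        self.children = {c.name: c for c in children}

def resolve(root: Node, path: str):
    node = root
    for part in path.split("/"):
        node = node.children.get(part)
        if node is None:
            return None
    return node

class SpawnBoundRef:
    """Resolved once, when the referencing object is spawned. Cheap to use, but can go stale."""
    def __init__(self, root: Node, path: str):
        self.target = resolve(root, path)

    def get(self):
        return self.target

class PerFrameRef:
    """Re-resolved on every access (e.g. once per frame). Follows replacements, costs a lookup each time."""
    def __init__(self, root: Node, path: str):
        self.root = root
        self.path = path

    def get(self):
        return resolve(self.root, self.path)

head = Node("head", [Node("left_eye")])
character = Node("character", [head])
at_spawn = SpawnBoundRef(character, "head/left_eye")
each_frame = PerFrameRef(character, "head/left_eye")

glass_eye = Node("left_eye")
head.children["left_eye"] = glass_eye        # the eye is replaced mid-game

print(at_spawn.get() is glass_eye)    # False: still bound to the original eye
print(each_frame.get() is glass_eye)  # True
```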
With this new understanding of the fundamental difference between names and GUIDs we can take another look at the pros and cons we listed above and see if we can understand them better.
Names are fragile — they can break if objects are moved, renamed or deleted
Yes, this is the whole point!
The main reason for using names is to allow late binding. Late binding means we don’t know beforehand what the name will resolve to (or if it will resolve to anything at all). We can’t get the benefits of late binding without also getting the drawbacks.
For instance, in the example above, after the eye has been removed but before it has been replaced with a glass eye, `head/left_eye` will not resolve to anything, because the character doesn’t have a left eye. Code that expects to find an object at `head/left_eye` might break.
A name might also resolve to something unexpected. For example, the eye might be removed and replaced by a little man. Code that was written to deal with an eye, or even with no eye, might break when it finds a little man in the eye socket.
In addition to breaking in this correct way, where a resolve rightly fails because the object doesn’t exist, references can also break in incorrect ways. The resolve might fail not because there is no eye, but because the user made a mistake. For example, maybe the eye was named `LeftEye` instead of `left_eye`.
To an extent, problems like this can be mitigated by good tooling. For example, the tools might warn about unresolved references. The tools might also assist with renaming, so that if you rename `left_eye` to `LeftEye`, all the references are updated to `LeftEye` too.
But note that there is an inherent conflict here. The whole point of using names is to allow the references to be more lax and flexible. If the tools are too pedantic with their warnings, it kind of defeats that purpose. For example, it might be totally correct that `head/halo` doesn’t refer to anything, because the character starts out without a halo; she only gets one once she’s completed the Holy Mission. If a tool spews out false-positive warnings about things like this, users will soon learn to ignore them and miss the genuinely valuable warnings about real typos.
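One way a tool could try to separate real typos from intentional gaps is to let users mark references that are allowed to be unresolved. The Python sketch below assumes such an `optional` flag, which is a hypothetical design not described in this post, and represents the object tree as a set of full paths for brevity.

```python
from dataclasses import dataclass

@dataclass
class NameRef:
    path: str
    optional: bool = False   # hypothetical flag: "not resolving is expected, don't warn"

def validate(existing_paths: set, refs: list) -> list:
    """Warn only about required references that fail to resolve."""
    return [f"unresolved reference: {ref.path}"
            for ref in refs
            if ref.path not in existing_paths and not ref.optional]

existing = {"head", "head/left_eye", "head/right_eye"}
refs = [
    NameRef("head/left_eye"),
    NameRef("head/LeftEye"),               # a real typo
    NameRef("head/halo", optional=True),   # legitimately absent until the Holy Mission is done
]
print(validate(existing, refs))   # ['unresolved reference: head/LeftEye']
```

This does not remove the tension; it moves some of the bookkeeping to the user, who now has to remember to set the flag.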
Similarly, tools can’t be too aggressive about updating references when objects are renamed either. Suppose that you designed a really cool robotic left eye for the character. Then you decide that it would look better as the right eye, so you move it into the right eye socket and rename it from `left_eye` to `right_eye`. If the references are auto-patched, all references to the left eye will now be changed to the right eye, which probably isn’t correct. For example, if the `left_eyebrow` had a reference to its eye, and that reference was auto-patched, the `left_eyebrow` would now think it sits over the `right_eye`.
On the other hand, some references could have meant “the robotic eye” rather than “the eye in the left socket” when they talked about `left_eye`, and those references should get patched. Pretty messy, and hard to make a nice UI for, although I’ve tried before.
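A naive rename patcher shows why this is hard. In the Python sketch below, all names are hypothetical, and the `eye_glow_fx` owner is invented for illustration. Both references look identical to the tool, so it patches both: right for the effect that is meant to follow the robotic eye, wrong for the eyebrow.

```python
def patch_rename(refs: dict, old_path: str, new_path: str) -> dict:
    """Naive auto-patching: rewrite every reference to old_path, or to anything below it."""
    patched = {}
    for owner, path in refs.items():
        # The "/" check keeps e.g. "head/left_eyebrow" from matching "head/left_eye".
        if path == old_path or path.startswith(old_path + "/"):
            path = new_path + path[len(old_path):]
        patched[owner] = path
    return patched

refs = {
    "left_eyebrow": "head/left_eye",        # means "the eye in the left socket"
    "eye_glow_fx": "head/left_eye/iris",    # means "the robotic eye", wherever it goes
}
print(patch_rename(refs, "head/left_eye", "head/right_eye"))
# {'left_eyebrow': 'head/right_eye', 'eye_glow_fx': 'head/right_eye/iris'}
```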
Names are cumbersome — coming up with meaningful names for everything is a chore
As discussed above, a name isn’t just a string of characters; it is a description of a role, of a relationship. `head/left_eye` means the left eye object in the head of the character. If you gave it a nonsensical name like `bob` or an auto-generated name like `Object_13`, it wouldn’t say anything about the role.
To take advantage of the late binding feature of names, you want to use meaningful names that match the concepts in your game. I.e., if your characters can put on different helmets and backpacks, you probably need `helmet` and `backpack` names. If helmets and backpacks are just visual features of some character models, can’t be removed or swapped out, and don’t have any gameplay purpose, they might not need their own names; they might just be part of the `head` and `body`.
You can think of this naming as sort of a “logical rigging” of the model.
So yes, if you want to take advantage of late binding, you do have to spend some time coming up with meaningful names and hierarchies. If all your objects are just named `entity_2713`, you are basically just using names as IDs. This has all the drawbacks of names (fragility, costly resolution) as well as all the drawbacks of IDs (unreadability). Don’t do that.
Names are expensive — they have to be resolved
Again, late resolve is the point of using names, and it will always have a cost. You can’t get the benefit of late resolve without paying the cost for it.
Of course, it can be more or less costly, depending on how you implement it. My most important performance tip is: be clear about the scope in which names are resolved.
I like to use fully qualified paths. I.e., referring to a character’s left eye would be `head/left_eye`. Referring to the left eye from the right eye would be `../left_eye`. Here, `..` goes up to the head and then `left_eye` descends to the left eye.
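Here is a minimal Python sketch of this resolution rule, assuming paths are `/`-separated strings and sibling names are unique. The `Node` class and `resolve` function are illustrative, not engine API. Note that each step only looks at the direct children (or the parent), never at the wider tree.

```python
class Node:
    def __init__(self, name: str, parent=None):
        self.name = name
        self.parent = parent
        self.children: dict = {}
        if parent is not None:
            parent.children[name] = self

def resolve(start: Node, path: str):
    """Resolve a path relative to start. '..' moves to the parent; any other
    component is looked up among the current node's direct children only."""
    node = start
    for part in path.split("/"):
        node = node.parent if part == ".." else node.children.get(part)
        if node is None:
            return None
    return node

character = Node("character")
head = Node("head", character)
left_eye = Node("left_eye", head)
right_eye = Node("right_eye", head)

print(resolve(character, "head/left_eye") is left_eye)   # True
print(resolve(right_eye, "../left_eye") is left_eye)     # True
print(resolve(character, "left_eye"))                    # None: not a direct child
```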
It can be tempting to fall into the trap of convenience and say that we should be able to just use the name `left_eye` to refer to the left eye instead of a full path, but it has scary performance implications. Instead of just searching our children for a name match, we now have to search all our descendants recursively. And if we want this to work from the right eye too, we not only have to search all our descendants, we have to search all our parent’s descendants too. Before you know it, you have to search the entire world for this `left_eye`. And even if we find it, how do we know it is the “right one”, the one the user meant? Maybe our helmet has a little statue on it, and maybe that statue has a `left_eye` too? How do we make sure we don’t find that one?
Messy.
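The recursive search is easy to write, which is part of the temptation. This Python sketch (hypothetical `Node` and `find_all`) shows both problems: the cost grows with the size of the whole subtree instead of the number of direct children, and the helmet statue’s eye comes back as a second match.

```python
class Node:
    def __init__(self, name: str, *children: "Node"):
        self.name = name
        self.children = list(children)

def find_all(node: Node, name: str) -> list:
    """Search every descendant of node for a name: the 'convenient' lookup."""
    matches = []
    for child in node.children:
        if child.name == name:
            matches.append(child)
        matches.extend(find_all(child, name))   # visits the whole subtree
    return matches

eye = Node("left_eye")
statue_eye = Node("left_eye")
character = Node("character",
                 Node("head", eye,
                      Node("helmet", Node("statue", statue_eye))))

hits = find_all(character, "left_eye")
print(len(hits))   # 2: which one did the user mean?
```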
My preferred implementation for resolving paths is to first hash each part of the path (this can be done offline), and then at each step, we match the hash at that step against the hashed names of the current object’s children — either through a lookup table or directly. Maintaining a lookup table is probably only worth it once you start to have hundreds of children to match against.
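The scheme could look roughly like this in Python. Assumptions to flag: CRC32 from `zlib` stands in for whatever string hash an engine would actually use, the 256-child threshold for switching to a lookup table is an arbitrary placeholder for “hundreds of children”, sibling names are assumed unique, and hash collisions are ignored (a real implementation would want a wider hash and a collision check when names are assigned).

```python
import zlib

LOOKUP_TABLE_THRESHOLD = 256   # hypothetical cut-over point

def name_hash(name: str) -> int:
    # Stand-in hash; assumed, not what any particular engine uses.
    return zlib.crc32(name.encode("utf-8"))

class Node:
    def __init__(self, name: str):
        self.name = name
        self.name_hash = name_hash(name)
        self.children: list = []
        self._table = None          # hash -> child, built lazily for big nodes

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        self._table = None          # invalidate the lookup table
        return child

    def find_child(self, h: int):
        if len(self.children) < LOOKUP_TABLE_THRESHOLD:
            # Few children: a direct scan comparing integers is cheap.
            for child in self.children:
                if child.name_hash == h:
                    return child
            return None
        if self._table is None:
            self._table = {c.name_hash: c for c in self.children}
        return self._table.get(h)

def hash_path(path: str) -> list:
    """Can be done offline: 'head/left_eye' -> list of component hashes."""
    return [name_hash(part) for part in path.split("/")]

def resolve_hashed(root: Node, hashed_path: list):
    node = root
    for h in hashed_path:
        node = node.find_child(h)
        if node is None:
            return None
    return node

character = Node("character")
head = character.add(Node("head"))
left_eye = head.add(Node("left_eye"))

hashed = hash_path("head/left_eye")   # stored with the reference
print(resolve_hashed(character, hashed) is left_eye)   # True
```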
Even though this avoids really expensive stuff like searching the entire object tree or doing string comparison, it is still a lot more expensive than just following a pointer (which an ID deserializes into).
Names vs GUIDs — The Smackdown
With this deepened understanding — who wins, names or GUIDs?
As discussed above, names have many disadvantages — they’re fragile, cumbersome and costly. But they have two main advantages:
- They express intent. When I refer to `head/left_eye`, it is clear to the reader what I want to refer to. Thus names have a meaning that pointers and GUIDs don’t have. Recording this meaning can be helpful. When we complain that identifiers are unreadable, it is the lack of meaning we are talking about, not just the fact that the identifier is a jumble of hex characters. But meaning requires explicit intent. If your object is named `entity_23415`, there is no meaning in the name; it might just as well be called `cc1b9a7b-a5bb-4355-8cf1-f78b74fe2774`.
- They allow for “late binding” of the reference to an actual object. This allows new objects to “take the place” or “fill the role” of the originally referenced object. Using this, we can “patch” objects in lots of interesting ways. For example, we can take a character, replace its eye with something else, and all references to the eye will still work. To do this with GUIDs, we would need to patch up all the references too, so that they point to the “new eye”.
So which is best? It comes down to a judgment call.
My take is this: we’re trying to create a high-performance, user-friendly game engine. Thus, we only want to pay the costs of using names (bad performance and fragility) in the cases where we really take advantage of their strengths (intent and late binding). In my experience, most of the time we don’t need these features. For example, when you are placing a bunch of trees in a level, you don’t really care about naming them, and you don’t have any need for something else to “assume” the role of one of those trees.
For this reason, The Machinery uses GUIDs as the default way to represent references. When a model refers to a texture, it does so with a GUID. You can move or rename the texture and the model will still keep using the same texture until you explicitly point it to a different one.
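In data terms, the difference is simply what the model stores. This Python sketch is not The Machinery’s actual data model; `Texture`, `Model` and the `database` dict keyed by GUID are hypothetical stand-ins.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Texture:
    name: str
    guid: uuid.UUID = field(default_factory=uuid.uuid4)   # fixed at creation

@dataclass
class Model:
    diffuse_map: uuid.UUID    # reference by identity; the texture's name is not stored

database: dict = {}           # GUID -> asset

def add(asset: Texture) -> Texture:
    database[asset.guid] = asset
    return asset

rock = add(Texture("rock_diffuse"))
model = Model(diffuse_map=rock.guid)

rock.name = "boulder_diffuse"                 # rename the texture...
print(database[model.diffuse_map] is rock)    # True: ...the model still uses it
```

Retargeting the model to a different texture is an explicit act: assigning a new GUID to `model.diffuse_map`.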
But in addition to this, we also explicitly allow for name references in some systems — systems that we think benefit from the extra flexibility:
- Our visual scripting system has a “lookup entity by name” node, allowing entities to be referred to semantically (such as `head/left_eye`) from within the scripts.
- Our animation system is still under construction, but we plan to have a similar feature there, allowing animations to target loose and flexible things such as `helmet/headlight/color`.
Having two different kinds of references like this in the engine is by no means ideal. Whenever we are designing a system, we have to ask ourselves: does this reference need the flexibility offered by names, or is using a GUID OK? We are also possibly missing out on some flexibility: in the cases where we’ve decided to use a GUID, it is much more cumbersome for the user to achieve the kind of dynamic retargeting that names make easy.
Still, it seems like the best compromise to me — we get the performance and stability of GUIDs/pointers in the majority of the code, but can still use the flexibility of names in the situations where we think it’s needed.
See also
- Pixar’s USD format uses names for everything. From their viewpoint, having to occasionally fix broken references is worth the extra flexibility they get from using names everywhere. Of course, since they’re not primarily targeting real-time rendering, their performance requirements are different.
- I’ve written about this topic before, if you want to see how my viewpoint has shifted over the years. Note, though, that the focus of that article is a little bit different. In that article I’m talking about referencing assets/resources on disk, so when I mention a path in that article I mean a disk path. In this post, by contrast, I’m talking more about references in general and I’m not that concerned with exactly how things get serialized to disk.