I could easily spend an entire post just writing about my love-hate relationship with Vulkan. Over the last six months of working with it, Vulkan has brought me plenty of joy but also so much frustration and annoyance that I wouldn’t even know where to begin. And, to be honest, that doesn’t feel very productive. I think most people who have done some serious work with Vulkan know that it is “a bit” rough around the edges. But we have still decided to live with that, as we believe Vulkan is moving in the right direction, both as the only explicit cross-platform graphics API and in terms of openness, performance and quick access to new hardware features.
So instead of nagging about stuff I don’t like or understand about the API, I will talk about my current approach to implementing various performance-critical parts of our Vulkan backend. More precisely, in this and the coming one (or two) blog posts I will walk you through how we currently manage Descriptor Sets, Command Buffers and Pipelines.
Before we get going I want to underline that I’m really not an expert in Vulkan and that I might very well be doing things inefficiently or suboptimally.
If you’ve read my previous blog posts about the rendering architecture in The Machinery you might remember that we are building a form of graphics API agnostic command buffers in the renderer that later get translated by a render backend into the final graphics API specific command buffers. This translation step goes wide using our job system, and while we can make a lot of preparations to keep this a fairly simple task, there are still some challenges in avoiding mutation of shared resources within the jobs.
Management of Descriptor Sets
Let’s start with the area where I feel the least comfortable that I’m doing the right thing — management of Descriptor Sets.
In Vulkan, resources (such as buffers, textures, samplers) are exposed to shaders through Descriptor Sets. Sets are allocated from Descriptor Pools by handing the pool a Descriptor Set Layout. The layout is created from an array of structs describing each resource binding in the set.
I’ve previously talked about how we map the renderer’s concept of a resource binder to a Descriptor Set. Not much has changed since that design except that I’ve ditched the support for updating the resource binder mid-frame (the binder type I referred to as DYNAMIC in the post). I decided to do so as I can’t see any use cases where it is really needed and since it complicates state tracking when going wide in the backends.
The resource binder is created like any other backend resource (e.g. buffers, textures, samplers, shaders, etc.) using the tm_renderer_resource_command_buffer_api (see: “A Modern Rendering Architecture”). On creation the user gets a handle to the resource binder that can be used to update what resource(s) each bind point is pointing to. In the resource manager of the Vulkan backend, each resource binder maps to a VkDescriptorSet and any updates are written directly into the descriptor set using
That’s all fine and straightforward, but now let’s see what we have to do to be able to actually bind one or many VkDescriptorSets to a command buffer using vkCmdBindDescriptorSets(). As soon as the descriptor set has been bound to a command buffer that has been submitted using vkQueueSubmit(), we must guarantee not to update or destroy it until the command buffer has been fully consumed by the GPU.
This poses a problem, as it’s not unlikely that the user wants to update some bindings in one or many descriptor sets that are currently in flight, i.e. bound to a submitted command buffer that has not yet been executed.
To deal with that we could introduce some kind of versioning of the descriptor sets owned by the resource manager. That would mean that every time we update a descriptor set we would have to make a copy of the current descriptor set (as it might be in flight in a command buffer) and mark the old one as garbage, to be collected once we know it is safe to delete. Alternatively, we could take it one step further and somehow flag a descriptor set while it is in flight, and only make the copy if it is. Both of these options kind of suck because:
- Knowing when to garbage collect the old set is not trivial and involves state tracking, and we hate state tracking.
- Garbage collecting the descriptor sets will lead to fragmentation in the descriptor pools.
- Keeping track of whether a descriptor set is in flight literally means state tracking, and yes, we still hate state tracking.
The reason we hate state tracking so much has to do with the complexity it introduces when going wide. We would need to expose some thread safe and efficient way for mutating the state of the shared object wrapping the descriptor set. While that is definitely solvable it quickly becomes unnecessarily complex.
The bigger problem I foresee with the above approach is fragmentation in the descriptor pools. Without introducing even more state tracking, there’s no way to tell if a descriptor pool has any sets that are still alive, so we can’t just reset the entire pool. We would have to rely on returning one set at a time to the pool as part of the garbage collection step. Doing so will over time fragment the descriptor pools, waste memory and slow down the system.
So instead I’ve turned this problem upside down. The descriptor sets in the shared resource manager are only “blueprints” and never get bound to any command buffer. Instead each job has its own descriptor pool that it allocates new descriptor sets from. It then copies the contents of the set from the shared “blueprint” descriptor set.
Each job has a pre-allocated array of VkDescriptorSet with the same size as the shared array holding the “blueprint” descriptor sets. Copies are done lazily, when we detect that the job hasn’t already made a copy of the set. As far as I know, there doesn’t seem to be a simple way to copy the contents of an existing descriptor set into another one, so I’m currently building n VkCopyDescriptorSet structs (where n is the number of bind points in the set) and running vkUpdateDescriptorSets() to create the copy (I recently stumbled across the extension VK_KHR_descriptor_update_template, which looks like it might be a better approach).
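To make the lazy copy a bit more concrete, here is a minimal sketch in plain C. It deliberately does not use the Vulkan API: fake_descriptor_set_t, fake_copy_t, fake_pool_t, job_context_t and all the functions are hypothetical stand-ins I made up for illustration, with fake_copy_t playing the role of VkCopyDescriptorSet and apply_copies() playing the role of vkUpdateDescriptorSets(). The fixed capacities are assumptions too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types, NOT the Vulkan API. A "set" here is just an array of
 * resource ids, one per bind point. All sizes are arbitrary assumptions. */
enum { MAX_BIND_POINTS = 8, MAX_BLUEPRINTS = 16, POOL_CAPACITY = 16 };

typedef struct {
    uint32_t num_bind_points;
    uint32_t resources[MAX_BIND_POINTS];
} fake_descriptor_set_t;

/* One copy record per bind point (the role VkCopyDescriptorSet plays). */
typedef struct {
    const fake_descriptor_set_t *src;
    fake_descriptor_set_t *dst;
    uint32_t binding;
} fake_copy_t;

/* Job-local pool: bump allocation only, freed by resetting the whole pool. */
typedef struct {
    fake_descriptor_set_t sets[POOL_CAPACITY];
    uint32_t num_allocated;
} fake_pool_t;

typedef struct {
    fake_pool_t pool;
    /* Same length as the shared blueprint array; NULL means "not copied yet". */
    fake_descriptor_set_t *copies[MAX_BLUEPRINTS];
    uint32_t num_copies_made;
} job_context_t;

void job_init(job_context_t *job)
{
    *job = (job_context_t){0}; /* Empty pool, all copies NULL. */
}

/* Applies the copy records (the role vkUpdateDescriptorSets() plays). */
void apply_copies(const fake_copy_t *copies, uint32_t count)
{
    for (uint32_t i = 0; i < count; ++i)
        copies[i].dst->resources[copies[i].binding] =
            copies[i].src->resources[copies[i].binding];
}

/* Returns the job's private copy of blueprint `index`, creating it on first use. */
fake_descriptor_set_t *job_get_set(job_context_t *job,
                                   const fake_descriptor_set_t *blueprints,
                                   uint32_t index)
{
    assert(index < MAX_BLUEPRINTS);
    if (job->copies[index] != NULL)
        return job->copies[index]; /* This job already made its copy. */

    assert(job->pool.num_allocated < POOL_CAPACITY);
    fake_descriptor_set_t *dst = &job->pool.sets[job->pool.num_allocated++];
    const fake_descriptor_set_t *src = &blueprints[index];
    assert(src->num_bind_points <= MAX_BIND_POINTS);
    dst->num_bind_points = src->num_bind_points;

    /* Build one copy record per bind point, then apply them in one call. */
    fake_copy_t records[MAX_BIND_POINTS];
    for (uint32_t b = 0; b < src->num_bind_points; ++b)
        records[b] = (fake_copy_t){ .src = src, .dst = dst, .binding = b };
    apply_copies(records, src->num_bind_points);

    job->copies[index] = dst;
    job->num_copies_made++;
    return dst;
}
```

The point of the sketch is that a job only ever touches its own pool and its own array of copies, and only reads from the shared blueprints, so no locking or shared state tracking is needed inside the job.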
An alternative approach would be to copy all “blueprint” descriptor sets up front, before we launch the jobs. But my current take is that it is probably better to pay the price of doing the copies within each job instead and only copy the sets that actually end up being bound by the job. I imagine the number of “blueprint” descriptor sets potentially becoming quite large, and it feels reasonable that only a subset of those will actually be bound each frame when we start rendering large 3D scenes.
The descriptor pool for each job is associated with the command buffers and gets reset and returned to an array of descriptor pools (the pool-pool as I call it in my head when I want to confuse myself) once the buffers have been fully consumed by the GPU. I will talk more about this and command buffer management in general in my next post.
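Here is a rough sketch of how such a pool-pool could recycle pools, again in plain C with made-up stand-in types rather than real Vulkan handles. pool_pool_t, fake_pool_t, the function names and the idea of tagging each pool with a monotonically increasing submit index (starting at 1) are all my assumptions for illustration. In a real backend the reset would presumably be a call to vkResetDescriptorPool(), and `completed` would come from a fence or similar.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_POOLS = 8 }; /* Arbitrary capacity for the sketch. */

/* Stand-in for a descriptor pool, NOT a VkDescriptorPool. */
typedef struct {
    uint32_t num_sets_allocated; /* Reset to 0 when the pool is recycled. */
    uint64_t last_submit;        /* Submission that used this pool, 0 = none yet. */
    bool in_use;                 /* Handed to a job or still in flight. */
} fake_pool_t;

typedef struct {
    fake_pool_t pools[MAX_POOLS];
    uint32_t num_resets;
} pool_pool_t;

/* Hands out a free pool, or returns NULL if every pool is busy. */
fake_pool_t *pool_pool_acquire(pool_pool_t *pp)
{
    for (uint32_t i = 0; i < MAX_POOLS; ++i) {
        if (!pp->pools[i].in_use) {
            pp->pools[i].in_use = true;
            return &pp->pools[i];
        }
    }
    return NULL;
}

/* Records which submission the pool's sets were bound in (indices start at 1). */
void pool_pool_submit(fake_pool_t *pool, uint64_t submit_index)
{
    assert(pool->in_use && submit_index > 0);
    pool->last_submit = submit_index;
}

/* Recycles every pool whose submission the GPU has finished.
 * `completed` is the highest submit index known to be fully consumed. */
void pool_pool_collect(pool_pool_t *pp, uint64_t completed)
{
    for (uint32_t i = 0; i < MAX_POOLS; ++i) {
        fake_pool_t *p = &pp->pools[i];
        if (p->in_use && p->last_submit != 0 && p->last_submit <= completed) {
            p->num_sets_allocated = 0; /* Whole-pool reset, no per-set frees. */
            p->last_submit = 0;
            p->in_use = false;
            pp->num_resets++;
        }
    }
}
```

Because a pool is only ever reset as a whole once its command buffers are done, individual sets are never returned one at a time, which is what sidesteps the fragmentation problem described earlier.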
So far this approach for managing descriptor sets appears to work pretty well for us, but I’m sure it will be iterated over many times as we start rendering more complex scenes.
I’m also pretty sure there are people out there with more knowledge of the Vulkan API who cringed a bit when they read this and have suggestions for better approaches. I encourage you to speak up. I’ve tried to find documentation, blog posts, best practices and recommendations on how to most efficiently manage descriptor sets, but so far I haven’t found much information on this topic.