Efficient binding of shader resources
Since my last post I’ve discovered a couple more resource types that I feel it makes sense to expose through the tm_renderer_resource_command_buffer_i interface. Today we will take a closer look at one of them, which we call a resource binder. It can best be described as a graphics API agnostic object holding references to a number of resources (buffers, textures, etc.) that should be bound to a shader.
In previous projects I’ve always relied on reflection data from the compiled shaders to determine what kind of resources and constants they need as input/output, and then tried to have the runtime serve that as efficiently as possible. I’m generally a big fan of letting data drive the runtime, and it wasn’t until rather recently that I started thinking about alternative approaches. As I see it, there are primarily two problems with the reflection-based approach that led me to explore alternatives:
- Generally, shader reflection APIs tend to be kind of crappy. More than once I’ve stumbled across reflection APIs that lack even basic functionality, like getting the number of elements in an array. Stuff like that usually leads to very ugly hacks.
- It tends to complicate experimenting with drastically different strategies for binding resources across various platforms and projects. E.g. on one platform you might want to shove all your draw call constants into a large raw buffer, while on another platform you want to stick with a more traditional constant buffer approach. To support differences like that without going completely insane when authoring the shader code, you’ll end up creating some kind of meta-language that generates the declarations of resources and constant buffers automatically. At that point you already possess all the binding information you need, which makes the round-trip via the reflection APIs unnecessary overhead.
So in The Machinery I’ve decided to test a different approach.
Let’s take a look at what we are trying to achieve by listing a few prerequisites:
- We want to move away from relying on shader reflection APIs. Data can still drive the runtime (through some kind of meta-language in the shader authoring system), but the runtime/data compiler should be in control. To make this work, all resources need to get explicit register allocations in the shader code.
- We want to make it easy to experiment with different ways of feeding various graphics APIs without putting too much burden on the shader author. I.e., our shader authoring system will need to generate generic helper functions for retrieving data by name so that the shader author does not have to worry about where the data comes from.
- We want our resource binding model to map closely to Vulkan’s Descriptor Sets and D3D12’s Descriptor Tables and Descriptor Heaps.
- We want to encourage users to actively think about grouping resources by update frequency.
- We want to keep traffic on our tm_renderer_command_buffer_i related to resource bindings low.
So far the shader authoring system in The Machinery is almost non-existent so I won’t be covering how I envision the generation of resource declarations/access-functions in the shader code to work. Instead we will pretend as if we had that part in place and focus on how to efficiently deal with binding of resources on the runtime side.
For that we have an object that we call a resource binder. As I mentioned in the introduction, a resource binder basically groups a number of resources that one or many shaders want as input or output. We currently have two different types of resource binders, UPDATABLE and DYNAMIC:
UPDATABLE: Updatable resource binders are updated using the tm_renderer_resource_command_buffer_i and are considered immutable while the backend consumes any of the tm_renderer_command_buffer_i that the user subsequently submits. This makes them suitable for storing any kind of resource bindings that won’t change in the middle of a frame. A typical usage scenario could be to have four different UPDATABLE resource binders assigned to a draw call:
- Frame Global: time, light sources, lookup tables and other stuff that typically changes at most once per frame.
- View Global: camera transforms and other view-dependent data.
- Material data: any resources related to the material (texture bindings, constant buffer with material colors, etc.).
- Object data: object transforms and other object-dependent data.
DYNAMIC: Dynamic resource binders are updated together with draw calls / compute dispatch commands on the tm_renderer_command_buffer_i. This means they can mutate any number of times during a frame. A typical usage scenario could be to manage resource bindings for a draw call belonging to a particle system where the bindings might change depending on view. (Disclaimer: I’m still not 100% sure if it makes sense to support DYNAMIC resource binders, or if it just complicates things. It feels like it should be possible to handle all scenarios by simply flipping through multiple UPDATABLE resource binders instead.)
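To make the grouping by update frequency a bit more concrete, here is a minimal C sketch of a draw call that references four binders by handle. Every name in it (the `example_` types, the slot enum, the helper) is made up for illustration and is not part of The Machinery's API; it only assumes that a binder is referenced through a 32-bit handle.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical names for illustration only, not The Machinery's actual API.

// One slot per update-frequency group from the usage scenario above.
enum example_binder_slot {
    EXAMPLE_SLOT_FRAME,    // changes at most once per frame
    EXAMPLE_SLOT_VIEW,     // changes per view/camera
    EXAMPLE_SLOT_MATERIAL, // changes per material
    EXAMPLE_SLOT_OBJECT,   // changes per object
    EXAMPLE_SLOT_COUNT
};

// A draw call only stores the handles of the binders it uses.
typedef struct example_draw_call_t {
    uint32_t shader;
    uint32_t binders[EXAMPLE_SLOT_COUNT];
} example_draw_call_t;

// Counts how many binder slots differ between two consecutive draw calls.
// When bindings are grouped by update frequency, the frame and view slots
// normally stay the same and only the material/object slots change.
static uint32_t example_count_binder_changes(const example_draw_call_t *prev,
                                             const example_draw_call_t *next)
{
    uint32_t changes = 0;
    for (uint32_t i = 0; i < EXAMPLE_SLOT_COUNT; ++i) {
        if (prev->binders[i] != next->binders[i])
            ++changes;
    }
    return changes;
}
```

Two draw calls in the same view that only differ in material and object would report two changed slots here, which is the kind of cheap state delta a backend could exploit.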
For DYNAMIC resource binders we also serialize the state (i.e., what resource handles are bound to what registers) together with the draw/dispatch command. We do this because we strive to keep command data on the tm_renderer_command_buffer_i as self-contained as possible. This becomes very important as soon as we start doing parallel translation of our API agnostic tm_renderer_command_buffer_i into actual graphics API command buffers. Without it we would have to backtrack among potentially tens of thousands of commands to puzzle together the exact state of the resource binder at a particular command from a bunch of smaller deltas.
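As a rough illustration of what "self-contained" means here, the sketch below stores a full snapshot of the bindings right after each draw command in a byte buffer. The layout and all names are assumptions made up for this example, not the actual command format used in The Machinery.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical layout for illustration; not The Machinery's actual command format.
typedef struct example_binding_t {
    uint32_t bind_register; // shader register the resource is bound to
    uint32_t resource;      // resource handle
} example_binding_t;

typedef struct example_draw_header_t {
    uint32_t shader;
    uint32_t num_bindings; // number of example_binding_t entries following the header
} example_draw_header_t;

typedef struct example_command_buffer_t {
    uint8_t data[1024];
    uint32_t size; // bytes used so far
} example_command_buffer_t;

// Appends a draw command followed by a full snapshot of the dynamic binder state.
// Returns the byte offset of the command, or UINT32_MAX if it does not fit.
static uint32_t example_push_draw(example_command_buffer_t *cb, uint32_t shader,
                                  const example_binding_t *bindings, uint32_t num_bindings)
{
    const size_t bytes = sizeof(example_draw_header_t) + (size_t)num_bindings * sizeof(example_binding_t);
    if (bytes > sizeof(cb->data) - cb->size)
        return UINT32_MAX;
    const uint32_t offset = cb->size;
    const example_draw_header_t header = { shader, num_bindings };
    memcpy(cb->data + offset, &header, sizeof(header));
    if (num_bindings > 0)
        memcpy(cb->data + offset + sizeof(header), bindings, num_bindings * sizeof(example_binding_t));
    cb->size += (uint32_t)bytes;
    return offset;
}

// Finds the resource bound to `bind_register` for the command at `offset`.
// Only the command's own bytes are read, so no earlier command has to be
// replayed and any worker thread can translate any command on its own.
static uint32_t example_lookup(const example_command_buffer_t *cb, uint32_t offset,
                               uint32_t bind_register)
{
    example_draw_header_t header;
    memcpy(&header, cb->data + offset, sizeof(header));
    for (uint32_t i = 0; i < header.num_bindings; ++i) {
        example_binding_t b;
        memcpy(&b, cb->data + offset + sizeof(header) + i * sizeof(b), sizeof(b));
        if (b.bind_register == bind_register)
            return b.resource;
    }
    return UINT32_MAX; // nothing bound to that register
}
```

The price is some duplicated data when consecutive commands share most of their bindings; storing deltas would be smaller but would bring back the backtracking problem described above.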
So the basic idea is that we typically only need to pass the handles (where each handle is represented as a uint32_t) of the resource binders that we want to bind to the shader for a particular draw/dispatch command on the tm_renderer_command_buffer_i. By decoupling the contents of the UPDATABLE resource binders from the draw/dispatch commands we significantly reduce the memory footprint of those commands, which tend to represent the bulk of all commands on a tm_renderer_command_buffer_i.
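A back-of-the-envelope comparison shows where the savings come from. The sketch below is only an estimate under made-up assumptions: a draw command that stores one shader handle plus four binder handles, versus one that stores a (register, resource handle) pair inline for every binding. None of the names or sizes come from The Machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical layout and sizes for illustration only.
enum { EXAMPLE_MAX_BINDERS = 4 };

// Draw command that references its resource binders by handle.
typedef struct example_draw_by_handle_t {
    uint32_t shader;
    uint32_t binders[EXAMPLE_MAX_BINDERS];
} example_draw_by_handle_t;

// Bytes needed for `num_draws` commands that only carry binder handles.
static size_t example_bytes_by_handle(size_t num_draws)
{
    return num_draws * sizeof(example_draw_by_handle_t);
}

// Bytes needed if every command instead carried its bindings inline:
// one shader handle plus a (register, resource handle) pair per binding.
static size_t example_bytes_inline(size_t num_draws, size_t bindings_per_draw)
{
    return num_draws * (sizeof(uint32_t) + bindings_per_draw * 2 * sizeof(uint32_t));
}
```

With 10,000 draws and an assumed 16 bindings per draw, this works out to 200,000 bytes when passing handles versus 1,320,000 bytes when inlining the bindings.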
I also believe this decoupling will encourage users to think more carefully about their bindings and naturally start grouping them based on update frequency.
I won’t go into how this is implemented in our Vulkan backend, as parts of the implementation are still a bit in flux, but if you are familiar with Vulkan you’ve probably already spotted that our resource binder maps 1:1 to a Vulkan Descriptor Set. Even if we later need to run on more old-school graphics APIs, I think this type of decoupling makes sense, since it would be trivial to implement a mapping from our resource binders to a slot-based shader binding model.
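To hint at what such a mapping could look like, here is a small C sketch that applies a resource binder to a slot-based binding state. The slot array is a stand-in for whatever a real slot-based graphics API would expose, and every type and function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical types for illustration; no real graphics API is called here.
enum { EXAMPLE_NUM_SLOTS = 16, EXAMPLE_MAX_BINDINGS = 8 };

typedef struct example_binding_t {
    uint32_t bind_register; // register assigned in the shader code
    uint32_t resource;      // resource handle
} example_binding_t;

typedef struct example_resource_binder_t {
    example_binding_t bindings[EXAMPLE_MAX_BINDINGS];
    uint32_t num_bindings;
} example_resource_binder_t;

// Stand-in for the currently bound resources of a slot-based API.
typedef struct example_slot_state_t {
    uint32_t slots[EXAMPLE_NUM_SLOTS];
} example_slot_state_t;

// Applies one binder by writing each resource into the slot matching its register.
// Redundant binds are skipped; returns the number of slots that actually changed.
static uint32_t example_apply_binder(example_slot_state_t *state,
                                     const example_resource_binder_t *binder)
{
    uint32_t changed = 0;
    for (uint32_t i = 0; i < binder->num_bindings; ++i) {
        const example_binding_t *b = &binder->bindings[i];
        if (b->bind_register >= EXAMPLE_NUM_SLOTS)
            continue; // out of range in this sketch; a real backend would report an error
        if (state->slots[b->bind_register] != b->resource) {
            state->slots[b->bind_register] = b->resource;
            ++changed;
        }
    }
    return changed;
}
```

Because all registers are allocated explicitly in the shader code, the register stored in the binder can double as the slot index, so no reflection lookup is needed at bind time.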
That’s it for today’s post. We’ve just reached a point where we have the most fundamental building blocks in place to put together a more serious test application. I’m super excited to take the new rendering architecture for a proper test run. 😃