The Machinery Shader System (part 1)

Designing and implementing a solid shader system has been in the back of my head, itching for attention, since even before we started Our Machinery. I’ve successfully managed to ignore it all the way up until last week, when reality finally caught up with my procrastination, rubbing in the fact that I’ve reached the point where I need to do some more serious shader development, and that the current bare-bones system simply won’t cut it.

My definition of a “Shader System” is rather simple; it boils down to a few things:

  • Some kind of meta-language and compiler for authoring shaders, where the user can define various render states, write shader source snippets (in some flavor), and then piece together more or less complete pipeline state objects from that.

  • Some way to get data in the form of constants and resources (textures, buffers, samplers), in and out of the shader system as efficiently as possible.

  • Some kind of resource management system for loading, reloading and unloading the finished shaders.

To be honest, I kind of hate shader systems. It’s not that they are particularly hard to write, it’s more that whatever you do, chances are overwhelming that you won’t feel too satisfied with the final result. Or at least, that is my experience with the four (rather ambitious) shader systems that I have previously designed, written and maintained. Not to mention all the harsh words and hate I’ve gotten from various co-workers when they’ve had to fix bugs or extend any of these systems. But it’s all fine. I have seen the authoring workflows and implementations of many different shader systems over the years, and so far they all pretty much suck in one way or another. And while it is somewhat comforting to know that I’m not alone in causing frustration among co-workers and users, it is also sad to realize that ending up with a system quite far from perfection appears to be inevitable.

But why are they so hard? I think there are a bunch of reasons for that; here are a few off the top of my head:

  • There’s a big chance that the final system will spread its tentacles into almost all other rendering systems, and if you don’t plan for this you are likely to introduce rather tight coupling between the systems. That makes it very time consuming to refactor the shader system when that day comes.

  • The best way to handle resource and constant bindings tends to differ not only between graphics APIs and platforms, but also between the different underlying rendering systems. E.g. a system responsible for rendering a crowd of 20,000 skinned characters with carefully planned material variations is very different from a system for rendering a few full-screen triangles as part of a post-processing stack. Yet, they both need to talk to the shader system.

  • Deciding how to best handle generating shader variations (e.g. skinning on/off, light maps on/off, etc.) is not only hard from a shader authoring point of view, but can also very quickly blow up your shader compile times to insanity if you aren’t careful. At the same time, you have to provide really fast incremental compile times during shader authoring.

  • Finding a good balance between how much of the code that declares resources and constants should live in the actual shader source snippets, versus being declared and generated through some form of meta-data, is hard. The same applies to shader stage-to-stage linkage information.

  • As with any system exposing an interface that allows users to author content on top of it, what you expose to the user through the front-end becomes very important. If you are too restrictive with exposing platform specific aspects, it is easy to end up limiting the shader authors, moving them too far from the hardware. But on the other hand, if you expose too much, there’s a big chance you won’t be able to refactor the system, simply because there’s too much data depending on it.

Then, on top of all this, there’s the fact that we still do not have a single obvious choice when it comes to which language to use when writing the actual shader source snippets.

But do we really need a “system” for this? If it is so hard to build a good generic shader system, should we even offer one as part of The Machinery? To be honest, this is a question I’ve been struggling with since we started this project, but I have reached the conclusion that the answer is yes.

Without providing a standardized way to author shaders there’s no quick way to build more advanced rendering systems. As soon as a rendering system needs to do anything more complex than running a few hardcoded shaders, it will very quickly end up needing some kind of authoring environment and management of shaders. And while it’s not unlikely that you could hand-tailor the shader management within a rendering system in a way that provides benefits over a more generic shader management system, reinventing the wheel for every rendering system would simply be too time consuming and bug-prone. Not to mention the fact that sharing shaders between different systems would become more or less impossible, porting The Machinery to more platforms would become much harder, and keeping shader compile times under control would be, well… also hard.

Okay, so here we go… Fifth time around, maybe this time it will all pan out, and the cleanest, most amazing and beautifully architected shader system will come out of my fingers. Probably not, but I’m at least fairly sure that it will be a decent step up from the last system I built.

So in this, and probably the coming two posts, I will walk you through my ideas, design and certain implementation aspects of The Machinery’s “Shader System”.

Goals

Let’s begin by defining some basic design goals for the system:

  • The shader system offers two tiers:

    • Tier 0 - Shaders: The user of the system reasons about Shaders. A Shader is as close to a complete pipeline state as possible (i.e. all shader stages linked together, coupled with necessary states). At this tier there are no concepts of multi-pass shaders, execution contexts, or frame scheduling.

    • Tier 1 - Materials: At this tier the user reasons about Materials. A Material groups one or many shaders together and adds the missing concepts (a rough sketch follows after this list):

      • Multi-pass & frame scheduling: An array of shaders executing immediately after each other or scheduled into various rendering layers dictated by the render graph.

      • Execution contexts: Provides the possibility to select which shaders to run depending on an external “execution context”. A typical example of when something like this is needed is rendering a material into a shadow map vs. rendering the same material into the regular viewport.

  • The shader system is developed as a stand-alone plugin. In Tier 0 it only depends on the render graph system.

  • As far as possible, the shader system should only act as a convenience layer providing a simple yet powerful way for rendering plugins to interact with shaders. If desired, there should be nothing stopping the user from opting out of the system or providing their own implementation.

  • While the first version of the system won’t offer a shader graph authoring front-end, the APIs are designed to hopefully make that easy to add.
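To make the two tiers a bit more concrete, here’s a purely hypothetical sketch of the kind of data a Tier 1 material could group together. None of these type or field names are from the actual system (and the tm_renderer_shader_t render resource is introduced later in this post); it only illustrates the multi-pass and execution context concepts:

// Purely illustrative: a material selects which shaders to run per execution
// context, and can schedule multiple passes within a context. All names in
// this sketch are hypothetical.
typedef struct my_material_pass_t
{
    uint32_t execution_context; // e.g. default viewport vs. shadow map
    uint32_t render_layer; // render graph layer to schedule the pass into
    const tm_renderer_shader_t *shader; // complete pipeline state (Tier 0)
} my_material_pass_t;

typedef struct my_material_t
{
    // Passes run in order within each execution context (multi-pass).
    uint32_t num_passes;
    my_material_pass_t *passes;
} my_material_t;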

Next, let’s take a look at some more technical design goals:

  • There should be a clear concept for grouping constants and resources together based on frequency of update (frame, view, object, material, etc.); see the sketch after this list.

  • The same resource or constant referenced from multiple shader stages should never get duplicated into multiple buffers, as the cost of scattering the data easily becomes unpredictable. Indirectly this means that the system will be in charge of generating and declaring load/store helper functions (or macros) for all constants and resources and exposing those to the shader source.

  • The API for updating constants and resources should be able to do efficient batch updates.

  • The underlying management of constants and resources should be hidden from the shader author, striving for flexibility: the system should be able to completely rearrange the memory layout of the data in any way that makes sense on a low level, without it affecting the hand-authored shader source.

  • The system should provide a good interface for iterating over shader source and hot-reloading the finished result.

  • As with the rest of the rendering code in The Machinery, the system should primarily be designed to run on top of new explicit graphics APIs (Vulkan, DX12). It will utilize “bindless” concepts and be structured in a way that should make it easy to move “scene submission” to run on the GPU using graphics API extensions such as Nvidia’s “Device Generated Commands” (VK_NVX_device_generated_commands).
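As a small illustration of the first technical goal, the frequency-based grouping might boil down to buckets along these lines. The enum below is purely hypothetical and just shows the kind of grouping I have in mind:

// Hypothetical illustration of frequency-based grouping: constants and
// resources are bound per group, so data that changes once per frame never
// has to be re-uploaded per object.
enum my_update_frequency {
    MY_UPDATE_FREQUENCY_FRAME, // e.g. time, frame number
    MY_UPDATE_FREQUENCY_VIEW, // e.g. camera and projection matrices
    MY_UPDATE_FREQUENCY_MATERIAL, // e.g. albedo tint, roughness
    MY_UPDATE_FREQUENCY_OBJECT // e.g. world transform, skinning data
};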

Enough listing goals. Let’s start low-level with the stuff that is already in place and then walk our way up to the finished system over the coming two posts, or however many it will take.

Low-level shader compilation

The render resource representing a shader aims to provide enough data to the render backend to almost be able to create a full “pipeline state object” (I discussed the reasons for not taking it all the way in my post on “Vulkan: Pipelines and Render States”) and looks something like this:

enum tm_renderer_shader_stage {
    TM_RENDERER_SHADER_STAGE_VERTEX,
    TM_RENDERER_SHADER_STAGE_TESS_CONTROL,
    TM_RENDERER_SHADER_STAGE_TESS_EVAL,
    TM_RENDERER_SHADER_STAGE_GEOMETRY,
    TM_RENDERER_SHADER_STAGE_PIXEL,
    TM_RENDERER_SHADER_STAGE_COMPUTE,
    TM_RENDERER_SHADER_STAGE_MAX
};

typedef struct tm_renderer_shader_blob_t
{
    // Opaque, backend specific binary blob.
    uint64_t size;
    uint8_t *data;
} tm_renderer_shader_blob_t;

typedef struct tm_renderer_shader_t
{
    // Compiled state blocks describing the fixed-function state.
    tm_renderer_shader_blob_t raster_states;
    tm_renderer_shader_blob_t depth_stencil_states;
    tm_renderer_shader_blob_t blend_states;
    tm_renderer_shader_blob_t multi_sample_states;

    // One compiled blob per active shader stage.
    tm_renderer_shader_blob_t stages[TM_RENDERER_SHADER_STAGE_MAX];
} tm_renderer_shader_t;

tm_renderer_shader_compiler_api is the name of our low-level shader compiler API, responsible for producing the tm_renderer_shader_blob_t structs that assemble the final shader. Each render backend has to implement this API, and it has three responsibilities:

  1. Act as a reflection API for enumerating all state blocks (such as raster states, depth stencil states, blend states, sampler states, etc.) and their valid inputs.

  2. Expose an interface for “compiling” state blocks into backend specific binary blobs.

  3. Expose an interface for compiling shader source for a specific shader stage (vertex, hull, domain, geometry, pixel or compute) into a backend specific binary blob.
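To give a feeling for the shape of the API, here is a rough sketch of what the interface could look like. The compile_shader() declaration matches the real function quoted later in this post, but the reflection and state block function names are shorthands I use for this post rather than the exact API (and tm_renderer_state_value_pair_t is introduced in the next section):

typedef struct tm_renderer_shader_compiler_o tm_renderer_shader_compiler_o;

struct tm_renderer_shader_compiler_api
{
    // 1. Reflection: enumerate state blocks, the states inside them and
    // their valid values (function names assumed for illustration).
    uint32_t (*num_state_blocks)(tm_renderer_shader_compiler_o *inst);
    uint32_t (*state_block_type)(tm_renderer_shader_compiler_o *inst, uint32_t block_idx);
    const char *(*state_block_name)(tm_renderer_shader_compiler_o *inst, uint32_t block_idx);
    uint32_t (*num_states)(tm_renderer_shader_compiler_o *inst, uint32_t block_idx);
    const char *(*state_name)(tm_renderer_shader_compiler_o *inst, uint32_t block_idx,
        uint32_t state_idx);

    // 2. Compile an array of key-value states into a backend specific blob.
    tm_renderer_shader_blob_t (*compile_state_block)(tm_renderer_shader_compiler_o *inst,
        uint32_t state_block_type, const tm_renderer_state_value_pair_t *states,
        uint32_t num_states);

    // 3. Compile shader source for a specific stage into a backend specific blob.
    tm_renderer_shader_blob_t (*compile_shader)(tm_renderer_shader_compiler_o *inst,
        const char *source, const char *entry_point, uint32_t source_language,
        uint32_t stage);
};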

State blocks

The bulk of the API functions deal with reflection (or enumeration) of the state blocks. The idea is that while most states that go into setting up the render pipeline are very similar in all graphics APIs, we don’t want to lock ourselves down to only supporting the least common denominator of states. So while the renderer plugin provides the descriptions of the most common state blocks and their contents, we still want to allow a backend to expose both new state blocks as well as extend the existing state blocks with more states. This makes it easy to expose graphics API specific states to the shader author.

The API exposes functions for retrieving the number of supported state blocks, and then, for each state block, the user can retrieve a unique type ID and a human readable name, together with the number of supported states in the block.

Then, for each state, which is essentially a simple key-value pair, the user can query an id and a human readable name for the key, together with type information for the value. If the value’s type is an enum, the user can use a similar mechanism to enumerate all valid values.
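Put together, a full enumeration pass could look something like this, using the assumed function names from the sketch above:

#include <stdint.h>
#include <stdio.h>

// Walk all state blocks and print the states inside them.
void print_state_blocks(struct tm_renderer_shader_compiler_api *api,
    tm_renderer_shader_compiler_o *inst)
{
    const uint32_t n_blocks = api->num_state_blocks(inst);
    for (uint32_t b = 0; b != n_blocks; ++b) {
        printf("state block: %s\n", api->state_block_name(inst, b));
        const uint32_t n_states = api->num_states(inst, b);
        for (uint32_t s = 0; s != n_states; ++s) {
            // Each state is a key-value pair; here we only print the human
            // readable name of the key.
            printf("  state: %s\n", api->state_name(inst, b, s));
        }
    }
}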

From this information the user can build a simple linear array with key-value pairs for all states in a state block, where each state looks something like this:

typedef struct tm_renderer_state_value_pair_t
{
    // Key id of the state, obtained through the reflection interface.
    uint32_t state;
    // Value, interpreted according to the reflected type information.
    union
    {
        uint32_t enum_value;
        uint32_t uint32_value;
        float float_value;
    };
} tm_renderer_state_value_pair_t;

The user can then generate the final tm_renderer_shader_blob_t by handing over an array of states to the compile_state_block() function in the tm_renderer_shader_compiler_api. Any states not defined in the array will end up with valid default values. As mentioned before, the contents of the resulting blob are backend specific; for the Vulkan backend it simply results in Vulkan specific structs describing each state block (e.g. VkPipelineRasterizationStateCreateInfo, VkPipelineDepthStencilStateCreateInfo, etc.).
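In code, building and compiling a state block could look something like the sketch below. The TM_STATE_* and TM_COMPARE_* ids are placeholders; in practice both the state keys and the valid enum values come from the reflection queries described above:

// Compile a depth-stencil state block with depth testing enabled. The type
// id of the depth-stencil state block is assumed to have been obtained
// through the reflection interface.
tm_renderer_shader_blob_t compile_depth_stencil(struct tm_renderer_shader_compiler_api *api,
    tm_renderer_shader_compiler_o *inst, uint32_t depth_stencil_type)
{
    const tm_renderer_state_value_pair_t states[] = {
        { .state = TM_STATE_DEPTH_TEST_ENABLE, .uint32_value = 1 },
        { .state = TM_STATE_DEPTH_COMPARE_OP, .enum_value = TM_COMPARE_OP_GREATER_EQUAL },
    };
    // Any state not present in `states` (e.g. all stencil settings) gets a
    // valid default value in the resulting blob.
    return api->compile_state_block(inst, depth_stencil_type, states, 2);
}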

Shaders

The second part of the API is a single function that handles compiling shader source into whatever representation makes sense for the backend. E.g. for Vulkan the resulting output is a SPIR-V blob. The function looks something like this:

tm_renderer_shader_blob_t (*compile_shader)(tm_renderer_shader_compiler_o *inst,
    const char *source, const char *entry_point, uint32_t source_language,
    uint32_t stage);

source is a string with the shader source, entry_point is the name of the program’s entry point, source_language is an enum informing the compiler what language the source is written in (HLSL, GLSL, etc.) and stage specifies which shader stage the shader will be bound to.
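As a minimal usage sketch (the TM_RENDERER_SHADER_LANGUAGE_HLSL value is a placeholder, since the language enum isn’t shown in this post):

// Compile a trivial magenta-outputting HLSL pixel shader. With the Vulkan
// backend, the returned blob holds a SPIR-V module of `size` bytes.
tm_renderer_shader_blob_t compile_test_ps(struct tm_renderer_shader_compiler_api *api,
    tm_renderer_shader_compiler_o *inst)
{
    const char *source =
        "float4 main() : SV_TARGET\n"
        "{\n"
        "    return float4(1, 0, 1, 1);\n"
        "}\n";
    return api->compile_shader(inst, source, "main",
        TM_RENDERER_SHADER_LANGUAGE_HLSL, TM_RENDERER_SHADER_STAGE_PIXEL);
}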

Fairly simple and straightforward stuff. Before I wrap up, I’d like to make a quick note about selecting a shader language for authoring our default shaders.

My current take is to use HLSL and generate SPIR-V from that. The main reason for using HLSL instead of GLSL as the input language is that I want to make it simple to add a DX12 backend.

At the moment we are using libshaderc with a slightly modified version of glslang, where I have hacked in some stuff (e.g. Readfirstlane) in the HLSL front-end. It’s pretty messy, but so far it appears to work. I’ve been planning to try out the new SPIR-V backend of the DirectX Shader Compiler project, which looks promising, but so far I haven’t gotten around to it. I wish there was a simple go-to solution when it comes to picking a shader language, but the reality is that there isn’t. I anticipate being stuck with having to juggle shader source in lots of different languages. As soon as you peek outside the game development industry, you realize that all bets are off when it comes to standardizing on a single language (just a few examples from the VFX industry: OSL, MDL, MaterialX).

Next time

I think that covers the low-level API for state and shader compilation in enough detail. The shader system plugin sits on top of the tm_renderer_shader_compiler_api, feeding it with data to compile.

In my next post I will go through a concept that I currently call “shader system IO”, which will sit as an interface between the various systems that need to do rendering and the actual shaders. It creates a concept for grouping resources and constants by their update frequency, and is the main mechanism for feeding data defined in C/C++ code to shaders. On top of that, it also plays a central role in keeping systems nicely decoupled and in generating shader variations.

Stay tuned.

by Tobias Persson