Vulkan: Pipelines and Render States

This will be the last post in my mini-series covering some of the performance-critical aspects of our Vulkan render backend. In my previous two posts I covered management of Descriptor Sets and Command Buffers. In today’s post I will describe how we deal with the creation and lookup of VkPipelines, and also touch on our system for dynamically changing render states.

I will be using the DX9 term “render states” when referring to any pipeline state controlling fixed-function hardware, such as: rasterization, blending, depth and stencil operations, the input assembler, etc.

Pipeline Management

A VkPipeline is Vulkan’s equivalent of D3D12’s Pipeline State Object (PSO): basically a monolithic object describing the entire graphics (or compute) pipeline, which can be bound using a single command, vkCmdBindPipeline(). A VkPipeline for graphics work is constructed from a bunch of render states, the active shader stages, and the output render pass (specifying the format of the render targets the pixel/fragment shader will be writing to).
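To make that concrete, here is a minimal C sketch (illustrative only, not The Machinery’s actual code) of how those pieces come together in a VkGraphicsPipelineCreateInfo; the fixed-function state structs are assumed to have been filled in elsewhere:

#include <vulkan/vulkan.h>

// Minimal sketch: bundle the shader stages, fixed-function render states and
// the target render pass into one monolithic pipeline object.
VkPipeline create_graphics_pipeline(VkDevice device, VkPipelineLayout layout,
    VkRenderPass render_pass, uint32_t stage_count,
    const VkPipelineShaderStageCreateInfo *stages,
    const VkPipelineVertexInputStateCreateInfo *vertex_input,
    const VkPipelineInputAssemblyStateCreateInfo *input_assembly,
    const VkPipelineViewportStateCreateInfo *viewport,
    const VkPipelineRasterizationStateCreateInfo *rasterization,
    const VkPipelineMultisampleStateCreateInfo *multisample,
    const VkPipelineDepthStencilStateCreateInfo *depth_stencil,
    const VkPipelineColorBlendStateCreateInfo *color_blend)
{
    const VkGraphicsPipelineCreateInfo info = {
        .sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,
        .stageCount = stage_count,
        .pStages = stages,
        .pVertexInputState = vertex_input,
        .pInputAssemblyState = input_assembly,
        .pViewportState = viewport,
        .pRasterizationState = rasterization,
        .pMultisampleState = multisample,
        .pDepthStencilState = depth_stencil,
        .pColorBlendState = color_blend,
        .layout = layout,
        .renderPass = render_pass,
        .subpass = 0,
    };
    VkPipeline pipeline = VK_NULL_HANDLE;
    vkCreateGraphicsPipelines(device, VK_NULL_HANDLE, 1, &info, NULL, &pipeline);
    return pipeline;
}

The resulting handle is what later gets bound with vkCmdBindPipeline() at draw time.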

If your data is static and well-known, you can get away with pre-creating all needed pipelines before submitting any draw calls or compute dispatches. In my experience, though, it can be hard and rather inconvenient to take it that far. While most render states can typically be determined already at shader authoring time, it’s still fairly common that you won’t know the full state of the pipeline (e.g. which render pass you are targeting) until right before submitting the draw call.

So in The Machinery, I’ve decided that we still need to support “last minute” pipeline creation for convenience, while actively striving to keep the frequency of dynamic state changes to a minimum.

This is currently implemented by letting each device own a pipeline lookup: a hash map where the value is the VkPipeline and the key is the murmur hash of:

  • The currently bound VkRenderPass — i.e. the render target setup (null for compute pipelines).

  • The shader handle — identifying an object containing all of the active shader stages and default render states.

  • An (optional, and null for compute pipelines) short array of handles identifying render state override blocks associated with the draw call (more on this later).

For each draw call we generate the hash and check if it’s identical to the hash of the already bound pipeline. If so, everything is already set up for us and we can move on. If they differ, we try to look up the pipeline in the pipeline lookup. If found, the pipeline has already been created and we simply bind it and move on. If not, we need to create it, bind it, and insert it into the lookup.

This is where things become a bit tricky. Depending on the workload, we might currently be running on a worker thread, building command lists in parallel with other worker threads. Since all worker threads are peeking at the pipeline lookup, we rely on it being immutable, so we can’t just insert the newly created pipeline into the lookup.

To handle this, we let each worker thread carry an array of worker-thread-created pipelines. Since we expect the number of worker-thread-created pipelines per frame to be very low (typically zero), there’s no point in using a hash map for the lookup. Instead, we simply iterate over the array to see if the requested pipeline has already been created by the worker thread.

So the final, worker-thread-friendly, algorithm boils down to something like this:

key = murmur_hash(pass, shader_handle, state_override_blocks)
if (key == most_recently_bound_pipeline_key)
    return

pipeline = global_pipeline_lookup.get(key)
if (!pipeline)
    pipeline = worker_thread_created_pipelines.find(key)
if (!pipeline) {
    pipeline = create_pipeline()
    worker_thread_created_pipelines.push({key, pipeline})
}

most_recently_bound_pipeline_key = key
bind(pipeline)

When all worker threads are done, we merge the arrays of worker-thread-created pipelines into the pipeline lookup owned by the device.
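Here is a sketch of that end-of-frame merge, using hypothetical pipeline_lookup_t and worker_t types in place of the engine’s real ones. How duplicates are handled (two workers creating the same pipeline in the same frame) is my assumption, not something described above:

#include <stdint.h>
#include <vulkan/vulkan.h>

// Hypothetical types and functions standing in for the engine's real ones.
typedef struct pipeline_lookup_t pipeline_lookup_t;
VkPipeline pipeline_lookup_get(pipeline_lookup_t *lookup, uint64_t key);
void pipeline_lookup_insert(pipeline_lookup_t *lookup, uint64_t key, VkPipeline pipeline);

typedef struct created_pipeline_t {
    uint64_t key;
    VkPipeline pipeline;
} created_pipeline_t;

typedef struct worker_t {
    uint32_t num_created_pipelines;
    created_pipeline_t created_pipelines[64];
} worker_t;

// Runs on a single thread once all command-list-building workers have
// joined, so the global lookup can be mutated without locks.
void merge_worker_pipelines(VkDevice device, pipeline_lookup_t *lookup,
    worker_t *workers, uint32_t num_workers)
{
    for (uint32_t w = 0; w != num_workers; ++w) {
        worker_t *worker = &workers[w];
        for (uint32_t i = 0; i != worker->num_created_pipelines; ++i) {
            const created_pipeline_t *p = &worker->created_pipelines[i];
            // Two workers may have created the same pipeline this frame;
            // keep the one already inserted and destroy the duplicate.
            if (pipeline_lookup_get(lookup, p->key))
                vkDestroyPipeline(device, p->pipeline, NULL);
            else
                pipeline_lookup_insert(lookup, p->key, p->pipeline);
        }
        worker->num_created_pipelines = 0;
    }
}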

This way we avoid having to introduce some kind of locking mechanism when accessing the pipeline lookup. In general, I’ve found that this approach of sharing a global read-only data structure and using worker-thread-local memory to queue up any changes to it scales nicely to lots of worker threads. It is also super trivial to implement, makes for code that is easy to follow, and works without having to resort to locks or atomics.

Render State Override Blocks

While I said that most render states tend to be static and are typically known already at shader authoring time, we still need to support changing some of them dynamically. A few examples:

  • Flipping the triangle winding used for back-face culling between clockwise and counter-clockwise, to correctly support mirrored object transforms.

  • Flipping between rendering polygons as filled and lines, to support wireframe views in an editor.

  • Dynamically changing blend states.

  • Adjusting depth biasing when rendering shadow maps depending on resolution, format, projection, etc.

In The Machinery, the shader author is responsible for providing well-defined default values for all render states. On top of that, we allow the user to create Render State Override Blocks: minimalistic render backend resources that only contain the states the user wants to override from their default values. These blocks can be stacked on top of each other, similar to Photoshop layers, to provide a form of override mechanism. The stacking is specified as a (typically very short) array of resource handles identifying the override blocks attached to each draw call.

As mentioned earlier, this array of resource handles is factored into the pipeline lookup key. So when creating a new VkPipeline, we first grab the default values for all render states from the shader, we then overwrite the values of any states referenced by the override blocks. If the same state is touched by more than one override block, the last block in the array touching the state takes precedence.
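Here is a sketch of that resolve step, assuming a hypothetical sparse (state, value) layout for the override blocks; The Machinery’s actual block format may look different:

#include <stdint.h>
#include <string.h>

enum { NUM_RENDER_STATES = 64 };   // hypothetical number of render states

typedef struct state_override_t {
    uint32_t state;   // index of the render state being overridden (e.g. cull mode)
    uint32_t value;   // the value to override it with
} state_override_t;

typedef struct render_state_block_t {
    uint32_t num_overrides;
    const state_override_t *overrides;
} render_state_block_t;

// Resolve the final render state values used for pipeline creation: start
// from the shader author's defaults, then apply each override block in array
// order so that the last block touching a state takes precedence.
void resolve_render_states(uint32_t states[NUM_RENDER_STATES],
    const uint32_t defaults[NUM_RENDER_STATES],
    const render_state_block_t *blocks, uint32_t num_blocks)
{
    memcpy(states, defaults, sizeof(uint32_t) * NUM_RENDER_STATES);
    for (uint32_t b = 0; b != num_blocks; ++b)
        for (uint32_t i = 0; i != blocks[b].num_overrides; ++i)
            states[blocks[b].overrides[i].state] = blocks[b].overrides[i].value;
}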

In Vulkan there’s a mechanism that allows specifying some render states as dynamic and supplying their values after the pipeline has been created and bound. These are render states that tend to need more frequent changes than others, such as scissor rectangles, viewports, depth biasing, and a few more. By separating out the state override blocks that only reference states supporting this kind of dynamic updating, we can handle all state overrides through the same system without a special code path for them.
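As a sketch of what that looks like on the Vulkan side (the particular set of dynamic states here is just an example): the states are marked dynamic when the pipeline is created, and their values are then recorded on the command buffer at draw time:

#include <vulkan/vulkan.h>

// Mark viewport, scissor and depth bias as dynamic; this struct is plugged
// into VkGraphicsPipelineCreateInfo::pDynamicState at pipeline creation time.
static const VkDynamicState dynamic_states[] = {
    VK_DYNAMIC_STATE_VIEWPORT,
    VK_DYNAMIC_STATE_SCISSOR,
    VK_DYNAMIC_STATE_DEPTH_BIAS,
};

static const VkPipelineDynamicStateCreateInfo dynamic_state_info = {
    .sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO,
    .dynamicStateCount = sizeof(dynamic_states) / sizeof(dynamic_states[0]),
    .pDynamicStates = dynamic_states,
};

// At draw time, after binding the pipeline, the actual values are recorded
// directly on the command buffer instead of being baked into the pipeline.
void set_dynamic_states(VkCommandBuffer cb, VkViewport viewport, VkRect2D scissor,
    float bias_constant, float bias_clamp, float bias_slope)
{
    vkCmdSetViewport(cb, 0, 1, &viewport);
    vkCmdSetScissor(cb, 0, 1, &scissor);
    vkCmdSetDepthBias(cb, bias_constant, bias_clamp, bias_slope);
}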

So far this feels like a decent compromise between performance and flexibility when it comes to management of pipelines and render states.


This was my last post for 2017, and I’d like to take the opportunity to thank all of you for taking the time to read my posts, and for all the encouragement and valuable feedback you’ve provided.

Merry Christmas & Happy New Year! You are awesome! ❤️