One Draw Call UI
Tobias wrote a nice post about the low-level rendering of our UI. If you haven’t checked it out already, go ahead and do so; it introduces some interesting concepts.
To follow up, I wanted to say a little bit about the more high-level part of the UI, since that’s what has been occupying my mind the last few weeks.
We’ve decided to go with an immediate mode rather than a retained mode model for the UI. For a primer on these two models, have a look at Casey Muratori’s introductory talk or this tutorial by Sol.
The main reason we like immediate mode, or IMGUI as it is often called, is that it avoids a lot of state synchronization between the UI and the underlying “model”. This tends to reduce boilerplate and produce code that is “simpler” (in some sense of the word). In an IMGUI, it is easy to follow a control from the initial line of code that draws it all the way to the actual graphics that get produced. In retained mode, your actions touch a bit of state that gets processed later, and it can be hard to see the consequences (did this change cause a reflow?). We believe there is great value in the transparency you get with the IMGUI model.
The ability to draw some quick controls deep inside a routine without worrying about who “owns” them and who needs to clean them up is also really nice.
IMGUI Performance
The two main critiques that are usually brought up against IMGUIs are:
- It is good for simple UIs, but once you want to do X it doesn’t work (where X can be a text editor, drag-and-drop, or something else).
- It is inefficient to redraw the UI every frame.
The first issue can be quite easily disproved by just doing X in an IMGUI. There are plenty of examples of people doing X for lots of different values of X.
But I wanted to say a little bit more about the second point.
First, I think people underestimate just how mind-numbingly fast computers are at doing simple operations. And most of the things we want to do in a UI are pretty simple. Putting some rects in a buffer. We can do a lot of that before it starts to become a problem.
If it becomes a problem, there are a lot of things we can do to fix it. For example, optimizing a scrolled list to only process the items that are visible is trivial in an IMGUI. In contrast, in many retained UIs this requires switching to a completely different protocol, a Virtual List View or something similar.
Note that the usual implementation of a Virtual List View means putting an immediate model on top of the retained UI.
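For instance, with fixed-height items the visible range falls straight out of the scroll offset. A minimal sketch (the names here are illustrative, not our actual API):

```c
// Only process the rows that are actually visible in the view.
uint32_t first = (uint32_t)(scroll_y / item_height);
uint32_t visible = (uint32_t)(view_height / item_height) + 1;
uint32_t last = first + visible < num_items ? first + visible : num_items;

for (uint32_t i = first; i < last; ++i) {
    float y = i * item_height - scroll_y;
    draw_item(i, y);  // Items outside [first, last) are never touched.
}
```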
Caching can also be used in various ways. Expensive operations (if you have any) can be cached with hashed values of the input arguments. We can avoid updating the UI all together if there is no interaction with it. We can use separate UIs for different tabs, and only update the UIs of the tabs that the user interacts with, etc. We can also put a retained model on top of the immediate UI, keep expensive calculations in the retained model and only use the immediate UI for drawing. In fact, our docking system does just that, as you will see later.
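As a sketch of the first of these techniques, caching an expensive result keyed on a hash of its inputs, it might look something like this (the `hash()`, `cache_lookup()`, `cache_insert()` and `expensive_layout()` names are all hypothetical):

```c
// Recompute the expensive result only when the inputs change.
uint64_t key = hash(&input, sizeof(input));
const layout_t *result = cache_lookup(&cache, key);
if (!result)
    result = cache_insert(&cache, key, expensive_layout(&input));
// `result` is now valid whether it was cached or freshly computed.
```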
The fact that it can be useful both to put an immediate model on top of a retained UI and a retained model on top of an immediate UI shows that both models have merit. But between these two approaches, I find the retained model on top of the immediate UI to be cleaner. Systems without state are simpler, and it’s cleaner to have the simpler systems at the bottom of the stack.
Side note: I’ve seen people propose to cache bitmaps for unchanging parts of the UI, but in my mind it seems simpler and better to just cache vertex and index buffers. A 1024 x 1024 bitmap is 3 MB in R8G8B8. You have to draw a lot of UI before you get 3 MB of vertex buffers. And retina displays give vertex buffers an even bigger advantage.
So caching can be used, if you need it, but computers are so fast that I don’t really think that you do.
The important thing is that for me, performance is a process more than anything else. Pretty much anything can be made performant if you have the time to analyze where the performance problems are and then address those issues using various tricks and techniques. This is where IMGUI has a big advantage, in my opinion. In an IMGUI, the code flow is so straightforward that it is really easy to see where the performance issues are and how to fix them. In a retained system, with a more complicated stack, this is much harder. Retained UIs may or may not have a theoretical performance advantage; in practice, I think IMGUIs have the upper hand.
Case in point: I’ve seen people struggle for weeks with making a performant log viewer in a retained system (HTML/Javascript). In an IMGUI, we just keep the strings in a buffer and draw as many as will fit on the screen. Less than an hour’s work, and the cost is just the cost of drawing the strings that fit on the screen, no matter how big the buffer is.
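That buffer-plus-tail approach is the same visible-range trick as above. A sketch, again with illustrative names:

```c
// Draw the tail of the log: only the lines that fit on screen.
uint32_t rows = (uint32_t)(view_height / line_height);
uint32_t first = num_lines > rows ? num_lines - rows : 0;
for (uint32_t i = first; i < num_lines; ++i)
    draw_text(lines[i], (i - first) * line_height);  // Cost is O(rows), not O(num_lines).
```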
Single Draw Call
If you’ve been following the blog, by now you should know that one of the fun things we do at Our Machinery is to take a strong, elegant, but maybe a bit extremist idea about how to write code, and then push that idea as hard as we can to see if we can make it fly. (Previous hits include writing all header files in C and not allowing header files to include other header files.)
The nice thing about this somewhat drastic development methodology is that it can pull you out of the rut of same-sameness, force your brain to think in new ways and move into some exciting, unexplored territory.
For the UI, one such idea that we went with is:
- Draw everything with a single draw call.
This is an interesting idea, because it seems almost possible — the interesting space between the trivial and the impossible. Bindless APIs such as Vulkan let us do this without massive texture atlassing, but there are still some issues that we need to solve, such as:
- Won’t we need different shaders to draw textured and untextured primitives and other “special” things?
- Don’t we need different draw calls to handle different clipping/scissoring shapes?
- What about overlapping windows and drop down menus?
Tobias already addressed the first two items in his post, so I won’t say more about them here. I’ll talk more about overlapping windows below.
The nice thing about using a single draw call is that though it complicates some things, in the sense that we have to find solutions to the problems above, it vastly simplifies others.
Using a single draw call means that our drawing functions can just take a vertex buffer and an index buffer as input and write the drawn shapes directly into those buffers. The drawing functions don’t have to worry about creating multiple draw calls, routing those draw calls to different shaders, etc.
We also don’t have to worry about how to efficiently “batch” our drawing to keep the number of draw calls down (because the number of draw calls will always be exactly one). Batching is something you can really go crazy with, applying all kinds of advanced algorithms — dynamic texture atlassing, finding non-overlapping items that can share a draw call without affecting draw order, etc. There is probably room for a couple of PhD theses there, and we don’t have to do any of it. Nice!
System Layers
Our UI implementation has three layers:
The Drawing layer just knows how to draw basic shapes such as rectangles, circles and text. The UI layer uses the drawing layer and user input to implement interactive controls, such as buttons and checkboxes. Finally, the docking layer puts those controls into dockable tabs that can be dragged and dropped between windows.
As stated above, the drawing functions just write data to vertex and index buffers. The API looks like this:
```c
struct tm_draw2d_buffer_t
{
    uint8_t *vbuffer;   // Vertex data.
    uint32_t vbytes;    // Number of bytes written to `vbuffer`.
    uint32_t *ibuffer;  // Index data.
    uint32_t in;        // Number of indices written to `ibuffer`.
};

struct tm_drawing_api {
    void (*fill_rect)(struct tm_draw2d_buffer_t *buffer,
        const struct tm_draw2d_style_t *style, struct tm_rect_t r);
    ...
};
```
The `style` parameter contains settings for color, line width and other UI style options.
The drawing function will write data directly into the `vbuffer` and the `ibuffer` and increase the counters `vbytes` and `in` (number of indices) to reflect the written data. Note that there is no provision for automatically growing the buffers. It is the responsibility of the caller to make sure that the buffers have enough space for the data that gets written.
I like to separate the tasks of allocating memory and processing data as much as possible to make APIs more flexible. If we had put the memory allocation inside the `fill_rect()` function, we would dictate that the `vbuffer` and `ibuffer` must be allocated individually on the heap; they couldn’t be part of a larger structure. In addition, we would have to pass more parameters to the function, such as an allocation context, etc.
Keeping allocation and processing separate makes for a smaller, simpler API. I realize that this might be a controversial choice for people who are used to higher level APIs, but note that if you want a higher level API, you can just wrap the drawing functions in a library that also takes care of allocation. In fact, that’s exactly what the UI layer does.
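To make the calling convention concrete, here is what using the API might look like. The buffer sizes and the style and rect initializers are assumptions made up for the example:

```c
// The caller owns the memory. Here it's just static arrays, but the
// buffers could equally well be part of a larger structure.
static uint8_t vertices[64 * 1024];
static uint32_t indices[16 * 1024];

struct tm_draw2d_buffer_t buffer = {
    .vbuffer = vertices,
    .ibuffer = indices,
};

struct tm_draw2d_style_t style = {0};    // Hypothetical: color, line width, etc.
struct tm_rect_t r = {10, 10, 100, 20};  // Hypothetical layout: x, y, w, h.

// Appends the rect's vertices/indices and bumps `vbytes` and `in`.
// The caller must have made sure there is room.
drawing_api->fill_rect(&buffer, &style, r);
```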
We could just write regular triangle vertices to the `vbuffer`, but as you can read in Tobias’ post, we try to be a bit more clever and compress the data that we write to use less memory. This compressed data then gets unpacked in the vertex shader.
Note that the drawing layer doesn’t have any internal state at all; it only writes to the buffers passed in to the functions.
The UI layer is a stateful API in the sense that you create a UI object and then draw controls (in immediate mode) into that object. The UI object state holds the `vbuffer` and `ibuffer` that the UI is drawn into, input state, and information about the currently active control.
The control drawing functions look like you might expect from an IMGUI:
```c
struct tm_ui_i {
    struct tm_ui_o *inst;

    bool (*button)(struct tm_ui_o *inst, struct tm_ui_style_t *style,
        struct tm_ui_button_t *c);
    ...
};
```
This draws a button based on the `style` settings using the drawing API. It also handles interaction with the control by detecting if the mouse button is pressed while over the object, returning `true` to the caller in that case.
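From application code, this follows the familiar IMGUI pattern. The `tm_ui_button_t` fields below are guesses for illustration:

```c
struct tm_ui_button_t save = {
    .rect = {10, 10, 80, 24},  // Hypothetical fields.
    .text = "Save",
};
if (ui->button(ui->inst, &style, &save))
    save_document();  // The click happened this frame; react immediately.
```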
Input events are explicitly fed to the UI system with a function like this:
```c
void (*feed_events)(struct tm_ui_o *inst, struct tm_input_event_t *events,
    uint32_t count, struct tm_vec2_t offset, struct tm_vec2_t scale);
```
This feeds a number of `tm_input_event_t` (our event format) to the UI for processing. The UI doesn’t care where these events come from. `scale` and `offset` are used to translate from input coordinates (for mouse position) to UI coordinates. (The UI may be offset from the origin and scaled — for example, for a retina display.)
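A frame loop might feed events like this; the platform-layer call is a made-up stand-in:

```c
// Collect this frame's input from the platform layer (hypothetical call)
// and hand it to the UI. Offset (0, 0) and scale 2 would map window
// coordinates to UI coordinates on a 2x retina display.
struct tm_input_event_t events[256];
uint32_t n = platform__poll_events(events, 256);
ui->feed_events(ui->inst, events, n,
    (struct tm_vec2_t){0, 0}, (struct tm_vec2_t){2, 2});
```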
Our UI library provides a set of standard controls (menus, buttons, scrollbars, etc), but it also provides low-level access to the draw buffer, input state and active control state. This can be used to extend the system with new controls. Since the draw functions are just free-standing functions without state, it is easy to add your own custom drawing functions too, for drawing shapes that we haven’t thought of. And since you also ultimately control the shader that the UI gets drawn with, you could even add completely new primitives with custom shader support. As long as the draw functions generate data that the shader understands, everything will work. This is kind of an extreme data-oriented approach, where things don’t have meaning except in how they are tied together.
The Docking layer knows about multiple UIs, living in separate system windows. (The docking layer doesn’t really care that they are “system windows”, they are just arbitrary rects as far as the docking layer is concerned.)
The docking system allows the user to create Tab Views and then supports dragging and dropping these tab views within a system window or between separate system windows. Tab wells are split automatically if you, for example, drop a tab on the right side of another tab. Note that the docking layer is in fact retained – it keeps a list of known tab views and windows, and items need to be explicitly added to and removed from those lists. So this is an example of putting a retained layer on top of an underlying immediate UI.
When you want to render the UI, you call the docking system for each system window, and the docking system will return a list of the tabs that should be drawn in that window, together with the rect for each tab. The “chrome” for the tabs is drawn automatically by the docking system.
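In sketch form, rendering becomes a loop over system windows. None of the names below are the real docking API; they just illustrate the flow:

```c
// For each system window, ask the docking system what to draw where.
for (uint32_t w = 0; w < num_windows; ++w) {
    uint32_t n;
    const docked_tab_t *tabs = docking__tabs_for_window(dock, windows[w], &n);
    for (uint32_t t = 0; t < n; ++t)
        draw_tab_content(tabs[t].id, tabs[t].rect);  // Chrome is drawn by the docking system.
}
```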
Handling Overlap
When we draw primitives, we write them in-order to the vertex and index buffers. We don’t use the depth buffer, so the items will be rendered in that order too. I.e., later items will be drawn on top of earlier items. This is usually what you want, but there are some situations where it can create problems:
- Reorderable items. In a UI with graph nodes or free-floating “windows” we want the user to be able to bring an item “to front” by clicking on it.
- Popup windows and drop-down menus. A drop-down menu belongs conceptually with the control that triggers it, and we want to draw it at the same time as that control. However, if we do that, the controls below it that are drawn later would appear on top of the menu items.
Both of these issues could be solved by adding more draw calls, but we want to keep the purity of our idea of a single draw call.
Some IMGUI systems have “windows” as a special concept — something conceptually distinct from other controls — but that doesn’t feel right to me. I want “windows” to be just like other controls. That way the user can easily add new “window”-like things, such as graph nodes, and there is no special performance cost to having thousands of “windows”.
For the first issue, we simply make it the job of the caller to keep track of the order of the windows. Clicking on a `window()` control in the UI will return a `BRING_TO_FRONT` event to the caller. It is the responsibility of the caller to handle this event and rearrange the window list so that the activated window ends up last and gets drawn on top of the others. The same principle can be used for graph nodes or any other type of control where the draw order can be manipulated by the user.
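Here is a sketch of how the caller might handle the event, assuming `window()` reports it through its return value (the types and signature are illustrative):

```c
// `windows` is drawn in order, back to front, so moving an entry to the
// end of the array brings it to the front visually.
for (uint32_t i = 0; i < num_windows; ++i) {
    if (ui->window(ui->inst, &style, &windows[i]) == BRING_TO_FRONT) {
        struct my_window_t top = windows[i];
        memmove(windows + i, windows + i + 1,
            (num_windows - i - 1) * sizeof(*windows));
        windows[num_windows - 1] = top;
        break;  // The array changed under us; pick up the new order next frame.
    }
}
```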
Popup menus are handled by keeping track of an `overlay` drawing buffer. Popup controls that should appear on top of everything else are drawn into the `overlay` buffer instead of the regular drawing buffer. Before we send everything down to the GPU, we merge the regular buffer and the overlay buffer, so that we still have a single draw call. Merging is simple: we just append all the overlay data at the end of the regular buffer. The only patch-up we need to do is to offset the indices in the overlay buffer by the number of vertices in the regular buffer.
You might ask if having one overlay buffer is enough. Won’t we need overlays for the overlays? I can think of situations that would require that — for example, a popup window could have a drop-down menu inside it. But I’m not sure that is a UI design style we want to encourage. Anyway, I leave that open for now; if we decide that we need it later, it is easy enough to extend the system with more layers.
Overlap is an issue not only for drawing, but also for interaction. Consider the case when two buttons overlap. If the user clicks, we want the click to only affect the top button and not the button underneath it.
Overlapping buttons might seem like a crazy corner case of bad layout that we won’t have to worry about in practice, but it can actually happen quite easily. For example, if the UI has “draggable” items (graph nodes or “windows”), the user could drag one item on top of another, so that the buttons in the two items overlap. Another situation is when the user activates a dropdown menu. The items in the dropdown will overlap the buttons below. Remember, windows and popups are not special cases in our system, so this is just as much a problem as if you put two buttons directly on top of each other.
The problem with overlapping controls in an IMGUI is that while we are processing one control, there is no way for us to know if other controls that are drawn later will overlap it. We have no retained information and we don’t know what the system might do after drawing this control. Thus, there is no way for us to tell if we should process a click on the control or not.
To handle this situation we need to introduce a frame delay, so that later controls have a chance to “override” our decision to react to an event, if those controls are drawn on top of us.
The way we do this in practice is as follows: when a control is drawn, we check if the mouse is on top of that control. If it is, we set a `next_hover` state variable to the ID of the control. At the end of the frame we set the `hover` variable (which tracks which control the mouse is over) to the value of `next_hover`.
If a later control detects that the mouse is on top of it, it will change the value of `next_hover`. Thus, it is the last control that the mouse is over that controls the `hover` state. Note that this matches how controls are drawn, because it is also the last control drawn that appears on top. So this will ensure that `hover` matches what the user expects.
When there is a mouse click, only the control that matches the `hover` ID will process the click. Thus, clicks automatically get processed by the right (topmost) overlapping control, and only by that control.
To handle overlays (a dropdown menu appearing on top of other controls) we also set a `next_hover_layer` variable. A control can only overwrite the `next_hover` variable if its layer is equal to or greater than `next_hover_layer`. So if an item in the overlay buffer has set `next_hover`, no item drawn to the regular buffer will be able to change it. This gives the right interaction for items in overlays.
Preview
Putting it all together, here’s a preview of our UI system showing some sample controls and the docking system. Note that the visuals are still a work-in-progress and haven’t gone through a polishing pass: