Multithreaded rendering (Part 3)

After a "little" break, let's return to the topic. Last time we ended with a system that was already split across two threads. This week we will focus on how to change it so that it works the way we wanted.

Let's start by looking a little closer at the situation we ended on in the last part.
That model already looks pretty good and would be enough in some situations. We can use it in this way:

The place marked as "Sync" is the time during which "Gather Data" waits until rendering has finished using the rApi render list (the buffer that contains all the information needed to render a frame). As we can see, the game can update while rendering is in progress, so everything should run faster than on a single thread.

Of course, things are not always as nice as in the earlier pictures. For example, we can end up in a situation like this:

The rendering task has not finished yet (a more complicated scene is being rendered), but the game could already prepare data for the next frame. My solution to this problem is double buffering of the rApi render list. Thanks to that, everything runs more smoothly: the game can fill one list while the renderer is still consuming the other.

With a good implementation it is easy to extend this idea to a triple-buffered version. But as always, we need to remember that more buffers are not always a good thing:

  • It is always a compromise between size and speed. On PC memory is not such a big problem, so we don't need to worry about it much, but on mobile devices, for example, memory is much tighter.
  • Sometimes increasing the number of buffers doesn't change anything, so it's worth thinking twice about whether it is really needed.
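
To make the buffering idea a bit more concrete, here is a minimal C++ sketch. It is not the White Rabbit Engine code: the names `BufferedRenderLists`, `RenderList` and `RenderCommand` are made up for illustration, and a mutex with two condition variables is just one possible way to implement the "Sync" point. The template parameter is the number of buffers, so 1 gives the original model, 2 double buffering and 3 triple buffering.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for one recorded rApi command.
struct RenderCommand { int id; };
using RenderList = std::vector<RenderCommand>;

// N render lists shared by the game thread (writer) and the render thread (reader).
template <std::size_t N>
class BufferedRenderLists {
public:
    // Game thread: waits (the "Sync" point) only if every list is still queued or in use.
    RenderList& beginWrite() {
        std::unique_lock<std::mutex> lock(mutex_);
        slotFreed_.wait(lock, [this] { return pending_ < N; });
        lists_[writeIndex_].clear();
        return lists_[writeIndex_];
    }

    // Game thread: publishes the list that was just filled.
    void endWrite() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            writeIndex_ = (writeIndex_ + 1) % N;
            ++pending_;
        }
        listReady_.notify_one();
    }

    // Render thread: waits until at least one filled list is available.
    const RenderList& beginRead() {
        std::unique_lock<std::mutex> lock(mutex_);
        listReady_.wait(lock, [this] { return pending_ > 0; });
        return lists_[readIndex_];
    }

    // Render thread: hands the list back so the game thread can reuse it.
    void endRead() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(pending_ > 0 && "endRead() without a matching beginRead()");
            readIndex_ = (readIndex_ + 1) % N;
            --pending_;
        }
        slotFreed_.notify_one();
    }

private:
    std::array<RenderList, N> lists_;
    std::size_t writeIndex_ = 0;  // next list the game thread fills
    std::size_t readIndex_ = 0;   // next list the render thread consumes
    std::size_t pending_ = 0;     // lists filled but not yet released by the renderer
    std::mutex mutex_;
    std::condition_variable slotFreed_;
    std::condition_variable listReady_;
};
```

With `BufferedRenderLists<2>` the game thread can fill a second list while the first is still being rendered, and it only blocks when it gets a full two frames ahead. Every extra buffer costs one more render list worth of memory, which is the size versus speed compromise from the list above.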

If we add the same level of buffering for dynamic geometry, we have resolved the first point from the list at the end of the last post:

"It [White Rabbit Engine] use dynamic geometry that update rApi buffers in meantime of game loop."

Next, let's move on to synchronized loading of resources. In OpenGL this is not an easy task (at least I haven't found a nice solution to it), because we need to create, upload and destroy resources on the thread where the context is current, which here is the main thread that created it. So we need to split resource initialization into two levels:

  • high level - all engine structures
  • low level - creation of the real rApi resources.
High-level initialization can be done on the game thread, but all low-level initialization needs to be done on the main thread before rendering. So we just need to collect these operations and execute them before the frame is rendered.

White Rabbit Engine uses tasks placed in a ring buffer for this purpose. The tasks are:

  • all rApi operations, e.g. create shader, upload texture, destroy texture ...
  • execution of an rApi render list.
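
A task ring buffer of this kind could look roughly like the sketch below. This is only an illustration under my own assumptions, not the engine's real code: `TaskRing` and `Task` are hypothetical names, tasks are plain `std::function<void()>` objects, and there is exactly one producer (the game thread) and one consumer (the main thread).

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical task type: any rApi operation or "execute render list" wrapped in a callable.
using Task = std::function<void()>;

// Single-producer / single-consumer ring buffer. It holds at most Capacity - 1 tasks.
template <std::size_t Capacity>
class TaskRing {
    static_assert(Capacity > 1, "the ring needs at least two slots");

public:
    // Game thread: queues a task. Returns false when the ring is full.
    bool push(Task task) {
        assert(task && "empty tasks are not allowed");
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) {
            return false;  // full: the caller can retry after the main thread catches up
        }
        slots_[tail] = std::move(task);
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Main thread: runs every queued task in submission order, returns how many ran.
    std::size_t runPending() {
        std::size_t executed = 0;
        std::size_t head = head_.load(std::memory_order_relaxed);
        while (head != tail_.load(std::memory_order_acquire)) {
            Task task = std::move(slots_[head]);
            slots_[head] = nullptr;  // drop captured state before freeing the slot
            head = (head + 1) % Capacity;
            head_.store(head, std::memory_order_release);
            task();
            ++executed;
        }
        return executed;
    }

private:
    std::array<Task, Capacity> slots_;
    std::atomic<std::size_t> head_{0};  // next slot to execute (main thread)
    std::atomic<std::size_t> tail_{0};  // next slot to fill (game thread)
};
```

Because the ring is first in, first out, a "create texture" task pushed before the "execute render list" task that uses the texture is guaranteed to run first on the main thread.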

In the end, almost everything that happens in the main thread loop is a task. A nice consequence of this is automatic synchronization: tasks run in the order they were queued, so rendering will never try to use data that has not been created yet. So we have resolved the second point:

"Textures and meshes can be load in any moment of game (preparation for streaming)."

And the last one left was:

"There is no separation between rendering of load screen and rendering of game."

which is resolved by creating an additional thread for the duration of loading. This thread only renders the loading screen, at a lower frame rate, while the game thread loads the map.
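
As a rough illustration of this arrangement, here is a small C++ sketch. The function `loadWithLoadingScreen` and its parameters are hypothetical, and how the loading-screen frames actually reach the thread that owns the OpenGL context (for example as tasks pushed to the main thread) is left to the `submitLoadingFrame` callback, which is an assumption on my side.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Runs loadMap on the calling (game) thread while a temporary thread keeps
// submitting loading-screen frames at a reduced rate. Returns the frame count.
inline int loadWithLoadingScreen(const std::function<void()>& loadMap,
                                 const std::function<void()>& submitLoadingFrame,
                                 std::chrono::milliseconds frameInterval) {
    assert(loadMap && submitLoadingFrame);

    std::atomic<bool> loading{true};
    int frames = 0;  // written only by the screen thread, read after join()

    std::thread screen([&] {
        do {
            submitLoadingFrame();                        // e.g. queue a "draw loading screen" task
            ++frames;
            std::this_thread::sleep_for(frameInterval);  // lower frame rate than the game
        } while (loading.load(std::memory_order_acquire));
    });

    loadMap();  // the game thread does the heavy work in the meantime
    loading.store(false, std::memory_order_release);
    screen.join();  // the temporary thread disappears once loading is done
    return frames;
}
```

A call such as `loadWithLoadingScreen(loadLevel, drawLoadingScreen, std::chrono::milliseconds(100))` would show roughly ten loading-screen frames per second; `loadLevel` and `drawLoadingScreen` are placeholder names here.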

White Rabbit Engine has been running on this system for 2-3 months already. I like it because:
  • this solution is a lot more generic than my old one.
  • it groups all rApi calls in one place, which helps with debugging graphics.
  • it allows resources to be loaded at any moment.
  • it uses a lot of contiguous memory.
  • there are a lot of places where it can be improved (I have a few ideas, but they need to wait a little longer).
So this post ends the introduction to multithreaded rendering in White Rabbit Engine. When I move back to my normal computer I will try to make some charts showing how everything changed after moving to this new system. I still don't know exactly when that will happen, though.

So, till the next post.


  1. Hey Greg!
    Another interesting post of yours. I'm just curious how much latency is introduced by double buffering the render API calls. I assume input is sampled in the game update, so when the results of this frame are presented to the user there is a delay between pressing a button and its on-screen effect. Have you thought about this issue?

  2. Hi Łukasz :]

    Yes, you are right, and I needed to think a little bit to answer your question :] Let's start with some definitions to make things easier later:

    u[x] - time of the app update tasks for frame x (where x == 0 is the current frame)
    r[x] - time of the rendering tasks for frame x (where x == 0 is the current frame)

    For normal single-threaded rendering this time should be:

    time_single = u[0]+r[0]

    With this double-buffered solution, in the worst case it should be:

    time_parallel = max(r[-1], u[0]) + r[0]

    So you are right that latency may be greater. But if the tasks are balanced it should be almost the same as with normal single-threaded rendering. So it's not so bad :]

    Also, I think that even if latency may be a little higher, the responsiveness of input should improve with multithreaded rendering, because there is a lot more input processing per second than in the single-threaded case (the amount may even double).
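
    The two formulas above are easy to try with concrete numbers. The small C++ sketch below just evaluates them; the function names and the millisecond values in the note after it are made-up examples, not measurements from the engine.

```cpp
#include <algorithm>
#include <cassert>

// time_single = u[0] + r[0]
inline double timeSingle(double u0, double r0) {
    assert(u0 >= 0.0 && r0 >= 0.0);
    return u0 + r0;
}

// time_parallel = max(r[-1], u[0]) + r[0]  (worst case with double buffering)
inline double timeParallel(double rPrev, double u0, double r0) {
    assert(rPrev >= 0.0 && u0 >= 0.0 && r0 >= 0.0);
    return std::max(rPrev, u0) + r0;
}
```

    With balanced tasks, say 8 ms of update and 8 ms of rendering, both formulas give 16 ms. With 6 ms of update and 10 ms of rendering, the single-threaded time is still 16 ms, while the worst case for the parallel version is max(10, 6) + 10 = 20 ms.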

