Tutorial: Using PIX to diagnose spikes in CPU frame time
If dependencies between threads are involved, PIX can help developers diagnose spikes in frame time.
Recently, our team created a series of videos to introduce developers to PIX, the game-focused profiling tool from Microsoft for Windows games. PIX is the profiler that’s used by game studios to optimize the PC build of their 64-bit, C++, and DirectX 12 games. Developers can use PIX to solve lots of performance issues that are common in games, including issues related to the GPU, CPU, memory, or file IO.
In this month’s video, we’ll be teaching you how to use PIX Timing Captures to diagnose a common cause of spikes in CPU frame time: a dependency between two threads. When such a dependency occurs, a thread is forced to wait until another thread finishes some work before it can start running again. This delay is called a stall. If it’s long enough or happens frequently enough, it can result in a spike in frame time.
The best place to start is in the Summary layout of a Timing Capture that you’ve taken. When a capture is opened, PIX automates several common performance analysis tasks that you would often do manually. The Summary layout has the results of that analysis. PIX identifies things like the longest GPU frames and the longest-running PIX events. Looking for them is often how you find spikes in your CPU frames. The Summary layout also shows statistics such as CPU and memory usage.
In the video, I start our analysis by looking at the Longest PIX Events table. PIX events are defined by the game, and we'll look at them in detail in a bit. Looking at this table, I can see the top five longest-running events and the longest instances of each event. As the game developer, I know that these three “Frame *” events represent CPU frames. I looked through these longest instances previously and found that this third instance of the RHIThread event is a good example of a stall that is caused by a dependency between two threads.
Selecting the link took me to the timeline layout at the point in time when our spike occurred. Before we diagnose our spike, I'll spend a few minutes describing what we're looking at here. I'm going to zoom out a bit for context. Let’s break the timeline down with a bit more detail:
To start with, the x axis is time. The ribbon at the top of the timeline marks the elapsed time from the beginning to the end of the capture.
This Timeline view is organized primarily around horizontal lanes, with each lane representing a thread that is running in the game. In my example, the top lane is a thread named RHIThread. As I scroll down in the video, we see a render thread, a game thread, an audio thread, and so on. These thread names are set by calling the SetThreadDescription Win32 API.
Looking at RHIThread, there are groupings of colored boxes that all appear to have roughly the same color and shape. Each grouping represents a frame of CPU time in my game. Each of these colored boxes is a PIX event that is defined by the game. To define an event, the game calls the PIXBeginEvent API. It tells PIX when the event starts and stops, along with a color and a string to be drawn in the box. Each colored section of a PIX event box represents time when game code is running.
But there are also periods of time drawn in the gray hatched pattern. They represent when the thread is stalled, or isn't running, which is generally not good from a performance perspective. If a thread isn't running, it's not getting any work done. One reason this can happen is that a thread might have a dependency on some work that happens on another thread. In the case shown in the video, our thread is forced to wait until another thread completes its work before it can start running again.
As you can see, turning on the option to highlight context switches, named Bold Context Switches, makes it easier to see the boundaries of our stalls. You can also quickly find these boundaries by selecting the top event and looking at the Element Details view in the bottom-left corner. In this case, it tells me that my frame has stalled six times.
In my example, the large stall toward the end of the frame is of particular concern. To diagnose what caused this stall, we can select the context switch at the beginning of the stall and see that our thread went from running to not running, or idle. By looking at the call stack in the Element Details view we can see that RHIThread stopped running because it called the WaitForSingleObject Win32 API. RHIThread is waiting on a synchronization primitive. The more interesting context switch is the one that caused our thread to start running again. By selecting this context switch, the Element Details view gives us the information we need.
The key piece of information is that the main thread was made ready to run by a thread named RenderThread. By looking at the callstack for RenderThread, we can see that it called SetEvent, thereby signaling that the RHIThread can start running again. So that's the dependency. To make this really easy to see, PIX draws an arrow in the timeline between the two threads that have a dependency. When you remove thread dependency, RHIThread doesn’t stall, and the frame finishes sooner.
In the next video in this series, I’ll describe how to use the CPU sampling features in PIX to solve a different common cause of spikes in frame time.