Implement MP counters for nv50 (compute only)


As part of my Google Summer of Code project I implemented MP counters (for compute only) on nv50/tesla. This work follows the implementation of MP counters for nvc0/fermi I did the last year.

Compute counters are used by OpenCL while graphics counters are used to count hardware-related activities of OpenGL applications. The distinction between these two types of counters made by NVIDIA is arbitrary and won’t be present in my implementation. That’s why compute counters can also be used to give detailed information of OpenGL applications like the number of instructions processed per frame or the number of launched warps.

MP performance counters are local and per-context while performance counters, programmed through the PCOUNTER engine, are global. A MP counter is more accurate than a global counter because it counts hardware-related activities for each context separately while a global counter reports activities regardless of the context that generates it.

All of these MP counters have been reverse engineered using CUPTI, the NVIDIA CUDA profiling tools interface which only exposes compute counters. On nv50/tesla, CUPTI exposes 13 performance counters like instructions or warp_serialize. The nv50 family has 4 MP counters per TPC (Texture Processing Cluster).

Currently, this prototype implements an interface between the kernel and mesa which exposes these MP performance counters to the user through the Gallium HUD. Basically, this interface can configure and poll a counter using the push buffer and a set of software methods.

To configure a MP counter we use the command stream like the blob does. We have two methods, the first one is for configuring the counter (mode, signal, unit and logic operation) and the second one is just used to reinitialize the counter. Then, to select the group of the MP counter we have added a software method. To poll counters we use a notifier buffer object which is allocated along a channel. This notifier allows to communicate between the kernel and mesa. This approach has already been explained in my latest article.

To sum up, this prototype adds support for 13 performance counters on nv50/tesla. All of the code is available on my github account. If you are interested, you can take a look at the mesa and the nouveau code.

Have a good day.