GSoC 2014 – The clock is again ticking!

Hello,

The Google Summer of Code 2014 coding period starts tomorrow. This year, my project is to expose NVIDIA’s GPU graphics counters to userspace through mesa. This idea follows on from my previous Google Summer of Code, which was mainly focused on reverse engineering NVIDIA’s performance counters.

The main goal of this project is to help Linux developers identify the performance bottlenecks of OpenGL applications. At the end of this GSoC, almost all of NVIDIA’s GPU graphics counters for GeForce 8, 9 and 2XX (nv50/Tesla) will be exposed by Nouveau. Some counters won’t be available until compute support (i.e. the ability to launch kernels) is implemented for nv50.

During the past weeks, I continued reverse engineering NVIDIA’s graphics counters for nv50. The documentation is now almost complete (except for the aa, ac and af chipsets, because I don’t have them), and I recently started the same process for nvc0 cards. At the moment, this documentation hasn’t been pushed to envytools and is only available in my personal repository.

To check the reverse engineered configuration of the performance counters, I developed a modified version of OGLPerfHarness (the OpenGL sample code of NVPerfKit). This OpenGL sample automatically monitors and exports the values of performance counters using NVPerfSDK on Windows. The figure below shows an example.

[Figure: OpenGL harness screenshot]

This tool is called from a bash script for every available counter and produces the following output (for the shader_busy signal in this example):

OPTIONS:
model=bunny
model-count=27
render-mode=vbo
texture=small
num-frames=100
fullscreen=no
STATS:
fps=9.53
mean=98.5%
min=98.5%
max=98.6%
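
A report in this format is easy to post-process. As a minimal sketch (the function names below are hypothetical and not part of the actual tool), it could be parsed in Python like this:

```python
def parse_report(text):
    """Parse an OPTIONS/STATS report into two dictionaries of strings."""
    sections = {"OPTIONS": {}, "STATS": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # A line such as "OPTIONS:" or "STATS:" starts a new section.
        if line.endswith(":") and line[:-1] in sections:
            current = line[:-1]
            continue
        if current is None or "=" not in line:
            raise ValueError(f"unexpected line: {line!r}")
        key, value = line.split("=", 1)
        sections[current][key] = value
    return sections["OPTIONS"], sections["STATS"]


def to_number(value):
    """Convert '98.5%' or '9.53' to a float (the percent sign is dropped)."""
    return float(value.rstrip("%"))


# The report shown above, copied verbatim.
SAMPLE = """\
OPTIONS:
model=bunny
model-count=27
render-mode=vbo
texture=small
num-frames=100
fullscreen=no
STATS:
fps=9.53
mean=98.5%
min=98.5%
max=98.6%
"""

options, stats = parse_report(SAMPLE)
print(options["model"], to_number(stats["mean"]))  # prints: bunny 98.5
```

Values are kept as strings on purpose, so that the caller decides which fields are numeric.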

All stats produced by the OpenGL sample are available in my repo. I didn’t publish the code because I don’t have the right to redistribute it, but I can send a patch if anyone is interested.

To check the configuration of these performance counters on Nouveau, I ported my tool to Linux. I was then able to compare the values exported on Windows with those monitored on Nouveau using nv_perfmon.
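
Such a comparison can be sketched as follows. This is only an illustration: the function names, the 5% tolerance and the Nouveau-side numbers are assumptions made up for the example, not measured results. Only the 98.5 value for shader_busy comes from the report above.

```python
def relative_difference(reference, measured):
    """Relative gap between two values, using the reference as the base."""
    if reference == 0:
        return 0.0 if measured == 0 else float("inf")
    return abs(measured - reference) / abs(reference)


def compare_counters(windows, nouveau, tolerance=0.05):
    """Return {signal: (reference, measured, ok)} for signals found in both."""
    report = {}
    for name in sorted(windows.keys() & nouveau.keys()):
        ref, got = windows[name], nouveau[name]
        ok = relative_difference(ref, got) <= tolerance
        report[name] = (ref, got, ok)
    return report


# Mean values in percent. "example_signal" is a placeholder name and every
# Nouveau-side value is a made-up number for illustration only.
windows = {"shader_busy": 98.5, "example_signal": 40.0}
nouveau = {"shader_busy": 97.9, "example_signal": 12.0}

report = compare_counters(windows, nouveau)
for name, (ref, got, ok) in report.items():
    print(name, ref, got, "ok" if ok else "MISMATCH")
```

A signal flagged as a mismatch would point to a configuration that was not reverse engineered correctly, or to a workload that differs between the two runs.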

Now, the plan for the next weeks is to work on the kernel ioctl interface.

See you later!

Google Summer of Code 2014 – Proposal for X.Org Foundation

Title

Expose NVIDIA’s GPU graphics counters to the userspace.

Short description

This project aims to expose NVIDIA’s GPU graphics counters to userspace through mesa. This idea follows on from my previous Google Summer of Code, which was mainly focused on reverse engineering NVIDIA’s performance counters. The main goal of this project is to help Linux developers identify the performance bottlenecks of OpenGL applications.

Personal information

I’m a student in the final year of an MSc degree at the university of Bordeaux,
France. I already participated in the Google Summer of Code last year [1] and
my project was to reverse engineer NVIDIA’s performance counters.

Context

Performance counters

Hardware performance counters are a set of special registers which are used
to store the counts of hardware-related activities. They are often used by
developers to identify bottlenecks in their applications.

In this proposal, we are only focusing on NVIDIA’s performance counters.

There are two types of counters offered by NVIDIA which provide data directly
from various points of the GPU. Compute counters are used for OpenCL, while
graphics counters give detailed information for OpenGL/Direct3D.

On Windows, compute and graphics counters are both exposed by PerfKit[2], an
advanced software suite (except when it crashes my computer for no particular
reason :-)), which can be used by advanced users for profiling OpenCL and
Direct3D/OpenGL applications.

On Linux, the proprietary driver *only* exposes compute counters, through the
CUDA compute profiler[3] (CUPTI). Graphics counters are not exposed there,
since PerfKit is only available on Windows.

On Nouveau/Linux, some counters are already exposed. Compute counters for
nvc0/Fermi and nve0/Kepler are available in mesa which manages counters’
allocation and monitoring through some software methods provided by the kernel.

The compute and graphics counters distinction made by NVIDIA is arbitrary and
won’t be present in our re-implementation.

Google Summer of Code 2013 review

I took part in the GSoC 2013 and my project was to reverse engineer NVIDIA’s
performance counters and to expose them via nv_perfmon.

Let me now sum up the important tasks I have done during this project.

The first thing I did was to take a look at cupti to understand how GPU
compute counters are implemented on Fermi. After playing a bit with that
profiler, I wrote a tool named cupti_trace[4] to make the reverse engineering
process as automatic as possible. At this stage, I was able to start the
implementation of MP counters on nvc0/Fermi in mesa, based on the previous work
of Christoph Bumiller (aka calim), who had already implemented that support for
nve0/Kepler. To complete this task, I had to implement parts of the compute
runtime for nvc0 (i.e. the ability to launch kernels).

MP compute counters support for Fermi:
http://lists.freedesktop.org/archives/mesa-commit/2013-July/044444.html
http://lists.freedesktop.org/archives/mesa-commit/2013-August/044573.html
http://lists.freedesktop.org/archives/mesa-commit/2013-August/044574.html
http://lists.freedesktop.org/archives/mesa-commit/2013-August/044576.html

The second part of my project was to start reverse engineering graphics
counters on nv50/Tesla through PerfKit and gDEBugger[5], an advanced OpenGL and
OpenCL debugger, profiler and memory analyzer. Since PerfKit is only
available on Windows, I was unable to use envytools[6], a tool suite for
reverse engineering the NVIDIA proprietary driver, because it depends on
libpciaccess, which was not available on Windows. To complete this
task, I ported this library using WinIO in order to use tools provided
by envytools like nvapeek and nvawatch.

libpciaccess support on Windows/Cygwin:
https://hakzsam.wordpress.com/2014/01/28/libpciaccess-has-now-official-support-for-windowscygwin/
http://www.phoronix.com/scan.php?page=news_item&px=MTU4NTU
http://cgit.freedesktop.org/xorg/lib/libpciaccess/commit/?id=6bfccc7ec4f0705595385f6684b6849663f781b4

At the end of this Google Summer of Code, some graphics counters had already been
reverse engineered on nv98/Tesla.

This project was successfully completed, except for two tasks from the
schedule: the implementation of graphics counters in nv_perfmon and the
reverse engineering of MP counters on Tesla. It has been a very interesting
experience for me, even if it was very hard at the beginning. I’m now able to
say that low level hardware programming on GPUs is not a trivial task -:).

After GSoC 2013 until now

From October to January, I didn’t work on Nouveau at all because I was
completely busy with university work.

In February, I returned to work on the reverse engineering of these graphics
counters, and I completed most of the documentation for nv50/Tesla chipsets[7].

Project

Benefits to the community

Help Linux developers identify the performance bottlenecks of OpenGL
applications.

Description

Compute counters for nvc0+ are already exposed by Nouveau, but there are still
many performance counters exposed by NVIDIA that are left to be exposed in
Nouveau. Last year, I added compute counters support used by OpenCL and CUDA
for nvc0/Fermi.

Graphics counters are currently only available on Windows, but I reverse
engineered them and the documentation is mostly complete. At the moment, nv50,
84, 86, 92, 98, a0, a3 and a5 are documented. In a few days, I should be able to
complete this list by adding the 94, 96 and a8 chipsets. In this GSoC project,
I would like to expose them in Nouveau, but there are some problems between
PCOUNTER[8] and MP counters.

PCOUNTER is the card unit which contains most of the performance counters.
PCOUNTER is divided into 8 domains (or sets) on nv50/Tesla. Each domain has a
different source clock and has 255+ input signals, each of which can itself be
the output of a multiplexer. PCOUNTER uses global counters whereas MP counters
are per-app and context switched like the compute counters used for nvc0+.

Actually, these two types of counters are not really independent and may share
some configuration parts, for example, the output of a signal multiplexer.

Because global counters (PCOUNTER) and local counters (MP counters) share
some configuration, I think it’s a bad idea to allow monitoring multiple
applications concurrently. To solve this problem, I suggest, at first, to use
a global lock that allows only one application at a time, which also
simplifies the implementation.

NVIDIA does not handle this case at all, and the behaviour is thus undefined when more
than one application is monitoring performance counters at the same time.
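
The global lock suggested above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the class and method names are hypothetical, and a real implementation would presumably live in the kernel rather than in Python.

```python
import threading


class CounterMonitorLock:
    """Toy model of a global lock: at most one client monitors at a time."""

    def __init__(self):
        self._mutex = threading.Lock()  # protects _owner
        self._owner = None              # client currently monitoring, if any

    def try_acquire(self, client):
        """Return True if `client` now owns (or already owned) the counters."""
        with self._mutex:
            if self._owner is None:
                self._owner = client
                return True
            return self._owner == client

    def release(self, client):
        """Give the counters back; only the current owner may do so."""
        with self._mutex:
            if self._owner != client:
                raise RuntimeError(f"{client!r} does not hold the lock")
            self._owner = None


lock = CounterMonitorLock()
print(lock.try_acquire("app-a"))  # prints: True
print(lock.try_acquire("app-b"))  # prints: False
lock.release("app-a")
print(lock.try_acquire("app-b"))  # prints: True
```

The second application is refused instead of being blocked, so it can report that the counters are busy rather than hang.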

Implementation

kernel interface and ioctls

Some performance counters are global and have to be programmed through MMIO.
They have to be managed by the Linux kernel through an ioctl interface that is
yet to be defined.

mesa

Only mesa should directly use performance counters because it has all the
information needed to expose them. Mesa is able to allocate and manage MP
counters (per-app) and can also call the kernel in order to program global
counters via the ioctl interface that will be implemented. At this stage, mesa
will be able to expose them in GL_AMD_performance_monitor and nouveau-perfkit.

GL_AMD_performance_monitor

GL_AMD_performance_monitor[9] is an OpenGL extension which can be used to
capture and report performance counters. This is a great extension for Linux
developers, but it currently does not report any performance counters from
NVIDIA’s GPUs. Once the core implementation is in mesa, this task should
not be too hard since I already have a branch[7] of mesa with core support of
GL_AMD_performance_monitor. Thanks to Kenneth Graunke and Christoph Bumiller.

nouveau-perfkit

Nouveau-perfkit will be a Linux/Nouveau version of NVPerfKit. This tool will be based
on mesa’s implementation. nouveau-perfkit will export both GPU graphics
counters (only nv50/Tesla at first) and compute counters (nvc0+). To
maintain interoperability with NVIDIA, I am thinking about re-using the
interface of NVIDIA’s NVPerfKit. This tool will be for Nouveau only.

GSoC work

Required tasks:
– core implementation (kernel interface + ioctls + mesa)
– expose graphics counters through GL_AMD_performance_monitor
– add nouveau-perfkit, a Linux version of NVPerfKit

Optional tasks (if I have the time):
– reverse engineer NVIDIA’s GPU graphics counters for Fermi and Kepler
– all the work which can be done around performance counters

Approximate schedule

(now until 19 May)
– complete the documentation of signals on nv50/tesla
– write OpenGL sample code to test these graphics counters
– test the reverse engineering on Nouveau (mostly done) and write piglit tests
– think more about the core implementation

(19 May until 18 July)
– core implementation of GPU graphics counters
(kernel interface + ioctls + mesa)

(18 July to 28 July)
– expose graphics counters through GL_AMD_performance_monitor

(28 July to 18 August)
– implement nouveau-perfkit based on mesa, following the NVPerfKit interface

(after GSoC)
– As I did last year, I’ll continue to work on Nouveau after the end of this
Google Summer of Code 2014 because I like this job, it’s fun -:).

Thank you for reading. Have a good day.

References

[1] https://hakzsam.wordpress.com/2013/05/27/google-summer-of-code-2013-proposal-for-x-org-foundation/
[2] https://developer.nvidia.com/nvidia-perfkit
[3] http://docs.nvidia.com/cuda/cupti/index.html
[4] https://github.com/hakzsam/re-pcounter-tools/tree/master/src
[5] http://www.gremedy.com/
[6] https://github.com/envytools/envytools
[7] https://github.com/hakzsam/re-pcounter-tools/tree/master/hwdocs/pcounter
[8] https://github.com/envytools/envytools/blob/master/hwdocs/pcounter/intro.rst
[9] https://www.opengl.org/registry/specs/AMD/performance_monitor.txt