How To: Reverse engineering a performance counter

In this example, we will study the warps_launched event, which is quite simple.

Before continuing, please make sure that the CUDA toolkit is installed on your system and that a CUDA sample is compiled (vectorAddDrv is used below).

Step 1: Enable and configure the profiler

Enable the profiler:

export COMPUTE_PROFILE=1
export COMPUTE_PROFILE_CONFIG=perf_conf.txt

Configure the profiler:

# perf_conf.txt
warps_launched
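
The two setup steps above can be wrapped in one small shell function, which is handy when you repeat the procedure for several events. This is only a sketch: the function name setup_profiler and its optional second argument are my own additions, not part of the CUDA toolkit.

```shell
# Hypothetical helper (not part of the CUDA toolkit): write a one-event
# profiler configuration file and export the two variables used above.
#   $1 = event name, e.g. warps_launched
#   $2 = configuration file path (defaults to perf_conf.txt)
setup_profiler() {
    # The configuration file contains the event name on a line of its own.
    printf '%s\n' "$1" > "${2:-perf_conf.txt}"
    export COMPUTE_PROFILE=1
    export COMPUTE_PROFILE_CONFIG="${2:-perf_conf.txt}"
}

# Usage, in the shell that will launch the CUDA sample:
#   setup_profiler warps_launched
```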

Step 2: Take a trace with a modified version of valgrind-mmt

valgrind --tool=mmt --mmt-trace-file=/dev/nvidia0 --mmt-trace-nvidia-ioctls ./vectorAddDrv &> valgrind_mmt_trace.log

You can also take a look at the profiling output:

$ cat cuda_profile_0.log 
# CUDA_PROFILE_LOG_VERSION 2.0
# CUDA_DEVICE 0 GeForce GT 430
# CUDA_CONTEXT 1
# TIMESTAMPFACTOR fffff68311f26108
method,gputime,cputime,occupancy,warps_launched
method=[ memcpyHtoD ] gputime=[ 116.064 ] cputime=[ 69128.000 ] 
method=[ memcpyHtoD ] gputime=[ 116.032 ] cputime=[ 51292.000 ] 
method=[ VecAdd_kernel ] gputime=[ 67.008 ] cputime=[ 27084.000 ] occupancy=[ 1.000 ] warps_launched=[ 792 ] 
method=[ memcpyDtoH ] gputime=[ 189.120 ] cputime=[ 6512.000 ]
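
The value to look for later in the trace is the one printed by the profiler for warps_launched. As a small sketch, it can be pulled out of the log with sed. The function name warps_launched_values is hypothetical, and the pattern assumes the "name=[ value ]" layout shown in the log above.

```shell
# Hypothetical helper: read a profiler log on standard input and print
# every warps_launched value it contains, one decimal number per line.
# Lines without a "warps_launched=[ N ]" field (header, memcpy) are skipped.
warps_launched_values() {
    sed -n 's/.*warps_launched=\[ *\([0-9][0-9]*\) *].*/\1/p'
}

# Usage:
#   warps_launched_values < cuda_profile_0.log
```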

Step 3: Extract the post-ioctl entries from the trace and make them more readable

grep RETURND valgrind_mmt_trace.log | cut -d ' ' -f2-

Now, the output looks like this:

RETURND: DIR=1 MMIO=504600 VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=1 MMIO=504e00 VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=0 MMIO=504600 VALUE=00000000 MASK=00000000 UNK=00000000,00000000,00000000,00000000
RETURND: DIR=1 MMIO=504600 VALUE=80000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=504604 VALUE=00000026 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=504608 VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=50465c VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=504660 VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=504664 VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=504668 VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=50466c VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=504730 VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=100 MMIO=504674 VALUE=00000318 MASK=00000000 UNK=00000000,00000000,00000000,00000000
RETURND: DIR=100 MMIO=504670 VALUE=00000000 MASK=00000000 UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=504674 VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=504678 VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=50467c VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=504680 VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=504684 VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=504688 VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=50468c VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=101 MMIO=504690 VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
RETURND: DIR=0 MMIO=504600 VALUE=80000000 MASK=00000000 UNK=00000000,00000000,00000000,00000000
RETURND: DIR=1 MMIO=504600 VALUE=00000000 MASK=ffffffff UNK=00000000,00000000,00000000,00000000
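
Most of these entries carry a value of zero, so the interesting registers are easier to spot once those are hidden. The following is a rough sketch with awk. The function name nonzero_regs is hypothetical, and it assumes the cleaned-up format shown above, where the third field is MMIO=... and the fourth is VALUE=....

```shell
# Hypothetical helper: read cleaned-up RETURND lines on standard input and
# print "address value" for every entry whose VALUE is not all zeros.
nonzero_regs() {
    awk '$1 == "RETURND:" {
        split($3, mmio, "=")     # mmio[2]  = register address (hex)
        split($4, value, "=")    # value[2] = register value (hex)
        if (value[2] !~ /^0+$/) print mmio[2], value[2]
    }'
}

# Usage:
#   grep RETURND valgrind_mmt_trace.log | cut -d ' ' -f2- | nonzero_regs
```

On the trace above, only four entries remain: the two with VALUE=80000000 at 504600, the one at 504604 and the one at 504674.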

Step 4: Use lookup (envytools) to print register names

$ lookup -a NVC1 504604 26
PGRAPH.GPC[0].TP[0].MP.PM_SIGSEL[0] => { 0 = 0x26 | 1 = 0 | 2 = 0 | 3 = 0 }

$ lookup -a NVC1 504674 318
PGRAPH.GPC[0].TP[0].MP.PM_COUNTER[0] => 0x318

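Step 5: Results

We can see that PCOUNTER selects signal 0x26 through PM_SIGSEL[0] (register 0x504604) and that the result is read from PM_COUNTER[0] (register 0x504674): 0x318 = 792, which matches the warps_launched value reported by the profiler. 🙂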
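
The comparison between the trace and the profiler log is just a hexadecimal to decimal conversion. A minimal sketch, where hex_to_dec is a hypothetical helper name:

```shell
# Hypothetical helper: convert a hexadecimal value as printed in the trace
# (no 0x prefix, leading zeros allowed) to decimal.
hex_to_dec() {
    printf '%d\n' "0x$1"
}

# Usage:
#   hex_to_dec 00000318    # counter value read from 504674
```

If the result equals the number printed by the profiler (792 here), the register is a good candidate for the counter.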

To conclude, this method seems to work fine. However, it is a bit tedious to repeat these steps for each event, so I wrote a tool to automate the reverse engineering process as much as possible.
