In some cases, information is not exposed through MMIO registers and the blob uses FIFO methods instead. In particular, the blob uses FIFO methods to enable MP counters. Let me explain how to decode them.

In this example, I use the NVC1 chipset, and I want to decode the pushbuffer used by the NVC0_COMPUTE class (0x000090c0).

First, trace a signal using cupti_trace:

$ cupti_trace --trace NVC1 --event active_cycles

Now, grep the trace for the FIFO object class id 0x000090c0:

$ grep 0x000090c0 active_cycles.trace
--6903-- out2 0x00000004 0x00000002 0x00000003 0x0000003d 0x0000003e 0x0000003f 0x00000040 0x00009197 0x000090b8 0x00000073 0x00005080 0x00009072 0x00009074 0x0000844c 0x000090dd 0x000090b2 0x000090b1 0x00008570 0x0000857a 0x0000857b 0x0000857c 0x0000857d 0x0000857e 0x0000007d 0x00009068 0x0000907f 0x0000906f 0x0000902d 0x00009097 0x000090c0 0x00009039 0x000090e0 0x000090e6 0x000090e2 0x000090e3 0x000050a0 0x00009096 0x000090e1 0x000090b3 0x000090b5 0x0000208a 0x000085b6 0x00009067 0x000090f1 0x0000503b 0x0000503c 0x00000075
--6903-- out2 0x00000004 0x00000002 0x00000003 0x0000003d 0x0000003e 0x0000003f 0x00000040 0x00009197 0x000090b8 0x00000073 0x00005080 0x00009072 0x00009074 0x0000844c 0x000090dd 0x000090b2 0x000090b1 0x00008570 0x0000857a 0x0000857b 0x0000857c 0x0000857d 0x0000857e 0x0000007d 0x00009068 0x0000907f 0x0000906f 0x0000902d 0x00009097 0x000090c0 0x00009039 0x000090e0 0x000090e6 0x000090e2 0x000090e3 0x000050a0 0x00009096 0x000090e1 0x000090b3 0x000090b5 0x0000208a 0x000085b6 0x00009067 0x000090f1 0x0000503b 0x0000503c 0x00000075
--6903-- pre_ioctl: fd:3, id:0x2b (full:0xc020462b), data: 0xc1d00511 0x5c0000c9 0x5c0000ca 0x000090c0 0x00000000 0x00000000 0x00000000 0x00000000
--6903-- post_ioctl: fd:3, id:0x2b (full:0xc020462b), data: 0xc1d00511 0x5c0000c9 0x5c0000ca 0x000090c0 0x00000000 0x00000000 0x00000000 0x00000000
--6903-- out 0x5c0000ca 0x000090c0 0x000090c0 0x00000001
--6903-- w 2:0x2004, 0x000090c0
--6903-- w 11:0x24300, 0x000090c3,0x000090c2,0x000090c1,0x000090c0
--6903-- w 9:0x24300, 0x000090c3,0x000090c2,0x000090c1,0x000090c0
--6903-- r 10:0x12180, 0x000090c6,0x000090c4,0x000090c2,0x000090c0
--6903-- pre_ioctl: fd:3, id:0x2b (full:0xc020462b), data: 0xc1d00511 0x5c0000ec 0x5c0000ed 0x000090c0 0x00000000 0x00000000 0x00000000 0x00000000
--6903-- post_ioctl: fd:3, id:0x2b (full:0xc020462b), data: 0xc1d00511 0x5c0000ec 0x5c0000ed 0x000090c0 0x00000000 0x00000000 0x00000000 0x00000000
--6903-- out 0x5c0000ed 0x000090c0 0x000090c0 0x00000001
--6903-- w 15:0x2004, 0x000090c0

The following line contains the map id, which is 2 in this example:

--6903-- w 2:0x2004, 0x000090c0
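
If you don't want to pick that line out by eye, here is a minimal Python sketch that does it. It assumes the write lines always look like the ones in the trace above ("w <map id>:<offset>, <values>") and that the interesting line is a single-word write of the class id at offset 0x2004. Both assumptions come from this one trace, and the name find_map_ids is mine, not part of envytools.

```python
import re

# Matches write lines such as "--6903-- w 2:0x2004, 0x000090c0".
# The line layout is inferred from the sample trace, not from a spec.
WRITE_RE = re.compile(r"^--\d+-- w (\d+):(0x[0-9a-fA-F]+), (.+)$")

def find_map_ids(lines, class_id=0x90C0, offset=0x2004):
    """Return the map ids of single-word writes of class_id at offset."""
    ids = []
    for line in lines:
        m = WRITE_RE.match(line.strip())
        if not m:
            continue
        values = [int(v, 16) for v in m.group(3).split(",")]
        if int(m.group(2), 16) == offset and values == [class_id]:
            ids.append(int(m.group(1)))
    return ids

# A few lines copied from the grep output above.
sample = [
    "--6903-- out 0x5c0000ca 0x000090c0 0x000090c0 0x00000001",
    "--6903-- w 2:0x2004, 0x000090c0",
    "--6903-- w 11:0x24300, 0x000090c3,0x000090c2,0x000090c1,0x000090c0",
    "--6903-- w 15:0x2004, 0x000090c0",
]
print(find_map_ids(sample))
```

The sample has two candidates (2 and 15); I use the first one here.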

Now, use dedma, which decodes the pushbuffer using rnndb (the output is truncated here):

$ dedma -m c0 -v 2 active_cycles.trace > active_cycles.dedma
20014000 size 1, subchannel 2 (0x0), offset 0x0000, increment
000090c0 NVC0_COMPUTE mapped to subchannel 2
20014040 size 1, subchannel 2 (0x90c0), offset 0x0100, increment
00000000 NVC0_COMPUTE.GRAPH.NOP = 0
200141d6 size 1, subchannel 2 (0x90c0), offset 0x0758, increment
00000002 NVC0_COMPUTE.MP_LIMIT = 0x2
200141e4 size 1, subchannel 2 (0x90c0), offset 0x0790, increment
00000000 NVC0_COMPUTE.TEMP_ADDRESS_HIGH = 0
200141e5 size 1, subchannel 2 (0x90c0), offset 0x0794, increment
10000000 NVC0_COMPUTE.TEMP_ADDRESS_LOW = 0x10000000
200141e6 size 1, subchannel 2 (0x90c0), offset 0x0798, increment
00000000 NVC0_COMPUTE.TEMP_SIZE_HIGH = 0
200141e7 size 1, subchannel 2 (0x90c0), offset 0x079c, increment
00700000 NVC0_COMPUTE.TEMP_SIZE_LOW = 0x700000
200141e8 size 1, subchannel 2 (0x90c0), offset 0x07a0, increment
00012600 NVC0_COMPUTE.WARP_TEMP_ALLOC = 0x12600
200141df size 1, subchannel 2 (0x90c0), offset 0x077c, increment
03000000 NVC0_COMPUTE.LOCAL_BASE = 0x3000000
20014081 size 1, subchannel 2 (0x90c0), offset 0x0204, increment
000000f0 NVC0_COMPUTE.LOCAL_POS_ALLOC = 0xf0
20014082 size 1, subchannel 2 (0x90c0), offset 0x0208, increment
000007c0 NVC0_COMPUTE.LOCAL_NEG_ALLOC = 0x7c0
20014083 size 1, subchannel 2 (0x90c0), offset 0x020c, increment
00001000 NVC0_COMPUTE.WARP_CSTACK_SIZE = 0x1000
20014359 size 1, subchannel 2 (0x90c0), offset 0x0d64, increment
0000000f NVC0_COMPUTE.CALL_LIMIT_LOG = 0xf
200140c2 size 1, subchannel 2 (0x90c0), offset 0x0308, increment
00000003 NVC0_COMPUTE.CACHE_SPLIT = 48K_SHARED_16K_L1
20014085 size 1, subchannel 2 (0x90c0), offset 0x0214, increment
01000000 NVC0_COMPUTE.SHARED_BASE = 0x1000000
20014093 size 1, subchannel 2 (0x90c0), offset 0x024c, increment
00000000 NVC0_COMPUTE.SHARED_SIZE = 0
200140a8 size 1, subchannel 2 (0x90c0), offset 0x02a0, increment
00008000 NVC0_COMPUTE.UNK02A0 = 0x8000
2001408e size 1, subchannel 2 (0x90c0), offset 0x0238, increment
00010001 NVC0_COMPUTE.GRIDDIM_YX = { X = 1 | Y = 1 }
2001408f size 1, subchannel 2 (0x90c0), offset 0x023c, increment
00000001 NVC0_COMPUTE.GRIDDIM_Z = 1
200140eb size 1, subchannel 2 (0x90c0), offset 0x03ac, increment
00010001 NVC0_COMPUTE.BLOCKDIM_YX = { X = 1 | Y = 1 }
200140ec size 1, subchannel 2 (0x90c0), offset 0x03b0, increment
00000001 NVC0_COMPUTE.BLOCKDIM_Z = 1
200140b1 size 1, subchannel 2 (0x90c0), offset 0x02c4, increment
00000000 NVC0_COMPUTE.UNK02C4 = FALSE
...
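
Each pair of lines in this output is a header word followed by its data word. The header fields that dedma prints can be recovered from the header bits, which helps when you have to read words by hand. The sketch below is only my reading of the output above (for example, 0x20014040 gives size 1, subchannel 2, offset 0x0100); the bit positions are inferred from these samples, and decode_header is a hypothetical helper, not a dedma function.

```python
def decode_header(word):
    """Split a 32-bit header word into the fields dedma prints.

    The bit positions are inferred from the dedma output above.
    """
    return {
        "mode": (word >> 29) & 0x7,       # 1 on every "increment" line above
        "size": (word >> 16) & 0x1FFF,    # number of data words that follow
        "subchannel": (word >> 13) & 0x7,
        "offset": (word & 0x1FFF) << 2,   # low 13 bits, times 4
    }

# Header words and (size, subchannel, offset) copied from the output above.
samples = {
    0x20014000: (1, 2, 0x0000),
    0x20014040: (1, 2, 0x0100),
    0x200141D6: (1, 2, 0x0758),
    0x20014359: (1, 2, 0x0D64),
    0x200140C2: (1, 2, 0x0308),
}

for word, expected in samples.items():
    h = decode_header(word)
    assert (h["size"], h["subchannel"], h["offset"]) == expected
    print(f"{word:08x} size {h['size']}, subchannel {h['subchannel']}, "
          f"offset 0x{h['offset']:04x}")
```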

However, dedma fails to parse the stream when the blob takes method data from a different buffer, so you have to decode that part by hand, which is pretty easy: just find the data that follows the 0x20014cef word. In this example, I find 0xaaaa0, which is the value of MP_PM_OP (see https://github.com/pathscale/envytools/blob/master/rnndb/nvc0_compute.xml#L252).
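
The manual lookup can be sketched in a few lines of Python. This assumes you have already collected the relevant 32-bit words from the trace into a list, in order, and that the size field sits in bits 16 to 28 of the header, as the dedma output above suggests. The helper data_after and the sample list are hypothetical; only the two values 0x20014cef and 0xaaaa0 come from my trace.

```python
def data_after(words, header):
    """Return the data words that follow the first occurrence of header.

    The count is read from the header's size field (bits 16..28, an
    assumption based on the dedma output). Returns None if not found.
    """
    size = (header >> 16) & 0x1FFF
    for i, word in enumerate(words):
        if word == header:
            return words[i + 1:i + 1 + size]
    return None

# Hypothetical word list: a NOP header and its data, then the header we want.
words = [0x20014040, 0x00000000, 0x20014CEF, 0x000AAAA0]

found = data_after(words, 0x20014CEF)
print([hex(w) for w in found])
```

Note that this stops at the first match, so a data word that happens to equal the header value would fool it.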

See you! 😉