<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof"><span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">To see the specific GPU allocated, I think this will do it:</span></div>
<div class="elementToProof"><span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"><br>
</span></div>
<div class="elementToProof"><span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">scontrol show job -d | grep -E "JobId=| GRES"</span></div>
<div class="elementToProof"><span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"><br>
</span></div>
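<div class="elementToProof"><span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">If the result needs to be consumed by a script, the filtered output could be parsed along these lines. This is only a sketch: the sample text, the node name `gpu01` and the `GRES=gpu:1(IDX:3)` layout are assumptions about what `scontrol show job -d` prints, and the exact fields vary between Slurm versions.</span></div>
<pre style="font-family: monospace;">
```python
import re

# Hypothetical sample of the grep-filtered output; the field layout is assumed.
SAMPLE = """\
JobId=1956132 JobName=train
     Nodes=gpu01 CPU_IDs=0-7 Mem=0 GRES=gpu:1(IDX:3)
"""

JOB_RE = re.compile(r"JobId=(\d+)")
GRES_RE = re.compile(r"Nodes=(\S+).*GRES=\S*\(IDX:([0-9,-]+)\)")


def expand_indices(spec):
    """Expand an index spec such as '0-1,3' into [0, 1, 3]."""
    indices = []
    for part in spec.split(","):
        first, _, last = part.partition("-")
        indices.extend(range(int(first), int(last or first) + 1))
    return indices


def gpu_indices_per_job(text):
    """Map each job id to {node: [GPU indices]}."""
    jobs, current = {}, None
    for line in text.splitlines():
        job = JOB_RE.search(line)
        if job:
            current = jobs.setdefault(job.group(1), {})
            continue
        gres = GRES_RE.search(line)
        if gres and current is not None:
            current[gres.group(1)] = expand_indices(gres.group(2))
    return jobs


print(gpu_indices_per_job(SAMPLE))
```
</pre>
<div class="elementToProof"><span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">The index reported by Slurm is its own GRES index, so it should be checked against the device minor numbers on the node rather than assumed to be identical.</span></div>
<div class="elementToProof"><span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"><br>
</span></div>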
<hr style="display: inline-block; width: 98%;">
<div id="divRplyFwdMsg" dir="ltr"><span style="font-family: Calibri, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);"><b>From:</b> slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt; on behalf of Mahendra Paipuri &lt;mahendra.paipuri@gmail.com&gt;<br>
<b>Sent:</b> Sunday, January 7, 2024 3:33 PM<br>
<b>To:</b> slurm-users@lists.schedmd.com &lt;slurm-users@lists.schedmd.com&gt;<br>
<b>Subject:</b> [slurm-users] GPU devices mapping with job's cgroup in cgroups v2 using eBPF</span>
<div> </div>
</div>
<div style="direction: ltr;">Hello all,</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">Happy new year!</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">We have recently upgraded the cgroups on our SLURM cluster to v2. In cgroups v1, the interface `/devices.list` showed which devices were attached to a particular cgroup. From my understanding, cgroups
v2 uses eBPF to manage devices, and SLURM uses the same mechanism to manage the GPUs.</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">I was looking for a way to programmatically determine the mapping between job cgroups and devices, and I came across this thread (<a href="https://bugzilla.redhat.com/show_bug.cgi?id=1717396" id="OWAfb8995a0-1ac3-0299-330b-cc4cd1e17056" class="OWAAutoLink" data-auth="NotApplicable" data-loopstyle="linkonly">https://bugzilla.redhat.com/show_bug.cgi?id=1717396</a>),
which has a similar discussion in the context of VMs.</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">So, I have used `bpftool` to inspect the job cgroups. An example output:</div>
<div style="direction: ltr;">```</div>
<div style="direction: ltr;"><span style="font-family: monospace; color: rgb(0, 0, 0);"># /tmp/bpftool cgroup list /sys/fs/cgroup/system.slice/slurmstepd.scope/job_1956132</span><span style="font-family: monospace;"><br>
ID AttachType AttachFlags Name</span></div>
<div style="direction: ltr;">```</div>
<div style="direction: ltr;">When I add the `effective` flag, I see the attached eBPF program:</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">```</div>
<div style="direction: ltr;"><span style="font-family: monospace; color: rgb(0, 0, 0);"># /tmp/bpftool cgroup list /sys/fs/cgroup/system.slice/slurmstepd.scope/job_1956132 effective</span><span style="font-family: monospace;"><br>
ID AttachType Name <br>
4197 cgroup_device Slurm_Cgroup_v2</span></div>
<div style="direction: ltr;">```</div>
<div style="direction: ltr;">From my understanding, the `effective` flag shows the inherited eBPF programs as well. So, my first question is: at which level of the cgroup hierarchy is the eBPF program attached? I tried to inspect various levels, but none of them listed an attached program.</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">Then, looking into the translated bytecode of the eBPF program, I get the following:</div>
<div style="direction: ltr;">```</div>
<div style="direction: ltr;"><span style="font-family: monospace; color: rgb(0, 0, 0);"># /tmp/bpftool prog dump xlated id 4197</span><span style="font-family: monospace;"><br>
0: (61) r2 = *(u32 *)(r1 +0)<br>
1: (54) w2 &= 65535<br>
2: (61) r3 = *(u32 *)(r1 +0)<br>
3: (74) w3 >>= 16<br>
4: (61) r4 = *(u32 *)(r1 +4)<br>
5: (61) r5 = *(u32 *)(r1 +8)<br>
6: (55) if r2 != 0x2 goto pc+4<br>
7: (55) if r4 != 0xc3 goto pc+3<br>
8: (55) if r5 != 0x0 goto pc+2<br>
9: (b7) r0 = 0<br>
10: (95) exit<br>
11: (55) if r2 != 0x2 goto pc+4<br>
12: (55) if r4 != 0xc3 goto pc+3<br>
13: (55) if r5 != 0x1 goto pc+2<br>
14: (b7) r0 = 0<br>
15: (95) exit<br>
16: (55) if r2 != 0x2 goto pc+4<br>
17: (55) if r4 != 0xc3 goto pc+3<br>
18: (55) if r5 != 0x2 goto pc+2<br>
19: (b7) r0 = 0<br>
20: (95) exit<br>
21: (55) if r2 != 0x2 goto pc+4<br>
22: (55) if r4 != 0xc3 goto pc+3<br>
23: (55) if r5 != 0x3 goto pc+2<br>
24: (b7) r0 = 1<br>
25: (95) exit<br>
26: (b7) r0 = 1<br>
27: (95) exit</span></div>
<div style="direction: ltr;">```</div>
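<div style="direction: ltr;">From the output, it is clear that GPU 3 (among 0, 1, 2, 3) is the one attached to that job's cgroup: for character devices (type 0x2) with major 0xc3 (195), the program returns 0 (deny) for minors 0, 1 and 2, and returns 1 (allow) for minor 3.</div>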
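<div style="direction: ltr;">For reference, the manual reading of the dump above could be automated roughly as follows. This is only a sketch that assumes the exact instruction layout shown in this dump (three `!=` checks on r2, r4 and r5 followed by `r0 = ...`); the function and variable names are made up for illustration, and a different Slurm version may emit a different layout.</div>
<pre style="font-family: monospace;">
```python
import re

# Register roles as read from the first instructions of the dump:
# r2 = device type, r4 = major, r5 = minor (r3 holds the access bits, unused here).
REGISTER_ROLE = {"r2": "type", "r4": "major", "r5": "minor"}
DEVICE_TYPE = {1: "block", 2: "char"}  # BPF_DEVCG_DEV_BLOCK, BPF_DEVCG_DEV_CHAR

COND_RE = re.compile(r"if (r\d) != (0x[0-9a-f]+) goto")
RET_RE = re.compile(r"r0 = (\d+)")


def parse_device_rules(dump_text):
    """Return (rules, default_allow) from `bpftool prog dump xlated` text."""
    rules, pending, default_allow = [], {}, None
    for line in dump_text.splitlines():
        cond = COND_RE.search(line)
        if cond:
            role = REGISTER_ROLE.get(cond.group(1))
            if role:
                pending[role] = int(cond.group(2), 16)
            continue
        ret = RET_RE.search(line)
        if ret:
            allow = ret.group(1) == "1"
            if pending:
                rules.append({**pending, "allow": allow})
                pending = {}
            else:
                default_allow = allow  # verdict when no rule matched
    return rules, default_allow


# Last twelve instructions of the dump quoted in this message.
SAMPLE = """\
  16: (55) if r2 != 0x2 goto pc+4
  17: (55) if r4 != 0xc3 goto pc+3
  18: (55) if r5 != 0x2 goto pc+2
  19: (b7) r0 = 0
  20: (95) exit
  21: (55) if r2 != 0x2 goto pc+4
  22: (55) if r4 != 0xc3 goto pc+3
  23: (55) if r5 != 0x3 goto pc+2
  24: (b7) r0 = 1
  25: (95) exit
  26: (b7) r0 = 1
  27: (95) exit
"""

rules, default_allow = parse_device_rules(SAMPLE)
for rule in rules:
    kind = DEVICE_TYPE.get(rule.get("type"), "unknown")
    print(f"{kind} {rule['major']}:{rule['minor']} allow={rule['allow']}")
# char 195:2 allow=False
# char 195:3 allow=True
```
</pre>
<div style="direction: ltr;">The text can be obtained by capturing the output of the `bpftool prog dump xlated id 4197` command shown above; the parser itself only works on strings, so it can be tested without root access.</div>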
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">However, I was looking for a way to dump eBPF maps that directly provide the major and minor numbers and the permissions of each device, as discussed in this comment (<a href="https://bugzilla.redhat.com/show_bug.cgi?id=1717396#c5" id="OWA10e1c608-8c54-e75c-bc0b-0829d030b639" class="OWAAutoLink" data-auth="NotApplicable" data-loopstyle="linkonly">https://bugzilla.redhat.com/show_bug.cgi?id=1717396#c5</a>).
When I inspect the eBPF program, I don't see any maps associated with it.</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">```</div>
<div style="direction: ltr;"><span style="font-family: monospace; color: rgb(0, 0, 0);"># /tmp/bpftool prog list id 4197</span><span style="font-family: monospace;"><br>
4197: cgroup_device name Slurm_Cgroup_v2 tag 1a261c8a913ff67c gpl<br>
loaded_at 2024-01-02T08:19:56+0100 uid 0<br>
xlated 224B jited 142B memlock 4096B</span></div>
<div style="direction: ltr;">```</div>
<div style="direction: ltr;">So, my second question is: how can I get information similar to `map dump`, which would give the device's major and minor numbers directly, instead of parsing the bytecode from `prog dump`?</div>
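<div style="direction: ltr;">Whichever way the major and minor numbers are obtained, they still have to be matched to device nodes. A minimal sketch of that lookup, using only the Python standard library, is below. It does not replace a `map dump`; the function names are hypothetical, and it assumes the device nodes sit directly under `/dev`.</div>
<pre style="font-family: monospace;">
```python
import os
import stat


def device_numbers(path):
    """Return (kind, major, minor) for a device node, or None for other files."""
    info = os.stat(path)
    if stat.S_ISCHR(info.st_mode):
        kind = "char"
    elif stat.S_ISBLK(info.st_mode):
        kind = "block"
    else:
        return None
    return kind, os.major(info.st_rdev), os.minor(info.st_rdev)


def find_device_nodes(major, minor, dev_dir="/dev"):
    """List entries directly under dev_dir whose device numbers match."""
    matches = []
    for name in sorted(os.listdir(dev_dir)):
        path = os.path.join(dev_dir, name)
        try:
            numbers = device_numbers(path)
        except OSError:
            continue  # broken symlink or unreadable entry
        if numbers and numbers[1:] == (major, minor):
            matches.append(path)
    return matches


# Major 0xc3 (195) and minor 3 are the values read from the dump above.
print(find_device_nodes(195, 3))
```
</pre>
<div style="direction: ltr;">On a node where the program allows 195:3, this would be expected to list the corresponding GPU device file, and an empty list on a machine without such a device.</div>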
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">I am still discovering the eBPF ecosystem so if I am missing something very obvious, please let me know. I would really appreciate that.</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">Cheers!</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">Regards</div>
<div style="direction: ltr;"><span style="color: rgb(136, 136, 136);">Mahendra</span></div>
<div style="direction: ltr;"><br>
</div>
</body>
</html>