[slurm-users] GPU devices mapping with job's cgroup in cgroups v2 using eBPF

Charles Hedrick hedrick at rutgers.edu
Tue Jan 23 20:43:05 UTC 2024


To see the specific GPU allocated, I think this will do it:

scontrol show job -d | grep -E "JobId=| GRES"
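If the mapping is needed programmatically, the same filter can be applied in a script. The following is only a sketch: the sample text is hypothetical, and it assumes the detailed output carries the allocation as `GRES=gpu:1(IDX:3)` on the per-node line, which may differ between Slurm versions.

```python
import re

# Hypothetical excerpt of `scontrol show job -d` output; the exact field
# layout is an assumption and may vary between Slurm versions.
SAMPLE_OUTPUT = """\
JobId=1956132 JobName=train
   JOB_GRES=gpu:1
     Nodes=gpu01 CPU_IDs=0-7 Mem=32768 GRES=gpu:1(IDX:3)
JobId=1956140 JobName=cpu_only
     Nodes=cpu07 CPU_IDs=0-3 Mem=8192
"""

_JOB = re.compile(r"JobId=(\d+)")
# Require whitespace before GRES, mirroring the " GRES" grep pattern,
# so that JOB_GRES= is not matched.
_GRES = re.compile(r"\sGRES=(\S+)")


def gres_by_job(text):
    """Return {job_id: [per-node GRES strings]} from scontrol-style text."""
    result = {}
    job = None
    for line in text.splitlines():
        m = _JOB.search(line)
        if m:
            job = m.group(1)
            result.setdefault(job, [])
        g = _GRES.search(line)
        if g and job is not None:
            result[job].append(g.group(1))
    return result


if __name__ == "__main__":
    for job_id, gres in gres_by_job(SAMPLE_OUTPUT).items():
        print(job_id, gres)
```

On a real node the text would come from running the `scontrol` command above, for example via `subprocess.run(..., capture_output=True, text=True)`.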

________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Mahendra Paipuri <mahendra.paipuri at gmail.com>
Sent: Sunday, January 7, 2024 3:33 PM
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: [slurm-users] GPU devices mapping with job's cgroup in cgroups v2 using eBPF

Hello all,

Happy new year!

We have recently upgraded our SLURM cluster to cgroups v2. In cgroups v1, the `devices.list` interface file showed which devices were attached to a particular cgroup. From my understanding, cgroups v2 uses eBPF to manage devices, and so does SLURM to manage the GPUs.

I was looking for a way to programmatically determine the mapping from job cgroups to devices, and I came across this thread (https://bugzilla.redhat.com/show_bug.cgi?id=1717396), which has a similar discussion in the context of VMs.

So, I have used `bpftool` to inspect the job cgroups. An example output:
```
# /tmp/bpftool cgroup list /sys/fs/cgroup/system.slice/slurmstepd.scope/job_1956132
ID       AttachType      AttachFlags     Name
```
When I add the `effective` flag, I see the attached eBPF program:

```
# /tmp/bpftool cgroup list /sys/fs/cgroup/system.slice/slurmstepd.scope/job_1956132 effective
ID       AttachType      Name
4197     cgroup_device   Slurm_Cgroup_v2
```
From my understanding, the `effective` flag shows inherited eBPF programs as well. So, my question is: at which level of the cgroup hierarchy is the eBPF program attached? I tried to inspect various levels, but all of them returned none.

Then, looking into the translated bytecode of the eBPF program, I get the following:
```
# /tmp/bpftool prog dump xlated id 4197
  0: (61) r2 = *(u32 *)(r1 +0)
  1: (54) w2 &= 65535
  2: (61) r3 = *(u32 *)(r1 +0)
  3: (74) w3 >>= 16
  4: (61) r4 = *(u32 *)(r1 +4)
  5: (61) r5 = *(u32 *)(r1 +8)
  6: (55) if r2 != 0x2 goto pc+4
  7: (55) if r4 != 0xc3 goto pc+3
  8: (55) if r5 != 0x0 goto pc+2
  9: (b7) r0 = 0
 10: (95) exit
 11: (55) if r2 != 0x2 goto pc+4
 12: (55) if r4 != 0xc3 goto pc+3
 13: (55) if r5 != 0x1 goto pc+2
 14: (b7) r0 = 0
 15: (95) exit
 16: (55) if r2 != 0x2 goto pc+4
 17: (55) if r4 != 0xc3 goto pc+3
 18: (55) if r5 != 0x2 goto pc+2
 19: (b7) r0 = 0
 20: (95) exit
 21: (55) if r2 != 0x2 goto pc+4
 22: (55) if r4 != 0xc3 goto pc+3
 23: (55) if r5 != 0x3 goto pc+2
 24: (b7) r0 = 1
 25: (95) exit
 26: (b7) r0 = 1
 27: (95) exit
```
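From the output, it is clear that GPU:3 (among 0,1,2,3) is the one that is attached to that job's cgroup: the checks for major 0xc3 (195) with minors 0, 1 and 2 return 0 (deny), while minor 3 returns 1 (allow).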
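Until a more direct interface turns up, the rule blocks in the dump can be extracted mechanically. This is only a sketch that assumes the program keeps the layout shown above (three `!=` checks on r2, r4 and r5, followed by `r0 = <verdict>` and `exit`); `parse_rules` and `DeviceRule` are names made up for this example, and the device type values 1 and 2 are the kernel's `BPF_DEVCG_DEV_BLOCK` and `BPF_DEVCG_DEV_CHAR`.

```python
import re
from typing import NamedTuple

# Excerpt of the `bpftool prog dump xlated` output shown above.
SAMPLE_DUMP = """\
 16: (55) if r2 != 0x2 goto pc+4
 17: (55) if r4 != 0xc3 goto pc+3
 18: (55) if r5 != 0x2 goto pc+2
 19: (b7) r0 = 0
 20: (95) exit
 21: (55) if r2 != 0x2 goto pc+4
 22: (55) if r4 != 0xc3 goto pc+3
 23: (55) if r5 != 0x3 goto pc+2
 24: (b7) r0 = 1
 25: (95) exit
 26: (b7) r0 = 1
 27: (95) exit
"""


class DeviceRule(NamedTuple):
    dev_type: str
    major: int
    minor: int
    allowed: bool


DEV_TYPES = {1: "block", 2: "char"}  # BPF_DEVCG_DEV_BLOCK, BPF_DEVCG_DEV_CHAR

_INSN = re.compile(r"^\s*\d+:\s*\([0-9a-f]{2}\)\s*(.*)$")
_COND = re.compile(r"^if (r\d) != (0x[0-9a-f]+) goto")
_RET = re.compile(r"^r0 = (\d+)$")


def parse_rules(dump):
    """Collect (type, major, minor, verdict) rules from an xlated dump.

    Assumes r2 holds the device type, r4 the major and r5 the minor,
    as in the dump above. A trailing `r0 = N` that is not preceded by
    checks is the default verdict and is skipped.
    """
    rules = []
    pending = {}
    for line in dump.splitlines():
        m = _INSN.match(line)
        if not m:
            continue
        insn = m.group(1).strip()
        cond = _COND.match(insn)
        if cond:
            pending[cond.group(1)] = int(cond.group(2), 16)
            continue
        ret = _RET.match(insn)
        if ret:
            if {"r2", "r4", "r5"} <= pending.keys():
                rules.append(DeviceRule(
                    DEV_TYPES.get(pending["r2"], "unknown"),
                    pending["r4"],
                    pending["r5"],
                    ret.group(1) != "0",
                ))
            pending = {}
    return rules


if __name__ == "__main__":
    for rule in parse_rules(SAMPLE_DUMP):
        print(rule)
```

Rules that omit one of the three checks (for example a wildcard minor) would be silently skipped by this sketch, so it should be treated as a starting point rather than a general decoder.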

However, I was looking for a way to dump eBPF maps that can directly provide the major and minor numbers and the permissions of the devices, as discussed in this comment (https://bugzilla.redhat.com/show_bug.cgi?id=1717396#c5). When I inspect the eBPF program, I don't see any maps associated with it.

```
# /tmp/bpftool prog list id 4197
4197: cgroup_device  name Slurm_Cgroup_v2  tag 1a261c8a913ff67c  gpl
       loaded_at 2024-01-02T08:19:56+0100  uid 0
       xlated 224B  jited 142B  memlock 4096B
```
So, my second question is: how can I get information similar to `map dump` that gives the devices' major and minor numbers directly, instead of parsing the bytecode from `prog dump`?
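Whichever way the major and minor numbers are obtained, they still need to be matched to device nodes. A minimal sketch of that lookup, assuming the GPUs are exposed as character devices matching `/dev/nvidia[0-9]*` (the glob pattern and the helper name `char_devices` are assumptions for this example):

```python
import glob
import os
import stat


def char_devices(pattern="/dev/nvidia[0-9]*"):
    """Return {(major, minor): path} for character devices matching pattern."""
    found = {}
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        if stat.S_ISCHR(st.st_mode):
            # st_rdev encodes the device number of the node itself.
            found[(os.major(st.st_rdev), os.minor(st.st_rdev))] = path
    return found


if __name__ == "__main__":
    for (major, minor), path in char_devices().items():
        print(f"{path}: major={major} minor={minor}")
```

A rule with major 195 and minor 3 could then be looked up as `char_devices().get((195, 3))`.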

I am still discovering the eBPF ecosystem so if I am missing something very obvious, please let me know. I would really appreciate that.

Cheers!

Regards
Mahendra
