Hi all,
I'm having trouble getting Slurm 24.11.6 to work with MIG, and the slurmd logs seem to point to an issue with eBPF. For context, the node in question is an unprivileged LXD container where I'm trying to get MIG working with Slurm. Other compute nodes without MIG work fine and isolate their GPUs correctly.
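Since the container is unprivileged, my first suspicion was missing capabilities rather than limits (loading and attaching a cgroup-device BPF program needs CAP_BPF/CAP_SYS_ADMIN in the initial namespace, as far as I understand). This is what I've been using to check what the container actually has (assuming capsh from libcap is installed):

# Raw effective capability mask of the container's init process
grep CapEff /proc/1/status
# Decode the mask into capability names
capsh --decode=$(awk '/CapEff/ {print $2}' /proc/1/status)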
What I'm seeing in the slurmd logs:

[2025-11-24T23:32:50.197] [331.interactive] cgroup/v2: cgroup_p_constrain_apply: CGROUP: EBPF Closing and loading bpf program into /sys/fs/cgroup/system.slice/slurmstepd.scope/job_331
[2025-11-24T23:32:50.197] [331.interactive] error: load_ebpf_prog: BPF load error (Operation not permitted). Please check your system limits (MEMLOCK).
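In case it's related, I also checked whether unprivileged BPF is disabled kernel-wide (the value seen inside the container reflects the host's kernel, and I'm not certain this sysctl is the limiting factor for a container's root user, but it seemed worth ruling out):

# 0 = unprivileged bpf() allowed, 1/2 = restricted to privileged callers
sysctl kernel.unprivileged_bpf_disabled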
I've tried raising the MEMLOCK limit by setting DefaultLimitMEMLOCK=infinity in /etc/systemd/system.conf, and my slurmd.service file (copied below) now sets both Delegate=yes and LimitMEMLOCK=infinity. Previously Delegate=yes was the only one of those not set (I found that setting while going through Slurm's cgroup v2 documentation), but in both cases I see the same BPF load error.
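To rule out the limits themselves, this is how I've been confirming that systemd actually applied the overrides (note that edits to /etc/systemd/system.conf need a daemon-reexec, not just a daemon-reload, if I've understood the systemd docs correctly):

# Re-read the manager configuration and the unit, then restart slurmd
systemctl daemon-reexec
systemctl daemon-reload
systemctl restart slurmd
# Both properties should reflect the values in the unit file
systemctl show slurmd -p Delegate -p LimitMEMLOCK
# And the limit the live process actually runs with
grep "locked memory" /proc/$(pidof slurmd)/limits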
Just wondering whether anyone else has come across this before, or whether I'm doing something silly here. I've checked that my slurm.conf has the corresponding parameters set according to Slurm's own documentation for cgroup.conf; my cgroup.conf is also copied below.
A portion of gres.conf is also copied below. Even though I tried AutoDetect=nvml for this node, it still doesn't work, which is why I switched to setting it manually based on the output of slurmd -G.
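For reference, these are the commands I based the manual entry on:

# What NVML reports, including MIG instance UUIDs
nvidia-smi -L
# What slurmd itself detects for gres.conf purposes
slurmd -G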
Maybe I should try switching back to cgroup v1 to see if that fixes things, but I'm not sure at this point whether MIG and Slurm are even compatible under cgroup v1.
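If I do go down that road, I assume the change would just be pinning the plugin in cgroup.conf, something like the sketch below (untested, and the host would also need to boot with systemd.unified_cgroup_hierarchy=0, which I haven't verified is even possible to pass through to an LXD guest):

###### cgroup.conf (hypothetical v1 fallback, untested)
CgroupPlugin=cgroup/v1
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes

My understanding is that under v1, ConstrainDevices goes through the devices cgroup controller instead of an eBPF program, which is why I suspect it might sidestep this particular error.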
I can send other parts of logs, configuration files etc. Any help would be greatly appreciated!
###### slurmd.service file
[Unit]
Description=Slurm node daemon
After=network.target munge.service
ConditionPathExists=/etc/slurm/slurm.conf

[Service]
Type=forking
EnvironmentFile=-/etc/sysconfig/slurmd
ExecStart=/usr/sbin/slurmd -d /usr/sbin/slurmstepd $SLURMD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/slurmd.pid
KillMode=process
LimitNOFILE=51200
Delegate=yes
LimitMEMLOCK=infinity
LimitSTACK=infinity

[Install]
WantedBy=multi-user.target
###### cgroup.conf
CgroupPlugin=autodetect
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
###### gres.conf
NodeName=gpu-3 AutoDetect=nvml Name=gpu
NodeName=gpu-4 Name=gpu MultipleFiles=/dev/nvidia0,/dev/nvidia-caps/nvidia-cap30,/dev/nvidia-caps/nvidia-cap31
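For completeness, the nvidia-cap paths in the gpu-4 line came from the MIG capability files under procfs (the gpu/gi/ci numbering below is from my layout and will vary):

# Each GPU instance / compute instance has an access file listing the
# minor number of its /dev/nvidia-caps device node
cat /proc/driver/nvidia/capabilities/gpu0/mig/gi1/access
cat /proc/driver/nvidia/capabilities/gpu0/mig/gi1/ci0/access
# Cross-check against the device nodes themselves
ls -l /dev/nvidia-caps/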