[slurm-users] slurmstepd: error: load_ebpf_prog: BPF load error (No space left on device). Please check your system limits (MEMLOCK).

Tim Schneider tim.schneider1 at tu-darmstadt.de
Thu Jan 4 18:01:14 UTC 2024


Hi,

I am using SLURM 22.05.9 on a small compute cluster. Since reinstalling 
two of our nodes, I get the following error when launching a job on them:

    slurmstepd: error: load_ebpf_prog: BPF load error (No space left on device). Please check your system limits (MEMLOCK).
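The message points at the MEMLOCK limit. As a minimal sketch (Python, Linux only, not part of Slurm), the locked-memory limit of a process can be printed with the standard `resource` module. Assumption: this reports the limit of whatever process runs the script, so to see what slurmstepd inherits it would need to run in the same context as slurmd (for example under its systemd unit, where `LimitMEMLOCK=` applies), not just from a login shell. Also note that since Linux 5.11, BPF memory is charged to memory cgroups rather than to RLIMIT_MEMLOCK, so on these 5.15 kernels an adequate limit would not by itself rule out other causes.

```python
import resource


def fmt_limit(value: int) -> str:
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


def memlock_limits() -> tuple[int, int]:
    """Return the (soft, hard) RLIMIT_MEMLOCK of the current process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"MEMLOCK soft limit: {fmt_limit(soft)}")
    print(f"MEMLOCK hard limit: {fmt_limit(hard)}")
```

Running this on an updated node and on a working node would show whether the limits differ at all.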

Also, cgroups no longer seem to work properly on these nodes: I can 
see all GPUs even when I do not request any, which is not the case on 
the other nodes.
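As far as I understand, cgroup v2 has no `devices` controller; device access is restricted by attaching an eBPF program to the cgroup, and Slurm's cgroup/v2 plugin uses this for `ConstrainDevices`. A failed eBPF load could therefore plausibly explain why all GPUs are visible. A minimal sketch, assuming the cgroup filesystem is mounted at the usual `/sys/fs/cgroup`, to check which cgroup layout a node uses so the updated and original nodes can be compared:

```python
from pathlib import Path

# Assumption: the cgroup filesystem is mounted at the conventional location.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def cgroup_mode(root: Path = CGROUP_ROOT) -> str:
    """Guess the cgroup layout from the files present under the mount point."""
    # A pure cgroup v2 (unified) mount exposes cgroup.controllers at its root.
    if (root / "cgroup.controllers").is_file():
        return "v2 (unified)"
    # systemd's hybrid mode mounts the v2 hierarchy under 'unified'.
    if (root / "unified" / "cgroup.controllers").is_file():
        return "hybrid (v1 + v2)"
    return "v1 (legacy)"


def v2_controllers(root: Path = CGROUP_ROOT) -> list[str]:
    """List the controllers available at the root of a cgroup v2 hierarchy."""
    path = root / "cgroup.controllers"
    return path.read_text().split() if path.is_file() else []


if __name__ == "__main__":
    print("cgroup layout:", cgroup_mode())
    print("v2 controllers:", " ".join(v2_controllers()) or "(none)")
```

If both kinds of nodes report the same layout, the difference is more likely in the kernel's handling of the eBPF program than in the cgroup setup itself.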

One difference I found between the updated nodes and the original nodes 
(both run Ubuntu 22.04) is the kernel version: 
"5.15.0-89-generic #99-Ubuntu SMP" on the working nodes versus 
"5.15.0-91-generic #101-Ubuntu SMP" on the updated nodes. I could not 
figure out how to install the older kernel version of the working nodes 
on the updated ones, but I noticed that the error message disappears 
when I install kernel 5.15.0 with this tool: 
https://github.com/pimlie/ubuntu-mainline-kernel.sh. However, the 
network driver then no longer works properly, so this does not seem to 
be a good solution.

Has anyone seen this issue before, or is there something else I should 
look at? I would also be happy with just a workaround that lets me bring 
these nodes back online.

I appreciate any help!

Thanks a lot in advance and best wishes,

Tim