[slurm-users] xcpuinfo_abs_to_mac: failed // cgroups v1 problem

Florian Zillner fzillner at lenovo.com
Thu Feb 9 09:09:18 UTC 2023


Hi,

I'm experiencing a strange issue related to a CPU swap (8352Y -> 6326) on two of our nodes. I adapted the slurm.conf to accommodate the new CPU:
slurm.conf: NodeName=ice27[57-58] CPUs=64 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=257550 MemSpecLimit=12000
which is also what slurmd -C autodetects: NodeName=ice2758 CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=257578
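
(In case it helps with reproducing: the topology slurmctld has registered can be compared against the local detection with something along these lines; field names as in scontrol's node output.)

# scontrol show node ice2758 | grep -E 'CPUTot|CoresPerSocket|ThreadsPerCore|RealMemory'
# slurmd -C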

Slurm 22.05.7 (compiled from source)
Kernel: 4.18.0-372.32.1.el8_6.x86_64
OS: Rocky Linux release 8.6 (Green Obsidian)

All nodes boot the same OS image (PXE) and therefore have the same SW.

When I try to run a simple single-node job (exclusive) on ice2758, the job immediately fails and the node is drained with "batch job complete failure". From the node's slurmd.log:

# grep 224313 slurmd.ice2758.log | grep -v debug
[2023-02-08T18:09:35.026] Launching batch job 224313 for UID 1234502026
[2023-02-08T18:09:35.037] [224313.batch] task/affinity: init: task affinity plugin loaded with CPU mask 0xffffffffffffffff
[2023-02-08T18:09:35.037] [224313.batch] cred/munge: init: Munge credential signature plugin loaded
[2023-02-08T18:09:35.049] [224313.batch] error: xcpuinfo_abs_to_mac: failed
[2023-02-08T18:09:35.049] [224313.batch] error: unable to build job physical cores
[2023-02-08T18:09:35.050] [224313.batch] task/cgroup: _memcg_initialize: job: alloc=245571MB mem.limit=245571MB memsw.limit=unlimited
[2023-02-08T18:09:35.050] [224313.batch] task/cgroup: _memcg_initialize: step: alloc=245571MB mem.limit=245571MB memsw.limit=unlimited
[2023-02-08T18:09:35.061] [224313.batch] starting 1 tasks
[2023-02-08T18:09:35.061] [224313.batch] task 0 (20552) started 2023-02-08T18:09:35
[2023-02-08T18:09:35.062] [224313.batch] error: common_file_write_uint32s: write pid 20552 to /sys/fs/cgroup/cpuset/slurm/uid_1234502026/job_224313/step_batch/cgroup.procs failed: No space left on device
[2023-02-08T18:09:35.062] [224313.batch] error: unable to add pids to '/sys/fs/cgroup/cpuset/slurm/uid_1234502026/job_224313/step_batch'
[2023-02-08T18:09:35.062] [224313.batch] error: task_g_pre_set_affinity: No space left on device
[2023-02-08T18:09:35.062] [224313.batch] error: _exec_wait_child_wait_for_parent: failed: Resource temporarily unavailable
[2023-02-08T18:09:36.065] [224313.batch] error: job_manager: exiting abnormally: Slurmd could not execve job
[2023-02-08T18:09:36.065] [224313.batch] job 224313 completed with slurm_rc = 4020, job_rc = 0
[2023-02-08T18:09:36.068] [224313.batch] done with job

There is plenty of space available (which is really memory, since the nodes PXE-boot into a RAM-backed image), and lscgroup and cat /proc/cgroups show far fewer than 1000 cgroups.
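
(If I understand the cgroup v1 cpuset semantics correctly, ENOSPC on a cgroup.procs write can also mean the target cpuset's cpuset.cpus or cpuset.mems is empty rather than an actual space shortage, so inspecting the slurm cpuset hierarchy might be more telling than free space; paths below are taken from the error messages above.)

# cat /sys/fs/cgroup/cpuset/slurm/cpuset.cpus /sys/fs/cgroup/cpuset/slurm/cpuset.mems
# cat /sys/fs/cgroup/cpuset/slurm/uid_1234502026/job_224313/step_batch/cpuset.cpus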

I then compared this with what other nodes report for cgroups during job launch:
# grep -i "job abstract" slurmd*log | grep 2023-02-08
slurmd.banner2401.log:[2023-02-08T18:20:34.102] [224315.batch] debug:  task/cgroup: task_cgroup_cpuset_create: job abstract cores are '0-31'
   slurmd.ice2701.log:[2023-02-08T18:27:05.391] [224319.batch] debug:  task/cgroup: task_cgroup_cpuset_create: job abstract cores are '0-71'
   slurmd.ice2758.log:[2023-02-08T18:09:35.049] [224313.batch] debug:  task/cgroup: task_cgroup_cpuset_create: job abstract cores are '0-63'

# psh banner2401,ice2701,ice2758 slurmd -C | grep -vi uptime
banner2401: NodeName=banner2401 CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=193090
   ice2701: NodeName=ice2701   CPUs=144 Boards=1 SocketsPerBoard=2 CoresPerSocket=36 ThreadsPerCore=2 RealMemory=257552
   ice2758: NodeName=ice2758    CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=257578
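
(Since the abstract-to-machine core mapping depends on the hardware topology the node reports, comparing the raw CPU numbering on a swapped node against an unswapped one might also be relevant; lscpu's extended output shows the logical-CPU-to-core mapping directly.)

# lscpu -e=CPU,CORE,SOCKET | head
# lscpu | grep -E '^(CPU\(s\)|Thread|Core|Socket)'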

To me, it looks like slurmd -C is correctly detecting the CPUs, but when it comes to cgroups, the plugin addresses all 64 logical CPUs, including the hyper-threaded ones, whereas on the other two nodes shown, the cgroup plugin only addresses the physical cores, i.e. half of the logical CPUs. A reboot does not fix the problem. We're happy with how Slurm works on all the other nodes; only the two that had their CPUs swapped behave differently. What am I missing here?

Cheers,
Florian
