[slurm-users] Slurmstepd errors

Williams, Jenny Avis jennyw at email.unc.edu
Fri Aug 7 05:28:05 UTC 2020

We ran into a similar error.

A response from SchedMD:

Remediation steps we used until updates got us past this particular issue:
Check the messages log for "xcgroup_instantiate" errors. Of the nodes listed there, we close the compute node hosts that show the error. A reboot clears the condition.
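The check-and-close step can be scripted. This is a minimal sketch, not SchedMD's procedure. It assumes a classic syslog layout in which the fourth whitespace-separated field is the hostname. It also assumes that "closing" a node means setting it to DRAIN with scontrol update. The function name drain_cmds_for_log is hypothetical. The function only prints the commands, so you can review the node list before anything changes.

```shell
# Hypothetical helper (sketch): print one scontrol DRAIN command for every host
# that logged an "xcgroup_instantiate" error.
# Assumptions: $1 is a syslog-format file and field 4 of each line is the
# hostname; "closing" a node is interpreted here as draining it.
drain_cmds_for_log() {
  grep 'xcgroup_instantiate' "$1" \
    | awk '{ print $4 }' \
    | sort -u \
    | while read -r host; do
        # Print instead of executing so the list can be checked first.
        echo "scontrol update NodeName=$host State=DRAIN Reason=\"xcgroup_instantiate errors\""
      done
}
```

Usage: run drain_cmds_for_log /var/log/messages (adjust the path for your site). When the output looks right, pipe it to sh to apply it, then reboot the affected nodes.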

Original message:

Running Slurm 17.02.6 on a Cray system, we have suddenly started receiving these error messages from slurmstepd. Not sure what triggers this.

srun -N 4 -n 4 hostname
slurmstepd: error: task/cgroup: unable to add task[pid=903] to memory cg '(null)'
slurmstepd: error: task/cgroup: unable to add task[pid=50322] to memory cg '(null)'

The jobs seem to run normally, but these errors just started appearing for no obvious reason.

Slurmctld(primary/backup) 1/2 are UP/UP

