[slurm-users] jobs stuck in "CG" state
Durai Arasan
arasan.durai at gmail.com
Fri Aug 20 08:31:40 UTC 2021
Hello!
We have a large number of jobs stuck in the CG (completing) state from a user
who probably wrote code with bad I/O. "scancel" does not make them go away. Is
there a way for admins to get rid of these jobs without draining and rebooting
the nodes? I read somewhere that killing the corresponding slurmstepd process
will do the job. Is this possible? Are there any other solutions? Also, are
there any parameters in slurm.conf one can set to manage such situations better?
Best,
Durai
MPI Tübingen