[slurm-users] Users can't scancel
William Markuske
wmarkuske at sdsc.edu
Wed Nov 18 17:00:58 UTC 2020
Hello,
I am having an odd problem where users are unable to kill their jobs
with scancel. Users can submit jobs just fine, and when a task finishes
on its own the job closes out correctly. However, if a user attempts to
cancel a job via scancel, the SIGKILL signals are sent to the step but
the processes do not exit. Slurmd then keeps sending SIGKILL until the
UnkillableStepTimeout is hit, at which point the Slurm job exits with an
error, the node enters a draining state, and the spawned processes
continue to run on the node.
I'm at a loss because jobs can complete without issue, which seems to
suggest that it's not a networking or permissions issue preventing Slurm
from doing its job accounting tasks. A user can ssh to the node once a
job is submitted and kill the subprocesses manually, at which point
Slurm completes the epilog and the node returns to idle.
Does anyone know what may be causing such behavior? Please let me know
any slurm.conf or cgroup.conf settings that would be helpful to diagnose
this issue. I'm quite stumped by this one.
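To make it easier to share the relevant settings, here is a minimal
sketch of a helper that pulls a few kill-related parameters out of
"Key = Value" text such as the output of `scontrol show config`. The
function name `extract_settings` and the list of keys are my own
assumptions, chosen for illustration; the list is not exhaustive and is
not a statement about what is actually causing the problem.

```python
import re

# Parameters that plausibly matter when job steps cannot be killed.
# This selection is an assumption for illustration, not a complete list.
KEYS_OF_INTEREST = (
    "ProctrackType",
    "TaskPlugin",
    "KillWait",
    "UnkillableStepTimeout",
    "UnkillableStepProgram",
)


def extract_settings(config_text, keys=KEYS_OF_INTEREST):
    """Return {key: value} for the selected keys found in 'Key = Value' lines."""
    # Map lowercase names to their canonical spelling for case-insensitive lookup.
    wanted = {k.lower(): k for k in keys}
    found = {}
    for line in config_text.splitlines():
        match = re.match(r"^\s*(\w+)\s*=\s*(.*?)\s*$", line)
        if match and match.group(1).lower() in wanted:
            found[wanted[match.group(1).lower()]] = match.group(2)
    return found
```

Feeding it the captured text of `scontrol show config` from the
affected node should give a short dictionary that can be pasted into a
reply. cgroup.conf would still need to be posted separately.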
--
Willy Markuske
HPC Systems Engineer
Research Data Services
P: (858) 246-5593