10 Apr
2024
7:56 a.m.
We are running a Slurm cluster with version `slurm 22.05.8`. One of our users has reported that their jobs have been stuck in the COMPLETING state for a long time. Referring to the Slurm Troubleshooting Guide (Slurm Workload Manager documentation), we found that the batch host for these jobs had indeed been removed from the cluster, possibly without being drained first.

How do we cancel or delete these jobs?

* We have tried `scancel` on both the batch job ID and the individual job IDs, running it both as the user and as the SlurmUser.
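When reporting this kind of problem it helps to list exactly which jobs are stuck and which nodes they still reference. The sketch below is a hypothetical Python helper, not something from the original post. It assumes `squeue` is on `PATH` and uses the standard `-h` (no header), `-t CG` (COMPLETING state) and `-o "%i %u %N"` (job ID, user, node list) options; the function names are illustrative only.

```python
import shutil
import subprocess

# Assumed invocation: one job per line, fields "job_id user nodelist".
# "CG" is the short state code for COMPLETING.
SQUEUE_CMD = ["squeue", "-h", "-t", "CG", "-o", "%i %u %N"]


def parse_completing(text: str) -> list[tuple[str, str, str]]:
    """Parse squeue output into (job_id, user, nodelist) tuples."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        job_id, user = fields[0], fields[1]
        # The node list can be empty once Slurm has released the nodes.
        nodelist = fields[2] if len(fields) > 2 else ""
        jobs.append((job_id, user, nodelist))
    return jobs


def list_completing_jobs() -> list[tuple[str, str, str]]:
    """Run squeue and return the jobs currently in the COMPLETING state."""
    if shutil.which("squeue") is None:
        raise RuntimeError("squeue not found on PATH")
    result = subprocess.run(SQUEUE_CMD, capture_output=True, text=True, check=True)
    return parse_completing(result.stdout)


if __name__ == "__main__":
    for job_id, user, nodes in list_completing_jobs():
        print(f"{job_id}\t{user}\t{nodes or '(no nodes)'}")
```

Comparing the node list of each stuck job against the nodes the cluster still knows about (for example via `sinfo`) is one way to confirm that the removed batch host is the common factor.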