Usually to clear jobs like this you have to reboot the node they are on. That will then force the scheduler to clear them.
-Paul Edmon-
On 4/10/2024 2:56 AM, archisman.pathak--- via slurm-users wrote:
We are running a Slurm cluster with version `slurm 22.05.8`. One of our users has reported that their jobs have been stuck at the completion stage for a long time. Referring to the Slurm Troubleshooting Guide, we found that the batch host for the job had indeed been removed from the cluster, perhaps without being drained first.
How do we cancel or delete the jobs?
- We tried `scancel` on the batch and individual job IDs, both as the user and as SlurmUser.
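
For anyone following along, here is a rough sketch of the commands typically used to inspect jobs in this situation. It is not something posted in the thread: `JOBID` and `NODE` are hypothetical placeholders, and the `run` helper is an illustrative wrapper that only prints each command unless `DRY_RUN=0` is set. The final DOWN/RESUME step is a commonly suggested way to release completing jobs, but it assumes the node is still defined in the Slurm configuration, which may not hold if the batch host has already been removed.

```shell
#!/bin/sh
# Sketch only: JOBID and NODE are hypothetical placeholders, not values from this thread.
JOBID="${JOBID:-12345}"
NODE="${NODE:-node001}"

# With DRY_RUN=1 (the default) each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# List jobs currently in the COMPLETING state, with owner and node list.
run squeue --states=COMPLETING --format="%i %u %T %N"

# Show the full record for one stuck job, including its BatchHost field.
run scontrol show job "$JOBID"

# Check how the controller currently sees the batch host.
run scontrol show node "$NODE"

# Retry the cancel (as the job owner, SlurmUser, or root).
run scancel "$JOBID"

# Assumption: the node is still known to the controller. Marking it DOWN and
# then resuming it is a commonly suggested way to release completing jobs.
run scontrol update NodeName="$NODE" State=DOWN Reason="stuck completing jobs"
run scontrol update NodeName="$NODE" State=RESUME
```

Running it as is only prints the commands, so each one can be reviewed first; setting `DRY_RUN=0` executes them and needs the appropriate Slurm privileges.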