[slurm-users] Preemption not working in 20.11

Prentice Bisbal pbisbal at pppl.gov
Fri Feb 26 19:35:53 UTC 2021


We recently upgraded from Slurm 19.05.8 to 20.11.3. In our 
configuration, we have an interruptible partition named 'interruptible' 
for long-running, low-priority jobs that use checkpoint/restart. Jobs 
that are preempted are supposed to be killed and requeued rather than 
suspended. This configuration has been working without issue for 2+ 
years.

After the upgrade, this has stopped working. Preempted jobs are killed 
and not requeued. My slurm.conf file is configured to requeue preempted 
jobs:

$ grep -i requeue /etc/slurm/slurm.conf
#JobRequeue=1
PreemptMode=Requeue

And the user's sbatch script included the --requeue option.
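
The grep above only shows what is in the file, not what the running 
slurmctld has loaded. A rough sketch for comparing the two follows. The 
helper name preempt_settings is made up for illustration, and the config 
path is simply the one used in the grep above. It relies only on 
'scontrol show config', and skips that step if scontrol is not in the 
PATH.

```shell
#!/bin/sh
# Print preemption- and requeue-related settings from config text on stdin,
# skipping commented-out lines such as "#JobRequeue=1".
# (preempt_settings is a hypothetical helper name, not a Slurm command.)
preempt_settings() {
    grep -i -E '^[[:space:]]*(Preempt|JobRequeue)' || true
}

# Values written in the file (path assumed from the grep above).
if [ -r /etc/slurm/slurm.conf ]; then
    preempt_settings < /etc/slurm/slurm.conf
fi

# Values the running slurmctld reports, if scontrol is available.
if command -v scontrol >/dev/null 2>&1; then
    scontrol show config | preempt_settings
fi
```

If the two outputs disagree, the controller may not have picked up the 
file that was edited.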

The user reports that the error output from his preempted jobs now says:

slurmstepd: error: *** STEP 1075117.0 ON greene002 CANCELLED AT 2021-02-25T16:07:48 ***

In the past it would say PREEMPTED instead of CANCELLED.

Any ideas what would cause this? I've reported this to Slurm support, 
and haven't gotten anything back yet, so I figured I'd ask here, too. If 
this is a bug, I can't be the only one who has experienced this.

-- 
Prentice
