[slurm-users] How to get an estimate of job completion for planned maintenance?

Diego Zuccato diego.zuccato at unibo.it
Mon Nov 8 11:48:12 UTC 2021


Hi.

I usually create a maintenance reservation with the IGNORE_JOBS flag, so 
that it can be created even while jobs are running on the nodes, and new 
jobs can't interfere with it. Then I contact the job owners to warn them 
I'll kill their jobs if needed.
In practice that's only needed for nodes that allow jobs with unlimited 
run time: for the others it's enough to plan ahead (if the max run time 
is 24h, create the reservation more than 24h in advance).
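The "plan ahead" rule is a subtraction: the reservation must exist before 
<maintenance start> minus <longest run time>. Below is a minimal bash sketch, 
assuming GNU date and Slurm's usual time-limit formats (as printed e.g. by 
sinfo -o "%l"). The function names to_seconds and latest_reservation_time 
are hypothetical helpers, not Slurm tools.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Slurm. Assumes GNU date.

# Convert a Slurm time limit ("minutes", "minutes:seconds",
# "hours:minutes:seconds", "days-hours[:minutes[:seconds]]")
# to seconds. Prints -1 for unlimited partitions.
to_seconds() {
  local t=$1 days=0 h=0 m=0 s=0
  local -a parts
  case "$t" in
    UNLIMITED|unlimited|INFINITE|infinite) echo -1; return ;;
  esac
  if [[ $t == *-* ]]; then
    # days-hours[:minutes[:seconds]]
    days=${t%%-*}
    IFS=: read -r h m s <<< "${t#*-}"
  else
    IFS=: read -r -a parts <<< "$t"
    case ${#parts[@]} in
      1) m=${parts[0]} ;;
      2) m=${parts[0]}; s=${parts[1]} ;;
      3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;
    esac
  fi
  # 10# forces base 10, so fields like "08" are not parsed as octal;
  # missing fields default to 0.
  echo $(( ((10#$days * 24 + 10#${h:-0}) * 60 + 10#${m:-0}) * 60 + 10#${s:-0} ))
}

# Print the latest time (UTC, ISO 8601) by which the maintenance
# reservation must exist, given the maintenance start and the
# longest partition time limit.
latest_reservation_time() {
  local start=$1 limit secs
  limit=$(to_seconds "$2")
  if (( limit < 0 )); then
    echo "unlimited time limit: use flags=ignore_jobs and contact job owners" >&2
    return 1
  fi
  secs=$(( $(date -u -d "$start" +%s) - limit ))
  date -u -d "@$secs" +%Y-%m-%dT%H:%M:%S
}
```

For example, latest_reservation_time 2021-11-15T08:00:00 1-00:00:00 prints 
2021-11-14T08:00:00. Times are treated as UTC only to keep the sketch simple; 
for unlimited partitions it refuses, which is exactly the IGNORE_JOBS case.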

Just my $.02

Diego

On 07/11/2021 13:45, Carsten Beyer wrote:
> Hi Ahmad,
> 
> you could use squeue -h -t r --format="%i %e" | sort -k2 to get a list 
> of all running jobs sorted by their endtime.
> 
> We normally use a maintenance reservation starting at the maintenance 
> start time (or with some lead time before it) to get the system free of 
> jobs. That makes things easier than draining: if you drain the cluster, 
> no new jobs can start at all, whereas with the reservation, jobs with a 
> short enough wallclock time can still be backfilled until the 
> reservation/maintenance starts. You can create the reservation at any 
> time, as long as it is no later than 
> "<starttime maintenance> minus <longest MaxTime of partition>", e.g.
> 
> scontrol create reservation=<name> starttime=<starttime> \
>     duration=<duration> user=root flags=maint nodes=ALL
> 
> Hope that helps a little bit,
> 
> Carsten
> 
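To find which jobs (and owners) to contact, Carsten's squeue listing can be 
filtered against the maintenance start. Below is a minimal bash sketch, 
assuming the default ISO 8601 time format (no custom SLURM_TIME_FORMAT), so 
that end times compare correctly as plain strings. jobs_past is a hypothetical 
helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print running jobs that would still be running at
# the maintenance start, sorted by end time. Reads "jobid user endtime"
# lines on stdin, e.g. from:
#   squeue -h -t R --format="%i %u %e"
# Assumes ISO 8601 end times (YYYY-MM-DDTHH:MM:SS), which order
# correctly as strings.
jobs_past() {
  local start=$1
  # Force string comparison: an end time later than $start overlaps the
  # maintenance. A value that is not a timestamp (e.g. a word like
  # "Unknown") compares greater than any digit-led date, so it is
  # reported as well.
  awk -v start="$start" '($3 "") > (start "") { print }' | LC_ALL=C sort -k3,3
}
```

Usage: squeue -h -t R --format="%i %u %e" | jobs_past 2021-11-15T08:00:00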

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786


