[slurm-users] How to get an estimate of job completion for planned maintenance?

Carsten Beyer beyer at dkrz.de
Sun Nov 7 12:45:10 UTC 2021


Hi Ahmad,

You could use squeue -h -t r --format="%i %e" | sort -k2 to get a list
of all running jobs sorted by their end time.
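
If only the last job to finish matters, the sorted list can be reduced to a single line. The following is a minimal sketch, not a tested site recipe: the function name latest_job_end is made up for illustration, and it assumes that end times are printed as YYYY-MM-DDTHH:MM:SS, so that a plain text sort is also a chronological sort. Lines without such a timestamp (for example, jobs without a time limit) are skipped here and would need to be checked by hand.

```shell
# Reads "<jobid> <endtime>" lines on stdin, as printed by
#   squeue -h -t r --format="%i %e"
# and prints the line with the latest end time.
# latest_job_end is a hypothetical helper name, not a Slurm command.
latest_job_end() {
    # Keep only lines whose second field looks like YYYY-MM-DDT...
    awk '$2 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/ { print $1, $2 }' |
        sort -k2,2 |   # ISO timestamps sort chronologically as text
        tail -n 1      # the last line is the latest end time
}

# Usage on a cluster:
#   squeue -h -t r --format="%i %e" | latest_job_end
```

The end time shown is based on each job's time limit, so it is an upper bound; jobs may well finish earlier.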

We normally use a maintenance reservation that starts at the beginning
of the maintenance (or some lead time before it) to get the system free
of jobs. That makes things easier, because if you drain your cluster no
new jobs can start at all. With the reservation, jobs with a shorter
wallclock time can still be backfilled until the reservation/maintenance
starts. You can create the reservation at any time, but no later than
"<starttime maintenance> minus <longest MaxTime of partition>", so that
no job started before the reservation exists can run into the
maintenance window, e.g.

scontrol create reservation=<name> starttime=<starttime> \
    duration=<duration> user=root flags=maint nodes=ALL
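
The "<starttime maintenance> minus <longest MaxTime of partition>" rule can be worked out with a small helper. This is only a sketch under a few assumptions: reservation_deadline is a made-up name, the longest MaxTime is passed in by hand as a number of minutes, and GNU date (with its -d option) is available. Times are interpreted in the local time zone.

```shell
# Prints the latest time at which the reservation should be created:
#   <maintenance start> minus <longest MaxTime, in minutes>
# reservation_deadline is a hypothetical helper name; requires GNU date.
reservation_deadline() {
    # "2021-11-15T08:00:00" -> "2021-11-15 08:00:00" for date -d
    start=$(printf '%s' "$1" | tr 'T' ' ')
    max_minutes=$2
    # Convert to seconds since the epoch, subtract, and format again
    start_epoch=$(date -d "$start" +%s) || return 1
    date -d "@$((start_epoch - max_minutes * 60))" +%Y-%m-%dT%H:%M:%S
}

# Usage, for a maintenance start on 2021-11-15 at 08:00 and a longest
# MaxTime of 3 days (4320 minutes):
#   reservation_deadline 2021-11-15T08:00:00 4320
```

Going through epoch seconds avoids the ambiguities of relative date expressions such as "- 3 days" inside a date string.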

Hope that helps a little bit,

Carsten

-- 
Carsten Beyer
Systems Department

Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45a * D-20146 Hamburg * Germany

Phone:  +49 40 460094-221
Fax:    +49 40 460094-270
Email:  beyer at dkrz.de
URL:    http://www.dkrz.de

Managing Director: Prof. Dr. Thomas Ludwig
Registered office: Hamburg
Commercial register: Amtsgericht Hamburg HRB 39784


On 05.11.2021 at 23:16, Ahmad Khalifa wrote:
> If I plan maintenance on a certain day, how long before that day 
> should I set the queue to drain mode?! Is there a way to estimate the 
> completion date / time of current running jobs?!
>
> Regards.
