[slurm-users] slurm_state

sblock s.block at tu-berlin.de
Fri Mar 12 08:45:58 UTC 2021


Hello,

we had an outage of the cluster file system, which also holds the
Slurm StateSaveLocation. During the outage Slurm reported all jobs as
orphaned and then set the nodes DOWN because they were not responding.
After the file system was back, users started to submit jobs, but the
old queue was gone.
Shouldn't Slurm pick up the old slurm_state once the file system is back?
What can we do to avoid losing the queue again in such a situation?
We are running version 17.11.5.

Best regards,
 Sebastian
 

-- 
Sebastian Baldauf
HPC-Team


Technische Universität Berlin
Zentraleinrichtung Campusmanagement
Einsteinufer 17, 10587 Berlin
Telefon: +49 (0)30 314-74591
s.block at tu-berlin.de
www.campusmanagement.tu-berlin.de



