[slurm-users] [External] [slurm 20.02.3] don't suspend nodes in down state
Steininger, Herbert
herbert_steininger at psych.mpg.de
Tue Sep 1 06:38:10 UTC 2020
Hi Guys,
Thanks for your answers.
To keep things simple, I would rather not patch the Slurm source code the way Jacek does.
But I think it is the way to go.
With the solutions Florian and Angelos suggested, Slurm still thinks the nodes are "powered down", even though they are not.
Well, it is better for Slurm to merely think they are down than for them to actually power down while I am upgrading something.
What we really need is a state like "MAINT" (for maintenance) that tells Slurm not to use the node but also not to power it down.
Thanks,
Herbert
From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of Florian Zillner
Sent: Wednesday, 26 August 2020 10:36
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] [External] [slurm 20.02.3] don't suspend nodes in down state
Hi Herbert,
just like Angelos described, we also have logic in our poweroff script that checks if the node is really IDLE and only sends the poweroff command if that's the case.
Excerpt:
hosts=$(scontrol show hostnames "$1")
for host in $hosts; do
    # Only power off nodes that Slurm reports as IDLE+POWER;
    # any other state (or a grep error) is logged and skipped.
    if ! scontrol show node "$host" | tr ' ' '\n' | grep -q 'State=IDLE+POWER$'; then
        echo "node $host NOT IDLE" >>"$OUTFILE"
        continue
    fi
    echo "node $host IDLE" >>"$OUTFILE"
    ssh "$host" poweroff
    ...
    sleep 1
    ...
done
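Because the decision depends only on the text `scontrol show node` prints, the check can be pulled into a small function that reads that text on stdin, so it can be exercised without a live cluster. This is a minimal sketch: the function name `node_is_idle_power` is hypothetical, and the sample records are abbreviated and made up rather than full `scontrol` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one node's `scontrol show node` output on stdin
# and succeeds (exit 0) only if it contains the field State=IDLE+POWER.
# Same idea as the grep in the excerpt, but anchored at both ends.
node_is_idle_power() {
    # One field per line, then look for the exact State= field.
    # In a basic regex, '+' is a literal character.
    tr ' ' '\n' | grep -q '^State=IDLE+POWER$'
}

# Abbreviated, made-up sample output (illustrative only, not real nodes).
sample_idle='NodeName=node01 Arch=x86_64
   State=IDLE+POWER ThreadsPerCore=1'
sample_down='NodeName=node02 Arch=x86_64
   State=DOWN+POWER ThreadsPerCore=1'

for sample in "$sample_idle" "$sample_down"; do
    if node_is_idle_power <<<"$sample"; then
        echo "IDLE: safe to power off"
    else
        echo "NOT IDLE: skip"
    fi
done
```

In a loop like Florian's, the test would read `if ! scontrol show node "$host" | node_is_idle_power; then`.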
Best,
Florian
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Steininger, Herbert <herbert_steininger at psych.mpg.de>
Sent: Monday, 24 August 2020 10:52
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: [External] [slurm-users] [slurm 20.02.3] don't suspend nodes in down state
Hi,
how can I prevent Slurm from suspending nodes that I have set to the DOWN state for maintenance?
I know about "SuspendExcNodes", but rolling out slurm.conf every time that list changes does not seem like the right way.
Is there a state I can set so that the nodes don't get suspended?
A few times I was working on a server and, once our idle time (1h) had elapsed, Slurm decided to suspend the node.
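For this question, the same kind of SuspendProgram check used in the replies can be turned around: skip any node whose state carries a DOWN or DRAIN flag, i.e. one an admin has taken out of service. A minimal sketch, assuming the node's state appears as a `State=` field in `scontrol show node` output, with flags joined by `+` and a trailing `*` marking a non-responding node; the function name `node_in_maintenance` and the sample records are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper for a SuspendProgram wrapper: given one node's
# `scontrol show node` output on stdin, succeed (exit 0) if the State=
# field contains a DOWN or DRAIN flag, meaning the node should NOT be
# powered off.
node_in_maintenance() {
    local state flag
    local -a flags
    # Extract the value of the State= field, e.g. "DOWN*+DRAIN".
    state=$(tr ' ' '\n' | sed -n 's/^State=//p' | head -n 1)
    # Split the state on '+' into individual flags.
    IFS='+' read -ra flags <<<"$state"
    for flag in "${flags[@]}"; do
        # Strip a trailing '*' (not-responding marker) before comparing.
        case "${flag%\*}" in
            DOWN|DRAIN) return 0 ;;
        esac
    done
    return 1
}

# Abbreviated, made-up sample output (illustrative only, not real nodes).
down_sample='NodeName=node03 Arch=x86_64
   State=DOWN*+DRAIN ThreadsPerCore=1'
idle_sample='NodeName=node04 Arch=x86_64
   State=IDLE ThreadsPerCore=1'

for sample in "$down_sample" "$idle_sample"; do
    if node_in_maintenance <<<"$sample"; then
        echo "maintenance: leave powered on"
    else
        echo "in service: may be suspended"
    fi
done
```

Inside a loop like Florian's, it would be called as `scontrol show node "$host" | node_in_maintenance && continue`. As Herbert notes in his later reply, skipping the poweroff in the script only prevents the physical shutdown; Slurm may still record the node as powered down.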
TIA,
Herbert
--
Herbert Steininger
Head of IT & HPC
Administrator
Max-Planck-Institut für Psychiatrie
Kraepelinstr. 2-10
80804 München
Tel +49 (0)89 / 30622-368
Mail herbert_steininger at psych.mpg.de
Web https://www.psych.mpg.de