[slurm-users] Can't get node out of drain state

Dean Schulze dean.w.schulze at gmail.com
Fri Jan 24 03:09:31 UTC 2020

The problem turned out to be that I had Gres=gpu:gp100:1 on the NodeName
line for that node, which has neither a GPU nor a gres.conf.  Once I moved
the Gres entry to the correct NodeName line in slurm.conf, the node came
out of the drain state and became usable again.

Pretty strange that having a Gres= property on a node that doesn't have a
GPU would get it stuck in the drain state.
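
For anyone hitting the same thing, here is a rough sketch of how one might
list the NodeName lines that declare a Gres= property, so each can be checked
against the hardware the node really has.  The function name nodes_with_gres,
the node names, and the CPU counts are all made up for illustration; only the
Gres=gpu:gp100:1 value comes from my config.

```shell
#!/bin/sh
# Print the NodeName lines of a slurm.conf-style file that declare a Gres=
# property. nodes_with_gres is a hypothetical helper, not a Slurm command.
nodes_with_gres() {
    grep -iE '^[[:space:]]*NodeName=' "$1" | grep -iE '[[:space:]]Gres='
}

# Hypothetical sample config; node names and CPU counts are assumptions.
sample=$(mktemp)
cat > "$sample" <<'EOF'
NodeName=node01 CPUs=8 State=UNKNOWN
NodeName=node02 CPUs=8 Gres=gpu:gp100:1 State=UNKNOWN
EOF

# Only the line carrying a Gres= entry is printed.
nodes_with_gres "$sample"
```

Against a real cluster you would pass the path to your own slurm.conf instead
of the sample file.  This simple match does not follow NodeName definitions
continued across lines with a trailing backslash, so treat it as a first pass
only.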

On Thu, Jan 23, 2020 at 2:34 PM Alex Chekholko <alex at calicolabs.com> wrote:

> Hey Dean,
> Does 'scontrol show node <nodename>' give any "Reason:"?  You can also look
> at 'sinfo -R'.
> Make sure the relevant network ports are open:
> https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-firewall-for-slurm-daemons
> Also check that slurmd daemons on the compute nodes can talk to each other
> (not just to the master). e.g. bottom of
> https://slurm.schedmd.com/big_sys.html
> Regards,
> Alex
> On Thu, Jan 23, 2020 at 1:05 PM Dean Schulze <dean.w.schulze at gmail.com>
> wrote:
>> I've tried the normal things with scontrol (
>> https://blog.redbranch.net/2015/12/26/resetting-drained-slurm-node/),
>> but I have a node that will not come out of the drain state.
>> I've also done a hard reboot and tried again.  Are there any other
>> remedies?
>> Thanks.