[slurm-users] slurmctld removing offline nodes

Joe Teumer joe.teumer at gmail.com
Tue Oct 25 18:42:16 UTC 2022


We noticed that the Slurm controller (slurmctld) removes nodes that it cannot
reach. How can this be disabled?
We would like such nodes to be marked down/drain instead of being removed
from the sinfo output.

/var/log/slurm/slurmctld.log
[2022-10-25T13:10:01.500] debug:  Log file re-opened
[2022-10-25T13:10:01.589] error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution
[2022-10-25T13:10:01.589] error: slurm_set_addr: Unable to resolve "spg-ethx-f4ce"
[2022-10-25T13:10:01.589] error: slurm_get_port: Address family '0' not supported
[2022-10-25T13:10:01.589] error: _set_slurmd_addr: failure on spg-ethx-f4ce
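
The errors above are name-resolution failures on the controller host. As a
rough sketch for listing every node name that slurmctld could not resolve, the
"Unable to resolve" entries can be pulled out of the log. The function name
unresolved_nodes is hypothetical, and the sketch assumes each log entry sits on
a single line, as it does in the log file itself (not wrapped as in an email).

```shell
# Hypothetical helper: print each hostname that slurmctld reported as
# unresolvable, once. Reads log text on stdin and assumes one entry per line.
unresolved_nodes() {
    sed -n 's/.*slurm_set_addr: Unable to resolve "\([^"]*\)".*/\1/p' | sort -u
}
```

One possible use, assuming getent is available on the controller, is to run
"unresolved_nodes < /var/log/slurm/slurmctld.log" and then try
"getent hosts NAME" for each name printed, to see whether it resolves now.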

cat /etc/slurm/slurm.conf | grep -i f4ce
NodeName=spg-ethx-f4ce ...
PartitionName=debug spg-ethx-f4ce ...

No output in sinfo:
sinfo -N | grep f4ce
sinfo -R | grep f4ce

slurmd -V
slurm 21.08.0