[slurm-users] update_node / reason set to: slurm.conf / state set to DRAINED
Kevin Buckley
Kevin.Buckley at pawsey.org.au
Thu Nov 5 02:00:41 UTC 2020
We have had a couple of nodes enter a DRAINED state where scontrol
gives the reason as
Reason=slurm.conf
Looking at the slurmctld log, we see pairs of lines such as:
update_node: node nid00245 reason set to: slurm.conf
update_node: node nid00245 state set to DRAINED
A search of the interweb thing for "Reason=slurm.conf" and
"reason set to: slurm.conf" has so far proved too much for
my search-fu.
The slurm.conf files do match across the Cray nodes, so it
seems unlikely that it would be a "something out of sync"
thing.
Other "update_node" actions seen in the logs, e.g.,
"reason set to: NHC-Admindown"
also put the node into a DRAINED state but then, that's to
be expected.
We upgraded to 20.02.5 a couple of days ago, and all of the
occurrences of the "reason set to: slurm.conf" message
appear only after the upgrade.
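
For anyone wanting to check their own logs for the same thing, here is a
minimal sketch in Python that pairs each "reason set to" line with the
following "state set to" line for the same node. It assumes only the two
line shapes quoted above; the function name pair_updates and the sample
list are made up for illustration, and real log lines may carry a prefix
(which the search tolerates) or differ in other ways.

```python
import re

# Patterns for the two update_node line shapes quoted in this post.
# search() is used so that any prefix on the line is ignored.
REASON_RE = re.compile(r"update_node: node (\S+) reason set to: (.+?)\s*$")
STATE_RE = re.compile(r"update_node: node (\S+) state set to (\S+)")


def pair_updates(lines):
    """Return (node, reason, state) for each reason line that is later
    followed by a state line for the same node."""
    pending = {}  # node -> most recent reason not yet matched to a state
    pairs = []
    for line in lines:
        m = REASON_RE.search(line)
        if m:
            pending[m.group(1)] = m.group(2)
            continue
        m = STATE_RE.search(line)
        if m and m.group(1) in pending:
            node = m.group(1)
            pairs.append((node, pending.pop(node), m.group(2)))
    return pairs


# Hypothetical input: the two lines from the log excerpt above.
sample = [
    "update_node: node nid00245 reason set to: slurm.conf",
    "update_node: node nid00245 state set to DRAINED",
]

for node, reason, state in pair_updates(sample):
    print(node, reason, state)
```

To run it against a real log, pass an open file object instead of the
sample list; the log location is whatever SlurmctldLogFile is set to in
slurm.conf.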
Has anyone else seen anything like this, especially any of you who
have just moved to 20.02.5?
Yours, diving into the source soon,
Kevin
--
Supercomputing Systems Administrator
Pawsey Supercomputing Centre