On Mon, 2024-06-24 at 13:54:43 +0200, Slurm users wrote:
Dear Slurm users,
in our project we exclude the master from computing before starting slurmctld. We used to exclude the master from computing by simply not mentioning it in the configuration, i.e. by not having:
PartitionName=SomePartition Nodes=master
or something similar. Apparently, this is not the way to do it, as it now results in a fatal error:
fatal: Unable to determine this slurmd's NodeName
You're attempting to start slurmd, which isn't required on this machine, as you say. Disable it. Keep slurmctld enabled (and declared in the config).
Therefore, my *question*:
What is the best practice for excluding the master node from work?
Not defining it as a worker node.
I personally primarily see the option to set the node into DOWN, DRAINED or RESERVED.
These states are slurmd states, and therefore meaningless for a machine that doesn't have a running slurmd. (It's the nodes that are defined in the config that are supposed to be able to run slurmd.)
So is *DRAINED* the correct setting in such a case?
Since this only applies to a node that has been defined in the config, and you (correctly) didn't do so, there's no need (and no means) to "drain" it.
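To make that concrete, here is a minimal slurm.conf sketch of such a layout. The worker names (node[01-04]) and the CPU count are placeholders I made up for illustration; only "master" and "SomePartition" come from your mail. I'm also assuming systemd units named slurmd and slurmctld, as most packages ship them.

```conf
# Controller only: the master runs slurmctld and is declared here.
SlurmctldHost=master

# Worker nodes (hypothetical names and CPU count).
# The master is deliberately NOT listed in any NodeName= line,
# so no slurmd is expected to run on it.
NodeName=node[01-04] CPUs=16 State=UNKNOWN

# The partition references only the workers.
PartitionName=SomePartition Nodes=node[01-04] Default=YES State=UP

# On the master (assuming systemd unit names):
#   systemctl disable --now slurmd
#   systemctl enable --now slurmctld
```

With a config along these lines, the master never appears in sinfo as a node, so there is nothing to drain or set DOWN.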
Best,
Steffen