How to exclude master from computing? Set to DRAINED?
Dear Slurm users,

in our project we exclude the master from computing before starting slurmctld. We used to exclude the master from computing by simply not mentioning it in the configuration, i.e. by just not having:

PartitionName=SomePartition Nodes=master

or something similar. Apparently, this is not the way to do it, as it now causes a fatal error:

fatal: Unable to determine this slurmd's NodeName

Therefore, my *question:* What is the best practice for excluding the master node from work?

I personally primarily see the option of setting the node to DOWN, DRAINED or RESERVED. Since we use ReturnToService=2, I guess DOWN is not the way to go. RESERVED fits the second part of its description, "The node is in an advanced reservation and *not generally available*", and DRAINED, "The node is unavailable for use per system administrator request", fits completely. So is *DRAINED* the correct setting in such a case?

Best regards,
Xaver
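For reference, if the master is defined as a node in slurm.conf, it can be marked drained directly in its definition (a sketch; the node name and sizing here are hypothetical):

```
# slurm.conf sketch -- hypothetical node name and sizing
NodeName=master CPUs=4 RealMemory=8000 State=DRAIN Reason="head node, excluded from computing"
```

As the replies below note, though, this only matters if the node is defined in the configuration at all.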
Dear Xaver,

we have a similar setup and yes, we have set the node to "state=DRAIN". Slurm keeps it this way until you manually change it to e.g. "state=RESUME".

Regards,
Hermann

On 6/24/24 13:54, Xaver Stiensmeier via slurm-users wrote:
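Hermann's approach, expressed as commands (the node name "master" is taken from the original post):

```
# Drain the node so no new jobs are scheduled on it
scontrol update NodeName=master State=DRAIN Reason="head node: excluded from computing"

# Later, to make it available again
scontrol update NodeName=master State=RESUME
```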
On Mon, 2024-06-24 at 13:54:43 +0200, Slurm users wrote:

> fatal: Unable to determine this slurmd's NodeName

You're attempting to start the slurmd, which isn't required on this machine, as you say. Disable it. Keep slurmctld enabled (and declared in the config).

> therefore, my *question:*
> What is the best practice for excluding the master node from work?

Not defining it as a worker node.

> I personally primarily see the option to set the node into DOWN, DRAINED or RESERVED.

These states are slurmd states, and therefore meaningless for a machine that doesn't have a running slurmd. (It's the nodes that are defined in the config that are supposed to be able to run slurmd.)

> So is *DRAINED* the correct setting in such a case?

Since this only applies to a node that has been defined in the config, and you (correctly) didn't do so, there's no need (and no means) to "drain" it.

Best,
Steffen

--
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~ Fon: +49-331-567 7274  Mail: steffen.grunewald(at)aei.mpg.de ~~~
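Steffen's advice can be sketched as systemd commands (assuming the standard slurmd/slurmctld unit names shipped with the Slurm packages):

```
# On the master/controller: make sure slurmd never starts
systemctl disable --now slurmd

# Keep the controller daemon running
systemctl enable --now slurmctld
```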
Thanks Steffen, that makes a lot of sense. I will just not start slurmd in the master Ansible role when the master is not to be used for computing.

Best regards,
Xaver

On 24.06.24 14:23, Steffen Grunewald via slurm-users wrote:
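Xaver's plan of gating slurmd in the Ansible role could look roughly like this (a sketch; the group name and task layout are hypothetical):

```yaml
# Hypothetical task: start slurmd only on hosts meant to run jobs
- name: Enable and start slurmd on compute nodes only
  ansible.builtin.systemd:
    name: slurmd
    state: started
    enabled: true
  when: "'master' not in group_names"
```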
Hi Xaver,

Xaver Stiensmeier via slurm-users <slurm-users@lists.schedmd.com> writes:
You just don't configure the head node in any partition. You are getting the error because you are starting 'slurmd' on the node, which implies you do want to run jobs there. Normally you would run only 'slurmctld' and possibly also 'slurmdbd' on your head node.

Cheers,
Loris

--
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin
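Loris's setup, as a minimal slurm.conf sketch (hostnames and node sizes are hypothetical): the head node appears only as SlurmctldHost, not as a NodeName or in any partition:

```
# slurm.conf sketch -- hypothetical hostnames
SlurmctldHost=master

NodeName=node[01-04] CPUs=8 RealMemory=16000 State=UNKNOWN
PartitionName=SomePartition Nodes=node[01-04] Default=YES State=UP
```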
Dear Xaver,

Could you clarify the function of what you call "master"? If it's the Slurm controller, i.e. running slurmctld: why do you need slurmd running on it as well?

Best,
Stephan

On 24.06.24 13:54, Xaver Stiensmeier via slurm-users wrote:
participants (5)
- Hermann Schwärzler
- Loris Bennett
- Steffen Grunewald
- Stephan Roth
- Xaver Stiensmeier