Dear slurm-user list,
As far as I understand, slurm.conf needs to be present on the master and on the workers at the default slurm.conf path (unless another path is set via SLURM_CONF). However, I noticed that when I added a partition only to the master's slurm.conf, all workers "correctly" showed the added partition when I called sinfo on them.
Is the slurm.conf stored on every instance just a fallback for when the connection is down, or does it serve another purpose? The documentation only says: "This file should be consistent across all nodes in the cluster."
Best regards, Xaver
Xaver,
If you look at your slurmctld log, you will likely see messages saying that each node's slurm.conf is not the same as the one on the master.
So, yes, it can work temporarily, but unless some very specific settings are in place, issues will arise. In the state you are in now, you will want to sync the config across all nodes and then run 'scontrol reconfigure'.
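As a rough sketch of that sync step (not a definitive procedure): the helpers below compare checksums of two slurm.conf copies, push the master's copy to the workers, and then run 'scontrol reconfigure'. The function names, the node names in the usage comment, and the use of ssh/scp with passwordless access are all assumptions for illustration; the config path is assumed to be the same on every node.

```shell
#!/bin/sh
# Print the SHA-256 checksum of a local config file.
conf_hash() {
    sha256sum "$1" | cut -d' ' -f1
}

# Succeed only if two local copies of the config are byte-identical.
confs_match() {
    [ "$(conf_hash "$1")" = "$(conf_hash "$2")" ]
}

# Print the checksum of the config on a remote node (assumes ssh access).
remote_hash() {
    node=$1
    conf=$2
    ssh "$node" sha256sum "$conf" | cut -d' ' -f1
}

# Copy the master's config to every listed node, then ask the daemons
# to re-read it. Stops at the first failed copy.
push_conf() {
    conf=$1
    shift
    for node in "$@"; do
        scp "$conf" "$node:$conf" || return 1
    done
    scontrol reconfigure
}

# Hypothetical usage on the master, with made-up node names:
#   push_conf /etc/slurm/slurm.conf worker01 worker02
#   [ "$(remote_hash worker01 /etc/slurm/slurm.conf)" = \
#     "$(conf_hash /etc/slurm/slurm.conf)" ] && echo "worker01 in sync"
```

Nothing runs until one of the functions is called, so the file can be sourced and tried against local copies before touching the cluster.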
You may want to look into configless mode if you can set DNS entries and your config is basically monolithic or all of its parts are in /etc/slurm/.
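For orientation, a configless setup might look roughly like the fragment below. This is a sketch based on the Slurm configless documentation, not a tested configuration; the host name ctl-host is hypothetical, and 6817 is assumed to be the slurmctld port in use.

```shell
# In slurm.conf on the controller (slurm.conf syntax, shown as a comment):
#   SlurmctldParameters=enable_configless
#
# Option 1: a DNS SRV record lets slurmd find the controller by itself
# (zone file syntax, shown as a comment):
#   _slurmctld._tcp 3600 IN SRV 10 0 6817 ctl-host
#
# Option 2: point slurmd at the controller explicitly when starting it
# on a compute node, instead of relying on DNS:
slurmd --conf-server ctl-host:6817
```

With either option the workers fetch their configuration from slurmctld, so only the copy on the controller has to be maintained.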
Brian Andrus
On 4/15/2024 2:55 AM, Xaver Stiensmeier via slurm-users wrote:
[...]