slurm-config on NFS-volume
Hi,

Among other clusters, I have a simple cluster with 2 nodes running Slurm. One node runs mysqld, slurmdbd, slurmctld and slurmd. The other node runs only slurmd.

The Slurm config is in node1:/etc/slurm, and a copy of it is in node2:/etc/slurm. This configuration runs OK.

However, as I am used to configuring from a shared location where there is only one copy of the config, I tried to change the current setup by moving the Slurm config to a shared (NFS) disk. So on both nodes:

/etc/slurm -> nfs-server:/shared/etc/slurm

I have tried several ways, but the slurmctld and slurmd daemons will not start when I refer to this "nfs-shared" location.

Any ideas what is missing here? Authorizations seem OK.

With regards,
Carel van der Werf
Developer/Administrator Linux, ICT-Bèta, Department of Science, Utrecht University
Hi Carel,

"Werf, C.G. van der (Carel)" <C.G.vanderWerf@uu.nl> writes:

> Hi,
> Among other clusters, I have a simple cluster with 2 nodes, running slurm.
> 1 node runs : mysqld, slurmdbd, slurmctld and slurmd. The other node, only runs slurmd. Slurm config is in node1: /etc/slurm. A copy of the config is in node2:/etc/slurm. This slurm configuration runs ok. But, as I am normally used to configure on a shared location, where there is only 1 copy of the config, I tried to change the current setup by moving the slurm-config to a shared (NFS) disk.
> So on both nodes : /etc/slurm -> nfs-server:/shared/etc/slurm
> But, I have tried several ways... The slurmctld and slurmd daemons will not start when I refer to this "nfs-shared" location.
> Any ideas what is missing here ? Authorizations seem ok.
We have

    $ ll -d /etc/slurm
    lrwxrwxrwx 1 root root 25 Feb 4 2019 /etc/slurm -> /trinity/shared/etc/slurm

and this works for us. Are you also using a link, or are you actually trying to specify the NFS folder itself somewhere?

BTW, I think this kind of setup has been made obsolete by "Configless" Slurm:

https://slurm.schedmd.com/configless_slurm.html

Cheers,

Loris

-- 
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin
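
A quick way to check the symlink arrangement described in this message is to resolve the link and confirm that slurm.conf is actually readable behind it. The following is only a minimal sketch: check_slurm_conf_dir is a hypothetical helper name (not part of Slurm), and it assumes the config file is called slurm.conf and sits directly in the linked directory.

```shell
# Sketch only: check that a Slurm config directory (possibly a symlink
# onto an NFS share) resolves and contains a readable slurm.conf.
# check_slurm_conf_dir is a hypothetical helper, not a Slurm tool.
check_slurm_conf_dir() {
    conf_dir=${1:-/etc/slurm}

    # Resolve symlinks. readlink -f fails when a parent component of the
    # target is missing, e.g. when the share is not mounted yet.
    resolved=$(readlink -f "$conf_dir") || {
        echo "cannot resolve $conf_dir" >&2
        return 1
    }

    if [ ! -d "$resolved" ]; then
        echo "$conf_dir -> $resolved is not a directory (share not mounted?)" >&2
        return 1
    fi

    if [ ! -r "$resolved/slurm.conf" ]; then
        echo "$resolved/slurm.conf is missing or unreadable" >&2
        return 1
    fi

    # Print the resolved location on success.
    echo "$resolved"
}
```

Running "check_slurm_conf_dir /etc/slurm" on each node, ideally as the user the daemon runs as, prints the resolved directory or an error message saying which step failed.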
On Wed, 2024-01-24 at 13:01:39 +0000, Werf, C.G. van der (Carel) wrote:
> Hi,
> Among other clusters, I have a simple cluster with 2 nodes, running slurm.
> 1 node runs : mysqld, slurmdbd, slurmctld and slurmd. The other node, only runs slurmd. Slurm config is in node1: /etc/slurm. A copy of the config is in node2:/etc/slurm. This slurm configuration runs ok. But, as I am normally used to configure on a shared location, where there is only 1 copy of the config, I tried to change the current setup by moving the slurm-config to a shared (NFS) disk.
> So on both nodes : /etc/slurm -> nfs-server:/shared/etc/slurm
> But, I have tried several ways... The slurmctld and slurmd daemons will not start when I refer to this "nfs-shared" location.
Do they start if you run "systemctl start ..." by hand once the machine is up? If not, there's a bigger problem. If yes, the service file(s) need to be modified. I'm using

    After=network.target munge.service autofs.service

because my /home directories are automounted and /etc/slurm points to /home/slurm/etc.
> Any ideas what is missing here ?

Quite probably a dependency between services.

Good luck,
Steffen

-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
On Wed, 2024-01-24 at 14:34:02 +0100, Steffen Grunewald wrote:
> After=network.target munge.service autofs.service

Also, and probably the more important change:

    RequiresMountsFor=/home/slurm

> because my /home directories are automounted and /etc/slurm is pointing to /home/slurm/etc

Apologies for the omission...

- S
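
The two settings from these messages can be applied without editing the packaged service file by placing them in a systemd drop-in. The following is a rough sketch under assumptions: write_slurm_dropin and the file name nfs-config.conf are hypothetical, the After= list and the /home/slurm path are the values from this particular setup, and the third argument exists only so the function can be tried against a scratch directory first.

```shell
# Sketch only: write a systemd drop-in so a Slurm daemon is ordered after,
# and requires, the mount that holds its config directory.
# write_slurm_dropin and nfs-config.conf are hypothetical names.
write_slurm_dropin() {
    unit=$1                               # e.g. slurmd or slurmctld
    mount_path=$2                         # e.g. /home/slurm, as in the message above
    dropin_root=${3:-/etc/systemd/system} # override for a dry run

    dropin_dir="$dropin_root/$unit.service.d"
    mkdir -p "$dropin_dir" || return 1

    # After= only orders startup; RequiresMountsFor= adds the dependency
    # on the mount unit(s) needed to reach the given path.
    cat > "$dropin_dir/nfs-config.conf" <<EOF
[Unit]
After=network.target munge.service autofs.service
RequiresMountsFor=$mount_path
EOF
}
```

For example, "write_slurm_dropin slurmd /home/slurm" followed by "systemctl daemon-reload" (both as root) would install the drop-in for slurmd; the path and the autofs.service entry would need adjusting for a setup that mounts the share differently.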