Hi,
Among other clusters, I have a simple two-node cluster running Slurm.
One node runs mysqld, slurmdbd, slurmctld and slurmd; the other node only runs slurmd. The Slurm config is in node1:/etc/slurm, and a copy of it is in node2:/etc/slurm. This configuration runs OK. However, since I am used to keeping the configuration in a shared location, where there is only one copy, I tried to change the current setup by moving the Slurm config to a shared (NFS) disk.
So on both nodes: /etc/slurm -> nfs-server:/shared/etc/slurm
I have tried several ways, but the slurmctld and slurmd daemons will not start when I point them to this NFS-shared location.
Any ideas what is missing here? Permissions seem OK.
With Regards,
| Carel van der Werf | | Developer/Administrator Linux | ICT-Bèta | Department of Science | Utrecht University |
Hi Carel,
"Werf, C.G. van der (Carel)" C.G.vanderWerf@uu.nl writes:
So on both nodes: /etc/slurm -> nfs-server:/shared/etc/slurm
I have tried several ways, but the slurmctld and slurmd daemons will not start when I point them to this NFS-shared location.
Any ideas what is missing here? Permissions seem OK.
We have:

$ ll -d /etc/slurm
lrwxrwxrwx 1 root root 25 Feb 4 2019 /etc/slurm -> /trinity/shared/etc/slurm
and this works for us. Are you also using a link, or are you actually trying to specify the NFS folder itself somewhere?
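A minimal sketch of the symlink approach on one node, run as root. It assumes the NFS export is already mounted so that /trinity/shared/etc/slurm exists (path taken from the listing above); /etc/slurm.local is just a hypothetical name for keeping the old local copy around:

```shell
# Assumption: the NFS share is already mounted, so the target directory exists
test -d /trinity/shared/etc/slurm || { echo "shared config dir not mounted" >&2; exit 1; }

# Keep the old local copy out of the way instead of deleting it (hypothetical name)
mv /etc/slurm /etc/slurm.local

# Replace /etc/slurm with a symlink to the shared directory
ln -s /trinity/shared/etc/slurm /etc/slurm

# Check that the link resolves and slurm.conf is readable through it
ls -ld /etc/slurm
ls -l /etc/slurm/slurm.conf
```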
BTW, I think this kind of setup has been made obsolete by "Configless" Slurm:
https://slurm.schedmd.com/configless_slurm.html
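Roughly, configless mode means enabling it on the controller and telling each slurmd where to fetch the config from. A sketch, assuming node1 is the controller and that the installed slurmd.service reads SLURMD_OPTIONS from /etc/sysconfig/slurmd (check with `systemctl cat slurmd`; Debian-style packages may use /etc/default/slurmd instead):

```shell
# On node1 (controller): add enable_configless to SlurmctldParameters in
# /etc/slurm/slurm.conf, merging it with any value already set there, e.g.
#   SlurmctldParameters=enable_configless
# then restart the controller so it starts serving the config
systemctl restart slurmctld

# On node2 (slurmd only): fetch the config from node1 instead of a local copy.
# Assumption: slurmd.service passes $SLURMD_OPTIONS from this file to slurmd
echo 'SLURMD_OPTIONS="--conf-server node1"' > /etc/sysconfig/slurmd
systemctl restart slurmd
```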
Cheers,
Loris
On Wed, 2024-01-24 at 13:01:39 +0000, Werf, C.G. van der (Carel) wrote:
So on both nodes: /etc/slurm -> nfs-server:/shared/etc/slurm
I have tried several ways, but the slurmctld and slurmd daemons will not start when I point them to this NFS-shared location.
Do they start when you run "systemctl start ..." once the machine is up? If not, there's a bigger problem. If yes, the service file(s) need to be modified; I'm using
After=network.target munge.service autofs.service
because my /home directories are automounted and /etc/slurm is pointing to /home/slurm/etc
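To answer the first question, something along these lines (run as root once the node is fully up and the NFS share is mounted) separates "the daemon cannot use its config at all" from "the daemon starts before the share is available at boot". slurmd is used as the example; the same applies to slurmctld on node1:

```shell
# 1. With the share mounted, start the daemon by hand and check the result
systemctl start slurmd
systemctl status slurmd

# 2. Read the log of the failed start from the current boot
journalctl -b -u slurmd

# 3. Show which units slurmd is currently ordered after
systemctl show -p After slurmd.service
```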
Any ideas what is missing here?
Quite probably a dependency between services.
Good luck, Steffen
On Wed, 2024-01-24 at 14:34:02 +0100, Steffen Grunewald wrote:
After=network.target munge.service autofs.service
Also, probably the more important change,
RequiresMountsFor=/home/slurm
because my /home directories are automounted and /etc/slurm is pointing to /home/slurm/etc
Apologies for the omission...
- S
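Applied to the setup in the original question, a drop-in along these lines combines both suggestions. This is a sketch: /shared/etc/slurm is a hypothetical local mount point for nfs-server:/shared, so adjust it to wherever the export is actually mounted, and point RequiresMountsFor= at that real mounted path rather than at the /etc/slurm symlink, as in the /home/slurm example above. Repeat with slurmctld.service.d on node1:

```shell
# Create a drop-in directory for slurmd.service (run as root)
mkdir -p /etc/systemd/system/slurmd.service.d

# Hypothetical path /shared/etc/slurm: replace with the real NFS mount path
cat > /etc/systemd/system/slurmd.service.d/shared-config.conf <<'EOF'
[Unit]
# Start only after the network, remote filesystems and munge are up
# (add autofs.service here if the share is automounted)
Wants=network-online.target
After=network-online.target remote-fs.target munge.service
# Require and order after the mount(s) backing the real config path
RequiresMountsFor=/shared/etc/slurm
EOF

# Make systemd pick up the drop-in, then restart the daemon
systemctl daemon-reload
systemctl restart slurmd
```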