On Wed, 2024-01-24 at 13:01:39 +0000, Werf, C.G. van der (Carel) wrote:
> Hi,
> Among other clusters, I have a simple cluster with 2 nodes running Slurm.
> One node runs mysqld, slurmdbd, slurmctld and slurmd; the other node runs only slurmd. The Slurm config is in node1:/etc/slurm, and a copy of it is in node2:/etc/slurm. This configuration runs fine. But since I am used to keeping the configuration in a shared location with only one copy, I tried to change the setup by moving the Slurm config to a shared (NFS) disk.
> So on both nodes: /etc/slurm -> nfs-server:/shared/etc/slurm
> I have tried several approaches, but the slurmctld and slurmd daemons will not start whenever /etc/slurm refers to this "nfs-shared" location.
Do they start when you run "systemctl start ..." once the machine is up? If not, there's a bigger problem. If yes, the service file(s) need to be modified; I'm using
After=network.target munge.service autofs.service
because my /home directories are automounted and /etc/slurm points to /home/slurm/etc.
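One way to add that line without editing the packaged unit files is a systemd drop-in. This is a minimal sketch, assuming the stock unit names slurmd.service and slurmctld.service; the file name override.conf is only an example:

```ini
# /etc/systemd/system/slurmd.service.d/override.conf
# (example drop-in; create the same file under slurmctld.service.d/ on the controller node)
[Unit]
# Ordering only: start the daemon after the network, munge and the automounter are up.
After=network.target munge.service autofs.service
```

After creating the file(s), run "systemctl daemon-reload" before restarting the daemons. Note that After= only orders units; it does not start autofs.service by itself, so that service still has to be enabled. If /etc/slurm is a plain NFS mount from /etc/fstab rather than an automount, RequiresMountsFor= with the mounted path may be an alternative worth trying.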
> Any ideas what is missing here?
Quite probably a dependency between services.
Good luck, Steffen