[slurm-users] Host not being a valid controller

Pär Lundö par.lundo at foi.se
Fri Jun 28 13:31:56 UTC 2019


Hi all Slurm experts!

Recently I configured and installed Slurm version 19.05 on Ubuntu 18.04 
and Ubuntu 18.10, and got it running on a single-node computer (a notebook).

Feeling somewhat comfortable with this setup, I tried to extend it to an 
additional computer on my network, node1. I now have two nodes, node0 and 
node1: node0 is the "SlurmctldHost" and node1 is part of a partition. Both 
nodes have identical copies of slurm.conf.
However, when starting "slurmctld" and "slurmd" on node1, I receive 
errors stating that this host (node1) is not a valid controller.

Both node0 and node1 have copies of the /etc/hosts file.
I can ping node1 from node0 and node0 from node1.
Both nodes have the same munge.key; I verified this with the cksum command.

When I start slurmctld manually with the arguments "-D -vvvvv", I get the 
same errors as reported by "systemctl status slurmctld".

I also receive an error stating that my MailProg is faulty; however, the 
"MailProg" line in slurm.conf is commented out, and I have no intention of 
using one.

I have searched the documentation and previously posted questions about 
this, but have not found a solution.

Any help is much appreciated, thank you!

Best regards,

Palle
