[slurm-users] Questions about adding new nodes to Slurm

Paul Edmon pedmon at cfa.harvard.edu
Tue Apr 27 18:51:18 UTC 2021


1. Part of Slurm's communication is hierarchical.  Nodes therefore need
to know about the other nodes so they can talk to each other and
forward messages to the slurmctld.

2. Yes, this is what we do.  We share our slurm.conf via NFS from our
slurm master and just update that single file.  After the update we use
salt to issue a global restart of all the slurmd's and the slurmctld so
they pick up the new config.  scontrol reconfigure is not enough when
adding new nodes; you have to issue a global restart.

3. It's pretty straightforward, all told.  You just need to update the
slurm.conf and do a restart.  Be careful that the names you enter into
the slurm.conf are resolvable by DNS, else slurmctld may barf on
restart.  Sadly, no built-in sanity checker exists that I am aware of,
aside from actually running slurmctld.  We got around this by putting
together a gitlab runner which screens our slurm.conf's by running a
synthetic slurmctld as a sanity check.
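
As an illustration of the DNS point in item 3, here is a rough Python
sketch of a cheap pre-check that could run before a restart.  It is not
the gitlab runner mentioned above and it does not replace starting a
real slurmctld.  The function names are made up for this example, the
host-list expansion is simplified (one bracket group per name), and
NodeAddr/NodeHostname overrides are ignored.

```python
import re
import socket


def expand_hostlist(expr):
    """Expand a Slurm-style host expression such as 'node[01-03,07],login1'.

    Simplified on purpose: at most one bracket group per name.
    """
    names = []
    # Split on commas that sit outside of a [...] group.
    for part in re.findall(r"[^,\[]+(?:\[[^\]]*\])?[^,\[]*", expr):
        part = part.strip()
        m = re.fullmatch(r"([^\[]*)\[([^\]]*)\](.*)", part)
        if not m:
            names.append(part)
            continue
        prefix, ranges, suffix = m.groups()
        for item in ranges.split(","):
            lo, _, hi = item.partition("-")
            hi = hi or lo
            width = len(lo)  # keep zero padding, e.g. 01..03
            for i in range(int(lo), int(hi) + 1):
                names.append(f"{prefix}{i:0{width}d}{suffix}")
    return names


def node_names_from_conf(text):
    """Collect host names from the NodeName= lines of slurm.conf text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        m = re.match(r"NodeName=(\S+)", line, re.IGNORECASE)
        # NodeName=DEFAULT only sets defaults, it is not a host.
        if m and m.group(1).upper() != "DEFAULT":
            names.extend(expand_hostlist(m.group(1)))
    return names


def unresolvable(names, resolve=socket.getaddrinfo):
    """Return the names that the resolver cannot look up."""
    bad = []
    for name in names:
        try:
            resolve(name, None)
        except socket.gaierror:
            bad.append(name)
    return bad
```

To use it, read the shared slurm.conf into a string, pass it to
node_names_from_conf(), and hold off on the restart if unresolvable()
returns anything.  The resolve argument exists only so the lookup can be
swapped out when testing without real DNS.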

-Paul Edmon-

On 4/27/2021 2:35 PM, David Henkemeyer wrote:
> Hello,
>
> I'm new to Slurm (coming from PBS), and so I will likely have a few 
> questions over the next several weeks, as I work to transition my 
> infrastructure from PBS to Slurm.
>
> My first question has to do with adding nodes to Slurm.  According
> to the FAQ (and other articles I've read), you need to basically shut
> down slurm, update the slurm.conf file on all nodes in the
> cluster, then restart slurm.
>
> - Why do all nodes need to know about all other nodes? From what I 
> have read, Slurm does a checksum comparison of the slurm.conf file
> across all nodes.  Is this the only reason all nodes need to know 
> about all other nodes?
> - Can I create a symlink that points <sysconfdir>/slurm.conf to a 
> slurm.conf file on an NFS mount point, which is mounted on all the 
> nodes? This way, I would only need to update a single file, then 
> restart Slurm across the entire cluster.
> - Any additional help/resources for adding/removing nodes to Slurm 
> would be much appreciated.  Perhaps there is a "toolkit" out there to 
> automate some of these operations (which is what I already have for 
> PBS, and will create for Slurm, if something doesn't already exist).
>
> Thank you all,
>
> David