[slurm-users] Question about using topology/tree

Antonio Lara antonio.lara at uam.es
Wed May 23 05:00:38 MDT 2018

Hi again,

Also, if I misspell node names in topology.conf (that is, use nodes that 
are not listed under "Nodes=" in the partition line of slurm.conf), 
"scontrol reconfigure" produces no complaint and no messages in the logs. 
So it seems that the topology.conf file might not be processed at all (it 
has permissions 644 and is in the same directory as slurm.conf), even 
though "scontrol show config" shows that the topology/tree plugin is 
loaded correctly.
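
Since nothing seems to flag misspelled node names, a check like this can be done outside Slurm. Below is a minimal Python sketch, not part of Slurm: the function names parse_topology and unknown_nodes are made up for illustration, the list of known nodes is typed in by hand rather than read from slurm.conf, and node lists are assumed to be plain comma-separated names (bracket ranges such as node[1-3] are not expanded).

```python
def parse_topology(text):
    """Return {switch_name: [node, ...]} from topology.conf-style text.

    Assumption: node lists are plain comma-separated names; hostlist
    ranges like node[1-3] are NOT expanded by this sketch.
    """
    switches = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        # Split "Key=value" tokens; keys are compared case-insensitively.
        fields = {}
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                fields[key.lower()] = value
        name = fields.get("switchname")
        if name is None:
            continue
        switches[name] = [n for n in fields.get("nodes", "").split(",") if n]
    return switches


def unknown_nodes(switches, known_nodes):
    """Return {switch_name: [nodes not present in known_nodes]}."""
    known = set(known_nodes)
    report = {}
    for name, nodes in switches.items():
        missing = [n for n in nodes if n not in known]
        if missing:
            report[name] = missing
    return report


if __name__ == "__main__":
    topology = "SwitchName=s0 Nodes=node1,node2\nSwitchName=s1 Nodes=node3x\n"
    # Hand-typed stand-in for the "Nodes=" list of the partition.
    print(unknown_nodes(parse_topology(topology), ["node1", "node2", "node3"]))
```

With the deliberately misspelled "node3x" in the example, the report names switch s1 and the unknown node. An empty report means every node in the topology text is in the hand-typed list.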

Thank you

Best regards


On 23/05/18 at 11:04, Antonio Lara wrote:
> Hello,
> I'm trying to use the topology/tree plugin to isolate nodes in 
> different "groups", so that a job is allocated only on nodes 
> belonging to one such group and never on nodes from other groups. I 
> think I'm missing something, because Slurm doesn't seem to take this 
> topology into account. Hopefully someone can spot what I'm doing 
> wrong. So far I'm trying to split a 3-node cluster into two groups, 
> using two switches. I have created a topology.conf file and placed it 
> in the etc directory of the Slurm installation path. This file 
> contains these two lines:
> SwitchName=s0 Nodes=node1,node2
> SwitchName=s1 Nodes=node3
> I also tried adding a third switch that contains the s0 and s1 
> switches, but it didn't solve anything.
> Then, I have enabled the use of the topology/tree plugin with this 
> line in slurm.conf:
> TopologyPlugin=topology/tree
> Finally, I apply these changes with:
> scontrol reconfigure
> I would then expect a job that requires 2 nodes to run on node1 and 
> node2 only, since these two should be grouped under the switch "s0". 
> Instead it runs on node1 and node3, ignoring the topology, when I 
> submit a command like this:
> sbatch -A molecules_serv -p cc -N 2 -n 4 --switches=1 ./script1
> Maybe I don't understand correctly what the --switches flag does, 
> but I think it should only consider nodes that are under 1 switch and, 
> among those, use the ones that also fulfill the other 
> requirements, such as the number of nodes or tasks. Therefore I would 
> expect the job to run only on node1 and node2 in parallel, and not on 
> node3.
> Any ideas?
> Thank you very much for your help
> Antonio
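
The expectation described in the quoted message can be made concrete with a small sketch. This is only a Python model of the expected selection, not how Slurm implements --switches: the function pick_nodes_single_switch is hypothetical, all nodes are assumed idle, and task counts are ignored.

```python
def pick_nodes_single_switch(switches, idle_nodes, wanted):
    """Pick `wanted` idle nodes that all sit under one switch.

    switches: {switch_name: [node, ...]}, as written in topology.conf.
    Returns (switch_name, [nodes]) for the first switch that can hold
    the job, or None if no single switch has enough idle nodes.
    This models the expectation only; it is not Slurm's algorithm.
    """
    idle = set(idle_nodes)
    for name, nodes in switches.items():
        candidates = [n for n in nodes if n in idle]
        if len(candidates) >= wanted:
            return name, candidates[:wanted]
    return None


if __name__ == "__main__":
    topology = {"s0": ["node1", "node2"], "s1": ["node3"]}
    print(pick_nodes_single_switch(topology, ["node1", "node2", "node3"], 2))
```

Under this model a 2-node request with the two-switch layout above can only land on node1 and node2, because s1 holds a single node. The observed allocation on node1 and node3 is what differs from it.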

