[slurm-users] Question about using topology/tree

Antonio Lara antonio.lara at uam.es
Wed May 23 06:41:17 MDT 2018


OK, never mind, I think I got it: scontrol reconfigure was not 
enough, I also had to restart the Slurm daemons, and now it seems to 
work.
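
For anyone who finds this thread later: apparently scontrol 
reconfigure does not re-read topology.conf, so the daemons need a 
restart. In my case that meant something like this (a sketch; the 
exact service names depend on how Slurm was installed):

# on the controller node
systemctl restart slurmctld

# on every compute node
systemctl restart slurmd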

Regards!


On 23/05/18 at 13:00, Antonio Lara wrote:
> Hi again,
>
> Also, if I write the node names wrong in topology.conf (that is, 
> nodes that are not specified in slurm.conf, in the line describing 
> the partition, under "Nodes="), there is no complaint or message in 
> the logs when I do scontrol reconfigure. This makes it look like 
> topology.conf is not being processed at all (it has permissions 644 
> and sits in the same directory as slurm.conf), yet "scontrol show 
> config" shows that the topology/tree plugin is loaded correctly.
>
> Thank you
>
> Best regards
>
> Antonio
>
>
> On 23/05/18 at 11:04, Antonio Lara wrote:
>> Hello,
>>
>> I'm trying to use the topology/tree plugin to isolate nodes into 
>> different "groups", so that jobs are allocated only on nodes 
>> belonging to one such group, and never on nodes from other groups. 
>> I think I'm missing something, because Slurm doesn't seem to take 
>> this topology into consideration. Hopefully someone can spot what 
>> I'm doing wrong. So far I'm trying to split a 3-node cluster into 
>> two groups, using two switches. I have created a topology.conf file 
>> and placed it in the etc directory of the Slurm installation path. 
>> This file contains these two lines:
>>
>> SwitchName=s0 Nodes=node1,node2
>> SwitchName=s1 Nodes=node3
>>
>> I also tried adding a third switch that contains the s0 and s1 
>> switches, but it didn't solve anything.
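>>
>> In case it helps, that three-switch version would look roughly like 
>> this (using the Switches= keyword to join the two leaf switches 
>> under a common root):
>>
>> SwitchName=s0 Nodes=node1,node2
>> SwitchName=s1 Nodes=node3
>> SwitchName=top Switches=s0,s1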
>>
>> Then, I have enabled the use of the topology/tree plugin with this 
>> line in slurm.conf:
>>
>> TopologyPlugin=topology/tree
>>
>> And finally, I apply these changes with:
>>
>> scontrol reconfigure
>>
>> I would then expect a job that requires 2 nodes to run on node1 and 
>> node2 only, since those two should be grouped under switch "s0". 
>> Instead, it runs on node1 and node3, ignoring the topology, when I 
>> submit a command like this:
>>
>> sbatch -A molecules_serv -p cc -N 2 -n 4 --switches=1 ./script1
>>
>> Maybe I'm not understanding correctly what the --switches flag 
>> does, but I think it should only consider nodes that sit under a 
>> single switch, and among those, use the ones that also fulfill the 
>> other requirements, such as the number of nodes or tasks. I would 
>> therefore expect the job to run on node1 and node2 in parallel, but 
>> not on node3.
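>>
>> (If I read the sbatch man page correctly, --switches also accepts 
>> an optional maximum waiting time, e.g.:
>>
>> sbatch -A molecules_serv -p cc -N 2 -n 4 --switches=1@60 ./script1
>>
>> which should wait up to 60 minutes for a single-switch allocation 
>> before giving up on the constraint and running on whatever is 
>> free.)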
>>
>> Any ideas?
>>
>> Thank you very much for your help
>>
>> Antonio
>>
>>
>
>



