[slurm-users] Topology configuration questions:

Ryan Novosielski novosirj at rutgers.edu
Thu Jan 17 23:33:47 UTC 2019


> On Jan 17, 2019, at 4:49 PM, Prentice Bisbal <pbisbal at pppl.gov> wrote:
> 
> From https://slurm.schedmd.com/topology.html:
> 
>> Note that compute nodes on switches that lack a common parent switch can be used, but no job will span leaf switches without a common parent (unless the TopologyParam=TopoOptional option is used). For example, it is legal to remove the line "SwitchName=s4 Switches=s[0-3]" from the above topology.conf file. In that case, no job will span more than four compute nodes on any single leaf switch. This configuration can be useful if one wants to schedule multiple physical clusters as a single logical cluster under the control of a single slurmctld daemon.
> 
> My current environment falls into the category of multiple physical clusters being treated as a single logical cluster under the control of a single slurmctld daemon. At least, that's my goal.
> 
> In my environment, I have 2 "clusters" connected by their own separate IB fabrics, and one "cluster" connected with 10 GbE. I have a fourth cluster connected with only 1 GbE. For this fourth cluster, we don't want jobs to span nodes, due to the slow performance of 1 GbE. (This cluster is intended for serial and low-core-count parallel jobs.) If I just leave those nodes out of the topology.conf file, will that have the desired effect of not allocating multi-node jobs to those nodes, or will it result in an error of some sort?
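
For concreteness, a topology.conf for the layout in the question might look like the sketch below. All switch and node names are hypothetical placeholders. Each fabric gets its own switch with no common parent switch above them, so (per the documentation quoted above) no job should span two fabrics, and the 1 GbE nodes are simply left out. This only illustrates the layout being asked about; it does not settle whether omitted nodes stay schedulable in a given Slurm release, so test it before relying on it.

```
# topology.conf (sketch; names are hypothetical)
# slurm.conf must also set TopologyPlugin=topology/tree for this file to be used.

# Cluster A, InfiniBand fabric A
SwitchName=ibA Nodes=a[001-032]

# Cluster B, InfiniBand fabric B
SwitchName=ibB Nodes=b[001-032]

# Cluster C, 10 GbE
SwitchName=eth10g Nodes=c[001-016]

# Cluster D (1 GbE, nodes d[001-016]) is intentionally omitted, and there is
# no "SwitchName=top Switches=ibA,ibB,eth10g" line joining the fabrics.
```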

It will print a warning:

[2019-01-10T12:41:32.457] TOPOLOGY: warning -- no switch can reach all nodes through its descendants.Do not use route/topology

…which sort of makes it sound like it’s going to ignore the topology plugin, but I believe it works (and the documentation sure indicates it does).
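
The warning describes a simple condition: no single switch has every node among its descendants. The Python sketch below is a hypothetical re-implementation of that check for illustration, not slurmctld's actual code. It also lists nodes that no switch reaches at all, which is the situation in the question. The hostlist expansion is a simplified stand-in for `scontrol show hostnames` (one bracket group per item), and the example names are made up. As far as I can tell, the "route/topology" mentioned in the warning refers to the RoutePlugin setting (message fan-out), which is separate from TopologyPlugin=topology/tree.

```python
import re


def split_top_level(expr):
    """Split a hostlist on commas that are not inside [...]."""
    parts, depth, current = [], 0, ""
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def expand_hostlist(expr):
    """Expand e.g. 'tux[0-3,7]' or 'a[001-004]' (simplified: one bracket group per item)."""
    names = []
    for item in split_top_level(expr):
        m = re.fullmatch(r"([^\[]*)\[([^\]]+)\](.*)", item)
        if not m:
            names.append(item)
            continue
        prefix, ranges, suffix = m.groups()
        for rng in ranges.split(","):
            lo, _, hi = rng.partition("-")
            hi = hi or lo
            width = len(lo)  # keep zero padding, e.g. 001
            for i in range(int(lo), int(hi) + 1):
                names.append(f"{prefix}{i:0{width}d}{suffix}")
    return names


def parse_topology(text):
    """Return {switch_name: (child_switches, nodes)} from topology.conf text."""
    switches = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = dict(tok.split("=", 1) for tok in line.split())
        children = expand_hostlist(fields["Switches"]) if "Switches" in fields else []
        nodes = expand_hostlist(fields["Nodes"]) if "Nodes" in fields else []
        switches[fields["SwitchName"]] = (children, nodes)
    return switches


def reachable_nodes(switches, name, seen=None):
    """All nodes below a switch, following Switches= links recursively."""
    seen = set() if seen is None else seen
    if name in seen or name not in switches:
        return set()
    seen.add(name)
    children, nodes = switches[name]
    result = set(nodes)
    for child in children:
        result |= reachable_nodes(switches, child, seen)
    return result


def check_topology(text, all_nodes):
    """Return (some switch reaches every node?, sorted nodes reached by no switch)."""
    switches = parse_topology(text)
    all_nodes = set(all_nodes)
    covered, has_root = set(), False
    for name in switches:
        reach = reachable_nodes(switches, name)
        covered |= reach
        if all_nodes <= reach:
            has_root = True
    return has_root, sorted(all_nodes - covered)


# Hypothetical example: three disjoint fabrics, 1 GbE nodes d* left out.
topology = """
SwitchName=ibA Nodes=a[001-004]
SwitchName=ibB Nodes=b[001-004]
SwitchName=eth10g Nodes=c[001-002]
"""
nodes = expand_hostlist("a[001-004],b[001-004],c[001-002],d[001-003]")
has_root, uncovered = check_topology(topology, nodes)
print(has_root)   # False
print(uncovered)  # ['d001', 'd002', 'd003']
```

With no parent switch, `has_root` is False, which corresponds to the condition the warning reports. The `uncovered` list shows the nodes that were left out of topology.conf entirely.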


--
____
|| \\UTGERS,  	 |---------------------------*O*---------------------------
||_// the State	 |         Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ	 | Office of Advanced Research Computing - MSB C630, Newark
     `'
