[slurm-users] Topology configuration questions:

Prentice Bisbal pbisbal at pppl.gov
Fri Jan 18 14:28:55 UTC 2019


On 01/17/2019 07:55 PM, Fulcomer, Samuel wrote:
> We use topology.conf to segregate architectures (Sandy->Skylake), and 
> also to isolate individual nodes with 1Gb/s Ethernet rather than IB 
> (older GPU nodes with deprecated IB cards). In the latter case, 
> topology.conf had a switch entry for each node.
So Slurm thinks each node has its own switch that is not shared with any 
other node?
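Something like this, I'm guessing (switch and node names made up for
illustration)?

    # topology.conf: each 1GbE-only node on its own leaf switch, with no
    # parent switch, so no job can span these nodes
    SwitchName=gpu-sw1 Nodes=gpu001
    SwitchName=gpu-sw2 Nodes=gpu002
    SwitchName=gpu-sw3 Nodes=gpu003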
>
> It used to be the case that SLURM was unhappy with nodes defined in 
> slurm.conf not appearing in topology.conf. This may have changed....
>
> On Thu, Jan 17, 2019 at 6:37 PM Ryan Novosielski <novosirj at rutgers.edu 
> <mailto:novosirj at rutgers.edu>> wrote:
>
>     I don’t actually know the answer to this one, but we have it
>     provisioned to all nodes.
>
>     Note that if you care about node weights (e.g. NodeName=whatever001
>     Weight=2, etc. in slurm.conf), using the topology function will
>     disable them. In a conversation with SchedMD, I believe I was
>     promised a warning about that in a future release.
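(For reference, the node weights Ryan mentions are the per-node Weight=
values in slurm.conf, e.g.,

    NodeName=serial[001-016] Weight=10
    NodeName=ib[001-064]     Weight=50

where, all else being equal, lower-weight nodes are allocated first. The
node names here are made up for illustration.)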
>
>     > On Jan 17, 2019, at 4:52 PM, Prentice Bisbal <pbisbal at pppl.gov
>     <mailto:pbisbal at pppl.gov>> wrote:
>     >
>     > And a follow-up question: Does topology.conf need to be on all
>     the nodes, or just the slurm controller? It's not clear from that
>     web page. I would assume only the controller needs it.
>     >
>     > Prentice
>     >
>     > On 1/17/19 4:49 PM, Prentice Bisbal wrote:
>     >> From https://slurm.schedmd.com/topology.html:
>     >>
>     >>> Note that compute nodes on switches that lack a common parent
>     switch can be used, but no job will span leaf switches without a
>     common parent (unless the TopologyParam=TopoOptional option is
>     used). For example, it is legal to remove the line "SwitchName=s4
>     Switches=s[0-3]" from the above topology.conf file. In that case,
>     no job will span more than four compute nodes on any single leaf
>     switch. This configuration can be useful if one wants to schedule
>     multiple physical clusters as a single logical cluster under the
>     control of a single slurmctld daemon.
>     >>
>     >> My current environment falls into the category of multiple
>     physical clusters being treated as a single logical cluster under
>     the control of a single slurmctld daemon. At least, that's my goal.
>     >>
>     >> In my environment, I have 2 "clusters" connected by their own
>     separate IB fabrics, and one "cluster" connected with 10 GbE. I
>     have a fourth cluster connected with only 1GbE. For this 4th
>     cluster, we don't want jobs to span nodes, due to the slow
>     performance of 1 GbE. (This cluster is intended for serial and
>     low-core-count parallel jobs.) If I just leave those nodes out of
>     the topology.conf file, will that have the desired effect of not
>     allocating multi-node jobs to those nodes, or will it result in an
>     error of some sort?
>     >>
>     >
>
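
To make the question concrete, here is the sort of topology.conf I have
in mind; switch and node names are invented for illustration:

    # IB cluster 1
    SwitchName=ib1 Nodes=cl1-[001-100]
    # IB cluster 2
    SwitchName=ib2 Nodes=cl2-[001-100]
    # 10 GbE cluster
    SwitchName=eth10 Nodes=cl3-[001-050]
    # 1 GbE cluster: either one switch entry per node, as Samuel suggests,
    SwitchName=gige-sw001 Nodes=serial001
    SwitchName=gige-sw002 Nodes=serial002
    # (and so on for each 1GbE node), or leave these nodes out of
    # topology.conf entirely, which is the question above.
    #
    # There is deliberately no top-level switch joining ib1, ib2, and
    # eth10, so no job should span those fabrics.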
