[slurm-users] Topology configuration questions:

Ryan Novosielski novosirj at rutgers.edu
Fri Jan 18 18:13:55 UTC 2019



> On Jan 18, 2019, at 11:53 AM, Kilian Cavalotti <kilian.cavalotti.work at gmail.com> wrote:
> 
> On Fri, Jan 18, 2019 at 6:31 AM Prentice Bisbal <pbisbal at pppl.gov> wrote:
>>> Note that if you care about node weights (e.g., NodeName=whatever001 Weight=2, etc. in slurm.conf), using the topology function will disable them. I believe I was promised a warning about that in the future in a conversation with SchedMD.
>> 
>> Well, that's going to be a big problem for me. One of the goals of me
>> overhauling our Slurm config is to take advantage of the node weighting
>> function to prioritize certain hardware over others in our very
>> heterogeneous cluster.
> 
> I've heard that too (that enabling the Topology plugin would disable
> node weighting), but I don't think it's accurate, both from the
> documentation and from observation.
> 
> The doc actually says (https://slurm.schedmd.com/topology.html)
> 
> """
> NOTE: Slurm first identifies the network switches which provide the
> best fit for pending jobs and then selects the nodes with the lowest
> "weight" within those switches. If optimizing resource selection by
> node weight is more important than optimizing network topology then do
> NOT use the topology/tree plugin.
> """
> 
> So the Topology plugin does take precedence over the weighting
> algorithm, but it doesn't disable it, AFAIK. And for sites using
> disjoint networks, as we do, this is a sane behavior.

I’m not sure whether that’s a change or whether it was always the behavior, but as a practical matter it still largely defeats the node weights. We have a fully defined topology for two different clusters, and it happens that the switch with the fewest connected nodes has the most specialized equipment (usually the login nodes, a couple of high-memory nodes, and a few CUDA nodes). If someone runs a single-node job, the job will favor that switch. I can think of a few ways to work around that, I guess, but by default the behavior seems to be roughly the inverse of what the node weights are meant to achieve.
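
To illustrate the effect, here is a toy Python model of the two-step selection the documentation describes (switch first, then weight within the switch). It is only a sketch and not Slurm's actual selection code: the function pick_nodes, the switch and node names, and the weights are all made up, and it assumes that "best fit" means the smallest leaf switch that can still hold the job.

```python
# Toy model only: NOT Slurm's real selection code.
# All switch names, node names, and weights below are hypothetical.

def pick_nodes(switches, weights, n_nodes, topology_first=True):
    """Return the nodes chosen for a job that needs n_nodes idle nodes.

    switches: dict mapping leaf switch name -> list of idle node names
    weights:  dict mapping node name -> weight (lower is preferred)
    """
    if topology_first:
        # Assumed "best fit": the smallest leaf switch that can hold the job.
        fitting = [nodes for nodes in switches.values() if len(nodes) >= n_nodes]
        if not fitting:
            return []  # jobs spanning several switches are out of scope here
        candidates = min(fitting, key=len)
    else:
        # No topology: every idle node competes on weight alone.
        candidates = [n for nodes in switches.values() for n in nodes]
    # Lowest weight first; the node name breaks ties deterministically.
    return sorted(candidates, key=lambda n: (weights[n], n))[:n_nodes]


switches = {
    "sw-general": ["node001", "node002", "node003", "node004"],
    "sw-special": ["himem001", "gpu001"],
}
weights = {"node001": 1, "node002": 1, "node003": 1, "node004": 1,
           "himem001": 50, "gpu001": 100}

print(pick_nodes(switches, weights, 1, topology_first=True))   # ['himem001']
print(pick_nodes(switches, weights, 1, topology_first=False))  # ['node001']
```

In this model the weights still matter inside the chosen switch (himem001 is picked ahead of gpu001), but the single-node job lands on the specialized hardware anyway, because the small switch wins before the weights are ever compared.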

--
____
|| \\UTGERS,  	 |---------------------------*O*---------------------------
||_// the State	 |         Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ	 | Office of Advanced Research Computing - MSB C630, Newark
     `'