[slurm-users] [External] Submitting to multiple partitions problem with gres specified
Ewan Roche
ewan.roche at unil.ch
Tue Mar 9 08:37:16 UTC 2021
Hello Ward,
as a variant on what has already been suggested we also have the CPU type as a feature:
Feature=E5v1,AVX
Feature=E5v3,AVX,AVX2
Feature=S6g1,AVX,AVX2,AVX512
This allows people who want the same architecture, and not just the same instruction set, for a multi-node job to say:
sbatch --constraint=E5v1
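
For reference, a minimal sketch of how such features might be declared on the node definitions in slurm.conf (the node names and CPU counts here are made up, and only two of the feature sets are shown):

NodeName=node[01-02,06-07] CPUs=16 Feature=E5v1,AVX
NodeName=node[03-05,08-10] CPUs=24 Feature=E5v3,AVX,AVX2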
Apart from the multiple-partitions approach, another hack/workaround is to abuse the topology plugin: create fake switches with the nodes of each CPU type connected to one switch and no links between these switches.
SwitchName=sw0 Nodes=node[01-02,06-07]
SwitchName=sw1 Nodes=node[03-05,08-10]
As there is no link between these “switches”, Slurm will never schedule a single job across node01 and node03.
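
For this to take effect, the SwitchName lines above go into topology.conf and the tree topology plugin is enabled in slurm.conf, roughly:

TopologyPlugin=topology/tree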
Ewan Roche
Division Calcul et Soutien à la Recherche
UNIL | Université de Lausanne
> On 9 Mar 2021, at 09:16, Ward Poelmans <ward.poelmans at vub.be> wrote:
>
> Hi Prentice,
>
> On 8/03/2021 22:02, Prentice Bisbal wrote:
>
>> I have a very heterogeneous cluster with several different generations of
>> AMD and Intel processors, and we use this method quite effectively.
>
> Could you elaborate a bit more on how you manage that? Do you force your
> users to pick a feature? If a user submits a multi-node job, can
> you make sure it will not start on a mix of avx512 and avx2 nodes?
>
>> If you want to continue down the road you've already started on, can you
>> provide more information, like the partition definitions and the gres
>> definitions? In general, Slurm should support submitting to multiple
>> partitions.
>
> As far as I understand it, you can give a comma-separated list of
> partitions to sbatch, but it's not possible to make this the default?
>
> Ward
>
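
(For reference, the comma-separated partition syntax Ward mentions would look roughly like this; the partition and gres names here are made up:

sbatch --partition=batch_intel,batch_amd --gres=gpu:1 job.sh

Slurm will then start the job in whichever of the listed partitions can run it earliest.)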