[slurm-users] [External] Submitting to multiple partitions problem with gres specified

Prentice Bisbal pbisbal at pppl.gov
Mon Mar 8 21:02:22 UTC 2021


Rather than specifying the processor types as GRES, I would recommend 
defining them as features of the nodes and letting users specify those 
features as constraints on their jobs. Since the newer processors are 
backwards compatible with the older ones, list the older processors as 
features of the newer nodes, too.

For example, say you have some nodes that support AVX512 and some that 
only support AVX2: node01 is older and supports only AVX2, while node02 
is newer and supports AVX512 as well as AVX2. I would have something 
like this in my slurm.conf file:

NodeName=node01 Feature=avx2 ...
NodeName=node02 Feature=avx512,avx2 ...
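
Users can then request the capability they need with --constraint (or 
-C). For example:

srun --constraint=avx512 ...   # only node02 qualifies
srun --constraint=avx2 ...     # either node qualifies

Because node02 advertises both features, jobs that only need AVX2 can 
still land on the newer hardware.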

I have a very heterogeneous cluster with several different generations 
of AMD and Intel processors, and we use this method quite effectively.

If you want to continue down the road you've already started on, can you 
provide more information, like the partition definitions and the GRES 
definitions? In general, Slurm should support submitting to multiple 
partitions.
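
For example, you can pass srun or sbatch a comma-separated partition 
list (using your partition names from below), and Slurm should start the 
job in whichever listed partition can satisfy the request first:

srun -p cpu_e5_2650_v1,cpu_e5_2650_v2 --pty /bin/bash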

Prentice

On 3/8/21 11:29 AM, Bas van der Vlies wrote:
> Hi,
>
> On this cluster I have Slurm version 20.02.6 installed. We have 
> different partitions per CPU type and GPU type. We want to make it 
> easy for users who do not care where their job runs, while experienced 
> users can specify the GRES type: cpu_type or gpu.
>
> I have defined two CPU partitions:
>  * cpu_e5_2650_v1
>  * cpu_e5_2650_v2
>
> and two GRES types for cpu_type:
>  * e5_2650_v1
>  * e5_2650_v2
>
>
> When no partition is specified, the job is submitted to both partitions:
>  * srun --exclusive --gres=cpu_type:e5_2650_v1 --pty /bin/bash
>    --> runs on r16n18, which has this GRES defined and is in partition
>    cpu_e5_2650_v1
>
> Now I submit another job at the same time:
>  * srun --exclusive  --gres=cpu_type:e5_2650_v1  --pty /bin/bash
>
> This fails with: `srun: error: Unable to allocate resources: Requested 
> node configuration is not available`
>
> I would expect it to be queued in the partition `cpu_e5_2650_v1`.
>
>
> When I specify the partition on the command line:
>  * srun --exclusive -p cpu_e5_2650_v1_shared --gres=cpu_type:e5_2650_v1 --pty /bin/bash
>
> srun: job 1856 queued and waiting for resources
>
>
> So the question is: can Slurm handle submitting to multiple partitions 
> when we specify GRES attributes?
>
> Regards
>
>


