[slurm-users] [External] Submitting to multiple partitions problem with gres specified
pbisbal at pppl.gov
Mon Mar 8 21:02:22 UTC 2021
Rather than specifying the processor types as GRES, I would recommend
defining them as features of the nodes and letting users specify the
features as constraints on their jobs. Since the newer processors are
backwards compatible with the older processors, list the older
processor types as features of the newer nodes, too.
For example, say you have some nodes that support AVX512 and some that
only support AVX2. node01 is older and supports only AVX2. node02 is
newer and supports AVX512, but is backwards compatible and also supports
AVX2. I would have something like this in my slurm.conf file:
NodeName=node01 Feature=avx2 ...
NodeName=node02 Feature=avx512,avx2 ...
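A minimal sketch of how users would then select nodes, assuming the two-node slurm.conf example above; job.sh is a hypothetical batch script. The --constraint (short form -C) option of srun and sbatch is matched against each node's feature list:

```shell
# Assumes the hypothetical node01/node02 feature setup shown above.
# Requires avx2: can run on node01 or node02, since node02 lists avx2 too.
srun --exclusive --constraint=avx2 --pty /bin/bash

# Requires avx512: can only run on node02.
srun --exclusive --constraint=avx512 --pty /bin/bash

# Batch jobs accept the same option (job.sh is a hypothetical script):
sbatch -C avx512 job.sh

# List each node with the features it advertises:
sinfo -N -o "%N %f"
```

Because the newer nodes also carry the older feature names, a user who only needs AVX2 never has to know which generations exist. Users who do need AVX512 simply ask for it.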
I have a very heterogeneous cluster with several different generations of
AMD and Intel processors, and we use this method quite effectively.
If you want to continue down the road you've already started on, can you
provide more information, such as the partition definitions and the gres
definitions? In general, Slurm should support submitting to multiple
partitions.
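The partition and gres definitions can be collected with standard Slurm commands. This is a sketch that assumes the partition and node names from the quoted message below:

```shell
# Partition definitions as seen by slurmctld (names from the quoted message):
scontrol show partition cpu_e5_2650_v1
scontrol show partition cpu_e5_2650_v2

# Per-node view: node name, partition, and configured GRES:
sinfo -N -o "%N %P %G"

# Definition of the node the first job landed on, including its Gres= entry:
scontrol show node r16n18

# For comparison, the same request with an explicit comma-separated partition
# list; Slurm uses whichever listed partition offers the earliest start:
srun --exclusive -p cpu_e5_2650_v1,cpu_e5_2650_v2 --gres=cpu_type:e5_2650_v1 --pty /bin/bash
```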
On 3/8/21 11:29 AM, Bas van der Vlies wrote:
> On this cluster I have version 20.02.6 installed. We have different
> partitions for CPU types and GPU types. We want to make it easy for
> users who do not care where their job runs, while experienced users
> can specify the gres type: cpu_type or gpu.
> I have defined 2 cpu partitions:
> * cpu_e5_2650_v1
> * cpu_e5_2650_v2
> and 2 gres cpu_type:
> * e5_2650_v1
> * e5_2650_v2
> When no partition is specified, the job is submitted to both partitions:
> * srun --exclusive --gres=cpu_type:e5_2650_v1 --pty /bin/bash
> --> runs on r16n18, which has this gres defined and is in partition
> cpu_e5_2650_v1
> Now I submit another job at the same time:
> * srun --exclusive --gres=cpu_type:e5_2650_v1 --pty /bin/bash
> This fails with:
> `srun: error: Unable to allocate resources: Requested node configuration is not available`
> I would expect it to be queued in the partition `cpu_e5_2650_v1`.
> When I specify the partition on the command line:
> * srun --exclusive -p cpu_e5_2650_v1_shared --gres=cpu_type:e5_2650_v1 --pty /bin/bash
> srun: job 1856 queued and waiting for resources
> So the question is: can Slurm handle submitting to multiple partitions
> when we specify gres attributes?