[slurm-users] [External] Submitting to multiple partitions problem with gres specified
Bas van der Vlies
bas.vandervlies at surf.nl
Tue Mar 9 08:45:35 UTC 2021
Hi Prentice,
Answers inline.
On 08/03/2021 22:02, Prentice Bisbal wrote:
> Rather than specifying the processor types as GRES, I would recommend
> defining them as features of the nodes and let the users specify the
> features as constraints on their jobs. Since the newer processors are
> backwards compatible with the older processors, list the older
> processors as features of the newer nodes, too.
>
We already do this with features on our other cluster. We assign nodes
different features and users select these. I could add a new feature for
the CPU type; sometimes you want avx512 and a specific processor.
On another cluster we have 5 different GPUs and a lot of partitions. I
want to make it simple for our users, so we have a 'job_submit.lua'
script that submits to multiple partitions, and if the user specifies
the GRES type then Slurm selects the right partition(s); a sketch of
such a script is shown below.
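For illustration, here is a minimal sketch of how such a job_submit.lua could look (not our actual script). The partition and GRES names are the ones from this cluster; the job_desc field that carries the --gres request (tres_per_node vs. the older gres) differs between Slurm versions, so treat that part as an assumption:

```
-- Minimal sketch of a job_submit.lua that routes a job to partitions
-- based on the requested GRES type. Illustration only; field names
-- vary between Slurm versions.

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Only act when the user did not choose a partition explicitly.
    if job_desc.partition == nil then
        -- Newer Slurm releases expose --gres as tres_per_node,
        -- older ones as gres (assumption for 20.02).
        local gres = job_desc.tres_per_node or job_desc.gres or ""
        if string.find(gres, "e5_2650_v1", 1, true) then
            job_desc.partition = "cpu_e5_2650_v1"
        elseif string.find(gres, "e5_2650_v2", 1, true) then
            job_desc.partition = "cpu_e5_2650_v2"
        else
            -- No recognised GRES type: submit to all CPU partitions.
            job_desc.partition = "cpu_e5_2650_v1,cpu_e5_2650_v2"
        end
        slurm.log_info("job_submit: partition(s) set to %s",
                       job_desc.partition)
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

With something like this in place, a user who does not care where the job runs submits without a partition and only specifies --gres when needed.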
On this cluster we do not have GPUs, but I can test with another GRES
type, 'cpu_type'. I think the last partition in the list determines the
behavior: if I use a GRES that is supported by the last partition, the
job gets queued:
* srun -N1 --gres=cpu_type:e5_2650_v2 --pty /bin/bash
* srun --exclusive --gres=cpu_type:e5_2650_v2 --pty /bin/bash
srun: job 1865 queued and waiting for resources
So to me it seems that one of the partitions is busy but could run the
job. I will test it on our GPU cluster but expect the same behaviour.
>
> If you want to continue down the road you've already started on, can you
> provide more information, like the partition definitions and the gres
> definitions? In general, Slurm should support submitting to multiple
> partitions.
slurm.conf:
```
PartitionName=cpu_e5_2650_v1 DefMemPerCPU=11000 Default=No DefaultTime=5 DisableRootJobs=YES MaxNodes=2 MaxTime=5-00 Nodes=r16n[18-20] OverSubscribe=EXCLUSIVE QOS=normal State=UP
PartitionName=cpu_e5_2650_v2 DefMemPerCPU=11000 Default=No DefaultTime=5 DisableRootJobs=YES MaxNodes=2 MaxTime=5-00 Nodes=r16n[21-22] OverSubscribe=EXCLUSIVE QOS=normal State=UP

NodeName=r16n18 CoresPerSocket=8 Features=sandybridge,sse4,avx Gres=cpu_type:e5_2650_v1:no_consume:4T MemSpecLimit=1024 NodeHostname=r16n18.mona.surfsara.nl RealMemory=188000 Sockets=2 State=UNKNOWN ThreadsPerCore=1 Weight=10
NodeName=r16n21 CoresPerSocket=8 Features=sandybridge,sse4,avx Gres=cpu_type:e5_2650_v2:no_consume:4T MemSpecLimit=1024 NodeHostname=r16n21.mona.surfsara.nl RealMemory=188000 Sockets=2 State=UNKNOWN ThreadsPerCore=1 Weight=10
```

gres.conf:
```
NodeName=r16n[18-20] Count=4T Flags=CountOnly Name=cpu_type Type=e5_2650_v1
NodeName=r16n[21-22] Count=4T Flags=CountOnly Name=cpu_type Type=e5_2650_v2
```
>
> Prentice
>
> On 3/8/21 11:29 AM, Bas van der Vlies wrote:
>> Hi,
>>
>> On this cluster I have version 20.02.6 installed. We have different
>> partitions for CPU types and GPU types. We want to make it easy for
>> users who do not care where their job runs, and experienced users can
>> specify the GRES type: cpu_type or gpu.
>>
>> I have defined 2 cpu partitions:
>> * cpu_e5_2650_v1
>> * cpu_e5_2650_v2
>>
>> and 2 gres cpu_type:
>> * e5_2650_v1
>> * e5_2650_v2
>>
>>
>> When no partition is specified it will submit to both partitions:
>> * srun --exclusive --gres=cpu_type:e5_2650_v1 --pty /bin/bash -->
>> r16n18, which has this GRES defined and is in partition cpu_e5_2650_v1
>>
>> Now I submit at the same time another job:
>> * srun --exclusive --gres=cpu_type:e5_2650_v1 --pty /bin/bash
>>
>> This fails with: `srun: error: Unable to allocate resources: Requested
>> node configuration is not available`
>>
>> I would expect it to be queued in the partition `cpu_e5_2650_v1`.
>>
>>
>> When I specify the partition on the command line:
>> * srun --exclusive -p cpu_e5_2650_v1_shared
>> --gres=cpu_type:e5_2650_v1 --pty /bin/bash
>>
>> srun: job 1856 queued and waiting for resources
>>
>>
>> So the question is: can Slurm handle submitting to multiple partitions
>> when we specify GRES attributes?
>>
>> Regards
>>
>>
>
--
Bas van der Vlies
| HPCV Supercomputing | Internal Services | SURF |
https://userinfo.surfsara.nl |
| Science Park 140 | 1098 XG Amsterdam | Phone: +31208001300 |
| bas.vandervlies at surf.nl