[slurm-users] [External] Submitting to multiple partitions problem with gres specified

Bas van der Vlies bas.vandervlies at surf.nl
Tue Mar 9 13:21:19 UTC 2021


I have found the problem and will submit a patch. If we find a partition 
where a job can run but all nodes are busy, save this state and return 
it once all partitions have been checked and the job cannot run in any of them.

I do not know if this is the right approach.
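
To make the idea concrete, here is a minimal sketch of the intended control 
flow in Lua-style pseudocode. It is illustrative only: the real change would 
live in Slurm's C scheduling code, and every name below is made up.

```lua
-- Illustrative sketch, not actual Slurm code. check(job, part) is assumed
-- to return "RUNNABLE", "BUSY" or "NEVER" for one partition.
local function pick_result(job, partitions, check)
   local saw_busy = false
   for _, part in ipairs(partitions) do
      local rc = check(job, part)
      if rc == "RUNNABLE" then
         return "RUNNABLE"        -- some partition can start the job now
      elseif rc == "BUSY" then
         saw_busy = true          -- eligible partition, but its nodes are busy
      end
      -- rc == "NEVER": this partition can never satisfy the request
   end
   -- Saved state: if at least one partition was merely busy, queue the job
   -- instead of rejecting it with "configuration is not available".
   if saw_busy then return "BUSY" end
   return "NEVER"
end
```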

regards

On 09/03/2021 09:45, Bas van der Vlies wrote:
> Hi Prentice,
> 
> Answers inline
> 
> On 08/03/2021 22:02, Prentice Bisbal wrote:
>> Rather than specifying the processor types as GRES, I would 
>> recommend defining them as features of the nodes and letting the users 
>> specify the features as constraints on their jobs. Since the newer 
>> processors are backwards compatible with the older processors, list 
>> the older processors as features of the newer nodes, too.
>>
> We already do this with features on our other cluster. We assign nodes 
> different features and users select them. I can add a new feature for 
> the CPU type; sometimes you want avx512 and a specific processor.
> 
> On another cluster we have 5 different GPU types and a lot of partitions. 
> I want to make it simple for our users, so we have a 'job_submit.lua' 
> script that submits to multiple partitions; if the user specifies the 
> GRES type then Slurm selects the right partition(s).
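> 
> To illustrate, a minimal job_submit.lua sketch of that idea could look like 
> the one below. The partition names and the GRES-to-partition map are 
> assumptions for illustration, and both the field name (job_desc.gres on 
> older versions, job_desc.tres_per_node on newer ones) and the exact string 
> format of the GRES request may differ per Slurm version:
> 
> ```lua
> -- Hypothetical job_submit.lua sketch: route a job to the partition(s)
> -- that provide the requested GRES type. The map keys are assumptions.
> local gres_to_partition = {
>    ["cpu_type:e5_2650_v1"] = "cpu_e5_2650_v1",
>    ["cpu_type:e5_2650_v2"] = "cpu_e5_2650_v2",
> }
> 
> function slurm_job_submit(job_desc, part_list, submit_uid)
>    if job_desc.partition ~= nil then
>       return slurm.SUCCESS             -- the user chose a partition; keep it
>    end
>    -- Field name differs per Slurm version: 'gres' (older) or 'tres_per_node'.
>    local gres = job_desc.gres or job_desc.tres_per_node
>    if gres ~= nil and gres_to_partition[gres] ~= nil then
>       job_desc.partition = gres_to_partition[gres]
>    else
>       -- No recognised GRES request: submit to all CPU partitions.
>       job_desc.partition = "cpu_e5_2650_v1,cpu_e5_2650_v2"
>    end
>    return slurm.SUCCESS
> end
> 
> function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
>    return slurm.SUCCESS
> end
> ```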
> 
> On this cluster we do not have GPUs, but I can test with another GRES 
> type, 'cpu_type'. I think the last partition in the list determines the 
> behaviour: if I use a GRES that is supported by the last partition, the 
> job gets queued:
>   * srun -N1  --gres=cpu_type:e5_2650_v2 --pty /bin/bash
>   * srun --exclusive  --gres=cpu_type:e5_2650_v2 --pty /bin/bash
> srun: job 1865 queued and waiting for resources
> 
> So to me it seems that one of the partitions is BUSY but could run the 
> job. I will test it on our GPU cluster but expect the same behaviour.
> 
> 
>>
>> If you want to continue down the road you've already started on, can 
>> you provide more information, like the partition definitions and the 
>> gres definitions? In general, Slurm should support submitting to 
>> multiple partitions.
> 
> slurm.conf
> ```
> PartitionName=cpu_e5_2650_v1 DefMemPerCPU=11000 Default=No 
> DefaultTime=5 DisableRootJobs=YES MaxNodes=2 MaxTime=5-00 
> Nodes=r16n[18-20] OverSubscribe=EXCLUSIVE QOS=normal State=UP
> 
> 
> PartitionName=cpu_e5_2650_v2 DefMemPerCPU=11000 Default=No DefaultTime=5 
> DisableRootJobs=YES MaxNodes=2 MaxTime=5-00 Nodes=r16n[21-22] 
> OverSubscribe=EXCLUSIVE QOS=normal State=UP
> 
> 
> NodeName=r16n18 CoresPerSocket=8 Features=sandybridge,sse4,avx 
> Gres=cpu_type:e5_2650_v1:no_consume:4T MemSpecLimit=1024 
> NodeHostname=r16n18.mona.surfsara.nl RealMemory=188000 Sockets=2 
> State=UNKNOWN ThreadsPerCore=1 Weight=10
> 
> NodeName=r16n21 CoresPerSocket=8 Features=sandybridge,sse4,avx 
> Gres=cpu_type:e5_2650_v2:no_consume:4T MemSpecLimit=1024 
> NodeHostname=r16n21.mona.surfsara.nl RealMemory=188000 Sockets=2 
> State=UNKNOWN ThreadsPerCore=1 Weight=10
> ```
> 
> gres.conf
> 
> ```
> NodeName=r16n[18-20] Count=4T Flags=CountOnly Name=cpu_type Type=e5_2650_v1
> NodeName=r16n[21-22] Count=4T Flags=CountOnly Name=cpu_type Type=e5_2650_v2
> ```
> 
>>
>> Prentice
>>
>> On 3/8/21 11:29 AM, Bas van der Vlies wrote:
>>> Hi,
>>>
>>> On this cluster I have version 20.02.6 installed. We have different 
>>> partitions for CPU types and GPU types. We want to make it easy for 
>>> users who do not care where their job runs, while experienced users 
>>> can specify the GRES type: cpu_type or gpu.
>>>
>>> I have defined 2 cpu partitions:
>>>  * cpu_e5_2650_v1
>>>  * cpu_e5_2650_v2
>>>
>>> and 2 gres cpu_type:
>>>  * e5_2650_v1
>>>  * e5_2650_v2
>>>
>>>
>>> When no partitions are specified it will submit to both partitions:
>>>  * srun --exclusive  --gres=cpu_type:e5_2650_v1  --pty /bin/bash --> runs on 
>>> r16n18, which has this GRES defined and is in partition cpu_e5_2650_v1
>>>
>>> Now I submit another job at the same time:
>>>  * srun --exclusive  --gres=cpu_type:e5_2650_v1  --pty /bin/bash
>>>
>>> This fails with: `srun: error: Unable to allocate resources: 
>>> Requested node configuration is not available`
>>>
>>> I would expect it gets queued in the partition `cpu_e5_2650_v1`.
>>>
>>>
>>> When I specify the partition on the command line:
>>>  * srun  --exclusive -p cpu_e5_2650_v1_shared 
>>> --gres=cpu_type:e5_2650_v1 --pty /bin/bash
>>>
>>> srun: job 1856 queued and waiting for resources
>>>
>>>
>>> So the question is: can Slurm handle submitting to multiple partitions 
>>> when we specify GRES attributes?
>>>
>>> Regards
>>>
>>>
>>
> 

-- 
Bas van der Vlies
| HPCV Supercomputing | Internal Services  | SURF | 
https://userinfo.surfsara.nl |
| Science Park 140 | 1098 XG Amsterdam | Phone: +31208001300 |
|  bas.vandervlies at surf.nl


