[slurm-users] [External] Submitting to multiple partitions problem with gres specified

Bas van der Vlies bas.vandervlies at surf.nl
Tue Mar 9 14:10:26 UTC 2021


For those who are interested:
  * https://bugs.schedmd.com/show_bug.cgi?id=11044

On 09/03/2021 14:21, Bas van der Vlies wrote:
> I have found the problem and will submit a patch. If we find a partition 
> where a job can run but all its nodes are busy, save this state and return 
> it once all partitions have been checked and the job cannot run in any of them.
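> 
> The idea, sketched as pseudocode (Lua-style and purely illustrative; the real 
> change sits in Slurm's C scheduler code and none of the names below come 
> from the actual patch):
> 
> ```lua
> -- Illustrative pseudocode only, not the actual patch.
> -- Walk every partition the job was submitted to. If a partition could run
> -- the job but its nodes are currently busy, remember that instead of keeping
> -- the "configuration is not available" result of the last partition checked.
> local function check_partitions(job, partitions, can_run_in)
>    local busy_seen = false
>    for _, part in ipairs(partitions) do
>       local rc = can_run_in(job, part)    -- caller-supplied per-partition check
>       if rc == "RUNNABLE_NOW" then
>          return "RUNNABLE_NOW"            -- schedule it right away
>       elseif rc == "RUNNABLE_BUT_BUSY" then
>          busy_seen = true                 -- save this state for later
>       end
>    end
>    if busy_seen then
>       return "RUNNABLE_BUT_BUSY"          -- queue the job instead of rejecting it
>    end
>    return "NOT_RUNNABLE"                  -- no partition can ever run this job
> end
> ```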
> 
> I do not know if this is the right approach.
> 
> regards
> 
> On 09/03/2021 09:45, Bas van der Vlies wrote:
>> Hi Prentice,
>>
>> Answers inline
>>
>> On 08/03/2021 22:02, Prentice Bisbal wrote:
>>> Rather than specifying the processor types as GRES, I would 
>>> recommend defining them as features of the nodes and letting the users 
>>> specify the features as constraints on their jobs. Since the newer 
>>> processors are backwards compatible with the older ones, list 
>>> the older processors as features of the newer nodes, too.
>>>
>> We already do this with features on our other cluster. We assign nodes
>> different features and users select these. I can add a new feature for 
>> the CPU type. Sometimes you want avx512 and a specific processor.
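>>
>> For reference, that feature-based approach would look roughly like this 
>> (illustrative only, not our real config):
>>
>> ```
>> # newer nodes also carry the older CPU type as a feature
>> NodeName=r16n18 Features=sandybridge,sse4,avx,e5_2650_v1 ...
>> NodeName=r16n21 Features=sandybridge,sse4,avx,e5_2650_v1,e5_2650_v2 ...
>>
>> # user side: select by feature instead of GRES; '&' combines features
>> srun --constraint=e5_2650_v1 --pty /bin/bash
>> srun --constraint="avx&e5_2650_v2" --pty /bin/bash
>> ```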
>>
>> On another cluster we have 5 different GPUs and a lot of partitions. I 
>> want to make it simple for our users, so we have a 'job_submit.lua' 
>> script that submits to multiple partitions, and if the user specifies the 
>> GRES type then Slurm selects the right partition(s).
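>>
>> A stripped-down sketch of that script (not the real thing; only the 
>> partition names from this cluster are real):
>>
>> ```lua
>> -- job_submit.lua sketch: if the user did not pick a partition, submit to
>> -- all CPU partitions and let the requested GRES type decide where the job
>> -- can actually run.
>> function slurm_job_submit(job_desc, part_list, submit_uid)
>>    if job_desc.partition == nil then
>>       job_desc.partition = "cpu_e5_2650_v1,cpu_e5_2650_v2"
>>    end
>>    return slurm.SUCCESS
>> end
>>
>> function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
>>    return slurm.SUCCESS
>> end
>> ```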
>>
>> On this cluster we do not have GPUs, but I can test with another GRES 
>> type, 'cpu_type'. I think the last partition in the list determines the 
>> behaviour, so if I use a GRES that is supported by the last partition 
>> the job gets queued:
>>   * srun -N1  --gres=cpu_type:e5_2650_v2 --pty /bin/bash
>>   * srun --exclusive  --gres=cpu_type:e5_2650_v2 --pty /bin/bash
>> srun: job 1865 queued and waiting for resources
>>
>> So to me it seems that one of the partitions is busy but could run the 
>> job. I will test it on our GPU cluster but expect the same behaviour.
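>>
>> (While such a job is pending, `scontrol show job 1865` shows a Reason 
>> field, which should confirm whether the partition is merely busy or the 
>> requested configuration is really rejected.)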
>>
>>
>>>
>>> If you want to continue down the road you've already started on, can 
>>> you provide more information, like the partition definitions and the 
>>> gres definitions? In general, Slurm should support submitting to 
>>> multiple partitions.
>>
>> slurm.conf
>> ```
>> PartitionName=cpu_e5_2650_v1 DefMemPerCPU=11000 Default=No DefaultTime=5 DisableRootJobs=YES MaxNodes=2 MaxTime=5-00 Nodes=r16n[18-20] OverSubscribe=EXCLUSIVE QOS=normal State=UP
>>
>> PartitionName=cpu_e5_2650_v2 DefMemPerCPU=11000 Default=No DefaultTime=5 DisableRootJobs=YES MaxNodes=2 MaxTime=5-00 Nodes=r16n[21-22] OverSubscribe=EXCLUSIVE QOS=normal State=UP
>>
>> NodeName=r16n18 CoresPerSocket=8 Features=sandybridge,sse4,avx Gres=cpu_type:e5_2650_v1:no_consume:4T MemSpecLimit=1024 NodeHostname=r16n18.mona.surfsara.nl RealMemory=188000 Sockets=2 State=UNKNOWN ThreadsPerCore=1 Weight=10
>>
>> NodeName=r16n21 CoresPerSocket=8 Features=sandybridge,sse4,avx Gres=cpu_type:e5_2650_v2:no_consume:4T MemSpecLimit=1024 NodeHostname=r16n21.mona.surfsara.nl RealMemory=188000 Sockets=2 State=UNKNOWN ThreadsPerCore=1 Weight=10
>> ```
>>
>> gres.conf
>> ```
>> NodeName=r16n[18-20] Count=4T Flags=CountOnly Name=cpu_type Type=e5_2650_v1
>> NodeName=r16n[21-22] Count=4T Flags=CountOnly Name=cpu_type Type=e5_2650_v2
>> ```
>>
>>>
>>> Prentice
>>>
>>> On 3/8/21 11:29 AM, Bas van der Vlies wrote:
>>>> Hi,
>>>>
>>>> On this cluster I have version 20.02.6 installed. We have different 
>>>> partitions for CPU types and GPU types. We want to make it easy for 
>>>> users who do not care where their job runs, while experienced 
>>>> users can specify the GRES type: cpu_type or gpu.
>>>>
>>>> I have defined 2 cpu partitions:
>>>>  * cpu_e5_2650_v1
>>>>  * cpu_e5_2650_v2
>>>>
>>>> and 2 gres cpu_type:
>>>>  * e5_2650_v1
>>>>  * e5_2650_v2
>>>>
>>>>
>>>> When no partition is specified it will submit to both partitions:
>>>>  * srun --exclusive  --gres=cpu_type:e5_2650_v1  --pty /bin/bash --> runs on 
>>>> r16n18, which has this gres defined and is in partition cpu_e5_2650_v1
>>>>
>>>> Now I submit another job at the same time:
>>>>  * srun --exclusive  --gres=cpu_type:e5_2650_v1  --pty /bin/bash
>>>>
>>>> This fails with: `srun: error: Unable to allocate resources: 
>>>> Requested node configuration is not available`
>>>>
>>>> I would expect it to get queued in the partition `cpu_e5_2650_v1`.
>>>>
>>>>
>>>> When I specify the partition on the command line:
>>>>  * srun  --exclusive -p cpu_e5_2650_v1_shared 
>>>> --gres=cpu_type:e5_2650_v1 --pty /bin/bash
>>>>
>>>> srun: job 1856 queued and waiting for resources
>>>>
>>>>
>>>> So the question is: can Slurm handle submitting to multiple 
>>>> partitions when we specify GRES attributes?
>>>>
>>>> Regards
>>>>
>>>>
>>>
>>
> 

-- 
Bas van der Vlies
| HPCV Supercomputing | Internal Services  | SURF | 
https://userinfo.surfsara.nl |
| Science Park 140 | 1098 XG Amsterdam | Phone: +31208001300 |
|  bas.vandervlies at surf.nl


