[slurm-users] GRES and GPUs
Xaver Stiensmeier
xaverstiensmeier at gmx.de
Mon Jul 17 13:43:32 UTC 2023
Hi Hermann,
Good idea, but we are already using `SelectType=select/cons_tres`. After
setting everything up again (in case I made an unnoticed mistake), I saw
that the node got marked STATE=inval.
To be honest, I thought I could simply declare that a node has a GPU even
if it doesn't have one, just for testing purposes. Could this be the issue?
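For reference, here is a minimal sketch of the configuration in question.
The slurm.conf lines are the ones from this thread. The gres.conf line is
an assumption for illustration only: it presumes a real NVIDIA device at
/dev/nvidia0, which the test node does not have.

```conf
# slurm.conf (excerpt, values taken from this thread)
SelectType=select/cons_tres
GresTypes=gpu
NodeName=test Gres=gpu:1

# gres.conf on node "test"
# ASSUMPTION: one GPU exposed as /dev/nvidia0 (hypothetical on this node)
NodeName=test Name=gpu File=/dev/nvidia0
```

As far as I understand, if slurmd finds fewer GPUs on the node than
slurm.conf declares, the node is flagged as invalid. The Reason field in
the output of 'scontrol show node test' should say why.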
Best regards,
Xaver Stiensmeier
On 17.07.23 14:11, Hermann Schwärzler wrote:
> Hi Xaver,
>
> what kind of SelectType are you using in your slurm.conf?
>
> Per https://slurm.schedmd.com/gres.html you have to consider:
> "As for the --gpu* option, these options are only supported by Slurm's
> select/cons_tres plugin."
>
> So you can use "--gpus ..." only when you set
> SelectType=select/cons_tres
> in your slurm.conf.
>
> But "--gres=gpu:1" should always work.
>
> Regards
> Hermann
>
>
> On 7/17/23 13:43, Xaver Stiensmeier wrote:
>> Hey,
>>
>> I am currently trying to understand how I can schedule a job that
>> needs a GPU.
>>
>> I read about GRES https://slurm.schedmd.com/gres.html and tried to use:
>>
>> GresTypes=gpu
>> NodeName=test Gres=gpu:1
>>
>> But running the following, after a 'sudo scontrol reconfigure':
>>
>> srun --gpus 1 hostname
>>
>> didn't work:
>>
>> srun: error: Unable to allocate resources: Invalid generic resource
>> (gres) specification
>>
>> so I read more https://slurm.schedmd.com/gres.conf.html but that
>> didn't really help me.
>>
>>
>> I am rather confused. GRES claims to be generic resources but then it
>> comes with three defined resources (GPU, MPS, MIG) and using one of
>> those didn't work in my case.
>>
>> Obviously, I am misunderstanding something, but I am unsure where to
>> look.
>>
>>
>> Best regards,
>> Xaver Stiensmeier
>>
>