[slurm-users] GRES and GPUs

Groner, Rob rug262 at psu.edu
Mon Jul 17 13:57:09 UTC 2023

That would certainly do it. If you look at the slurmctld log when it comes up, it will say that it's marking that node as invalid because it has fewer (0) gres resources than you say it should have. That's because slurmd on that node will come up and say "What gres resources??"

For testing purposes, you can just create a dummy file on the node, then in gres.conf point to that file (via File=) as the "graphics file" interface. As long as nothing actually tries to use it as a graphics device, that should be enough for that node to think it has gres/gpu resources. That's what I do in my vagrant slurm cluster.
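
A minimal sketch of that setup might look like the following. The node name test is taken from the config quoted further down; the path /tmp/fake_gpu0 is a made-up placeholder, and the whole thing is an untested illustration under those assumptions, not the actual vagrant configuration.

```
# On the compute node, create an empty placeholder file first
# (hypothetical path, any existing file should do for this purpose):
#   sudo touch /tmp/fake_gpu0

# gres.conf on the node: declare one gpu backed by the placeholder file.
NodeName=test Name=gpu File=/tmp/fake_gpu0

# slurm.conf: the same two lines as in the quoted message.
GresTypes=gpu
NodeName=test Gres=gpu:1
```

Depending on the Slurm version, restarting slurmd and slurmctld (rather than only running `scontrol reconfigure`) may be needed for the change to register. Afterwards, `scontrol show node test` should list the gres if the node registered cleanly; if the node is still marked invalid, `scontrol update NodeName=test State=RESUME` may be required to clear it.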


From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Xaver Stiensmeier <xaverstiensmeier at gmx.de>
Sent: Monday, July 17, 2023 9:43 AM
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] GRES and GPUs

Hi Hermann,

Good idea, but we are already using `SelectType=select/cons_tres`. After
setting everything up again (in case I made an unnoticed mistake), I saw
that the node got marked STATE=inval.

To be honest, I thought I could just claim that a node has a GPU even if
it doesn't have one - just for testing purposes. Could this be the issue?

Best regards,
Xaver Stiensmeier

On 17.07.23 14:11, Hermann Schwärzler wrote:
> Hi Xaver,
> what kind of SelectType are you using in your slurm.conf?
> Per https://slurm.schedmd.com/gres.html you have to consider:
> "As for the --gpu* option, these options are only supported by Slurm's
> select/cons_tres plugin."
> So you can use "--gpus ..." only when you state
> SelectType              = select/cons_tres
> in your slurm.conf.
> But "--gres=gpu:1" should always work.
> Regards
> Hermann
> On 7/17/23 13:43, Xaver Stiensmeier wrote:
>> Hey,
>> I am currently trying to understand how I can schedule a job that
>> needs a GPU.
>> I read about GRES https://slurm.schedmd.com/gres.html and tried to use:
>> GresTypes=gpu
>> NodeName=test Gres=gpu:1
>> But calling - after a 'sudo scontrol reconfigure':
>> srun --gpus 1 hostname
>> didn't work:
>> srun: error: Unable to allocate resources: Invalid generic resource
>> (gres) specification
>> so I read more https://slurm.schedmd.com/gres.conf.html but that
>> didn't really help me.
>> I am rather confused. GRES claims to be generic resources but then it
>> comes with three defined resources (GPU, MPS, MIG) and using one of
>> those didn't work in my case.
>> Obviously, I am misunderstanding something, but I am unsure where to
>> look.
>> Best regards,
>> Xaver Stiensmeier
