[slurm-users] Use gres to handle permissions of /dev/dri/card* and /dev/dri/renderD*?

Martin Pecka peci1 at seznam.cz
Fri Jan 7 10:30:08 UTC 2022


Maybe I have good news, Stephan (and others). I discovered that SLURM
20.11 added a MultipleFiles option to gres.conf, which is used in place
of File=. There are no docs about it yet, but I found a (possibly)
working snippet that uses this option here:
https://bugs.schedmd.com/show_bug.cgi?id=11091#c13 .

So my guess is that the correct line could be something like

     Name=gpu Type=a100 MultipleFiles=/dev/nvidia0,/dev/dri/card1,/dev/dri/renderD128

(our machines have an integrated GPU, too, which creates /dev/dri/card0
but no renderD device; that's why I allocate card1 to the 0th nvidia gpu)

I'll try to make a test setup using this and report how it works. Most
importantly, it is essential to know whether the card* and renderD*
device names are also assigned in PCI order (hope so!), and whether
cgroups handle these devices correctly. Another problem is how to
report which card* and renderD* devices the user can use in a job, but
if they can be derived from SLURM_STEP_GPUS, it wouldn't be difficult
to provide a userspace script that generates the list of usable
devices.
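
A rough sketch of what such a userspace script could look like, in
Python. It assumes that SLURM_STEP_GPUS holds a comma-separated list of
GPU indices, and that GPU i maps to /dev/dri/card(i+1) and
/dev/dri/renderD(128+i) as on our machines. That mapping is exactly the
part I have not verified yet, and the function name and the
card_offset/render_base parameters are just made up for illustration.

```python
import os


def usable_dri_devices(step_gpus, card_offset=1, render_base=128):
    """Map GPU indices (as in SLURM_STEP_GPUS) to DRI device paths.

    ASSUMPTION (unverified): GPU i corresponds to
    /dev/dri/card{i + card_offset} and /dev/dri/renderD{render_base + i}.
    card_offset=1 accounts for an integrated GPU that occupies card0.
    """
    devices = []
    for token in step_gpus.split(","):
        token = token.strip()
        if not token:
            # Skip empty entries, e.g. when the variable is unset or empty.
            continue
        index = int(token)
        devices.append(f"/dev/dri/card{index + card_offset}")
        devices.append(f"/dev/dri/renderD{render_base + index}")
    return devices


if __name__ == "__main__":
    # Print one usable device path per line for the current job step.
    for path in usable_dri_devices(os.environ.get("SLURM_STEP_GPUS", "")):
        print(path)
```

If the device names turn out not to follow PCI order, the index
arithmetic would have to be replaced by a lookup of the PCI address of
each card* and renderD* node.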

> Is your goal to enable VirtualGL for jobs? If it is, I tried a solution
> with packing it, its dependencies, a minimal X11 server and turbovnc
> into a singularity image which can be used in a job.
> This worked as a proof of concept for glxgears, but not for the software
> users wanted to run.
Yes, virtualgl+xvfb or virtualgl+turbovnc is exactly the use case I have
in mind. We had this working on a headless non-slurm server without a
lot of problems, running a robotics simulator with rendering sensors,
and sometimes even with a GUI.
> Eventually this might work with Vulkan instead of OpenGL. Software in
> question would have to be updated, too, GPU drivers would have to
> support the needed Vulkan features as well.
No idea which devices Vulkan uses. Are they also the DRM devices?

Martin

