[slurm-users] Use gres to handle permissions of /dev/dri/card* and /dev/dri/renderD*?
Martin Pecka
peci1 at seznam.cz
Fri Jan 7 10:30:08 UTC 2022
Maybe I have good news, Stephan (and others). I discovered SLURM 20.11
added a MultipleFiles option to gres.conf, which replaces File=. There
are no docs about it yet, but I found a (possibly) working snippet
making use of this option here:
https://bugs.schedmd.com/show_bug.cgi?id=11091#c13 .
So my guess is that the correct line could be something like
Name=gpu Type=a100 MultipleFiles=/dev/nvidia0,/dev/dri/card1,/dev/dri/renderD128
(our machines also have an integrated GPU, which creates /dev/dri/card0
but no renderD device; that's why I assign card1 to the 0th nvidia gpu)
I'll try to make a test setup using this and report how it works. Most
importantly, it would be essential to know whether the card* and
renderD* device names are also assigned in PCI order (hope so!). And
whether cgroups handle these devices correctly. There would also be the
problem of how to report which card* and renderD* devices the user can
use in a job, but if they can be derived from SLURM_STEP_GPUS, it
wouldn't be difficult to provide a userspace script that generates the
list of usable devices.
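
Such a userspace script could look roughly like the untested sketch below.
It assumes that SLURM_STEP_GPUS holds comma-separated indices that match the
/dev/nvidiaN minor numbers, that the NVIDIA driver exposes a "Device Minor"
line in /proc/driver/nvidia/gpus/<pci-address>/information, and that the
directory names there match the PCI addresses under /sys/bus/pci/devices.
The function names are made up for illustration. It goes through sysfs
instead of relying on card*/renderD* being numbered in PCI order.

```python
import os
import re
from pathlib import Path


def parse_gpu_ids(value):
    """Parse a comma-separated index list such as "0,2" into integers."""
    return [int(tok) for tok in value.split(",") if tok.strip().isdigit()]


def nvidia_minor_to_pci(proc_root="/proc/driver/nvidia/gpus"):
    """Map NVIDIA device minor number -> PCI address.

    Assumption: each per-GPU directory is named after the PCI address and
    its 'information' file contains a 'Device Minor: N' line.
    """
    mapping = {}
    root = Path(proc_root)
    if not root.is_dir():
        return mapping
    for gpu_dir in root.iterdir():
        info = gpu_dir / "information"
        if not info.is_file():
            continue
        match = re.search(r"^Device Minor:\s*(\d+)", info.read_text(), re.MULTILINE)
        if match:
            mapping[int(match.group(1))] = gpu_dir.name
    return mapping


def drm_nodes_for_pci(pci_addr, sys_root="/sys/bus/pci/devices", dev_root="/dev/dri"):
    """List the card*/renderD* device nodes that belong to one PCI device."""
    drm_dir = Path(sys_root) / pci_addr / "drm"
    if not drm_dir.is_dir():
        return []
    return sorted(
        f"{dev_root}/{entry.name}"
        for entry in drm_dir.iterdir()
        if re.fullmatch(r"(card|renderD)\d+", entry.name)
    )


def usable_drm_devices(step_gpus, proc_root="/proc/driver/nvidia/gpus",
                       sys_root="/sys/bus/pci/devices"):
    """Return the DRM device nodes for the GPU indices listed in step_gpus."""
    minor_to_pci = nvidia_minor_to_pci(proc_root)
    devices = []
    for gpu_id in parse_gpu_ids(step_gpus):
        pci_addr = minor_to_pci.get(gpu_id)
        if pci_addr is not None:
            devices.extend(drm_nodes_for_pci(pci_addr, sys_root))
    return devices


if __name__ == "__main__":
    # Inside a job step, print one usable device node per line.
    for dev in usable_drm_devices(os.environ.get("SLURM_STEP_GPUS", "")):
        print(dev)
```

Running it inside a job step would print the DRM nodes that belong to the
allocated GPUs, which a wrapper could then pass on to VirtualGL.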
> Is your goal to enable VirtualGL for jobs? If it is, I tried a solution
> with packing it, its dependencies, a minimal X11 server and turbovnc
> into a singularity image which can be used in a job.
> This worked as a proof of concept for glxgears, but not for the software
> users wanted to run.
Yes, virtualgl+xvfb or virtualgl+turbovnc is exactly the use case I have
in mind. We had this working on a headless non-slurm server without many
problems, running a robotics simulator with rendering sensors, and
sometimes even with a GUI.
> Eventually this might work with Vulkan instead of OpenGL. Software in
> question would have to be updated, too, GPU drivers would have to
> support the needed Vulkan features as well.
No idea which devices Vulkan uses. Are they also the DRM devices?
Martin