[slurm-users] Sharing a GPU
Eric F. Alemany
ealemany at stanford.edu
Mon Apr 4 01:23:37 UTC 2022
Another solution would be NVIDIA vGPU
(Virtual GPU Manager software).
You can share a GPU among VMs.
_____________________________________________________________________________________________________
Eric F. Alemany
System Administrator for Research
EXO - Extended Operations
Stanford Medicine - Technology & Digital Services
On Apr 3, 2022, at 17:04, Renfro, Michael <Renfro at tntech.edu> wrote:
Someone else may see another option, but NVIDIA MIG seems like the straightforward option. That would require both a Slurm upgrade and the purchase of MIG-capable cards.
https://slurm.schedmd.com/gres.html#MIG_Management
That would let you host up to 7 users per A100 card, IIRC.
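As a rough sanity check of that figure against the requirement in the original question (16 interactive users on a 4-GPU partition), here is a minimal sketch of the arithmetic. It assumes every card is split into 7 MIG instances and that each user gets exactly one instance; the function names are made up for illustration and are not part of Slurm or any NVIDIA tool.

```python
import math

# Assumption: 7 MIG instances per A100, the figure quoted above (IIRC).
INSTANCES_PER_GPU = 7


def mig_slots(num_gpus: int, instances_per_gpu: int = INSTANCES_PER_GPU) -> int:
    """Total MIG instances available if every GPU is fully partitioned."""
    return num_gpus * instances_per_gpu


def gpus_needed(num_users: int, instances_per_gpu: int = INSTANCES_PER_GPU) -> int:
    """Smallest number of GPUs that gives each user one MIG instance."""
    return math.ceil(num_users / instances_per_gpu)


users, gpus = 16, 4  # numbers from the original question
print(mig_slots(gpus))                # 28
print(gpus_needed(users))             # 3
print(mig_slots(gpus) >= users)       # True
```

So under those assumptions a 4-card partition would have room to spare for 16 users, at the cost of each user seeing only a slice of a card's memory and compute.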
On Apr 3, 2022, at 4:20 PM, Kamil Wilczek <kmwil at mimuw.edu.pl> wrote:
Hello!
I am an administrator of a GPU cluster (Slurm version 19.05.5).
Could someone help me a little bit and explain if a single
GPU can be shared between multiple users? My experience and
the documentation tell me that it is not possible. But even after
some time Slurm is still a beast to me and I find myself
struggling :)
* I set up the cluster to assign GPUs on multi-GPU servers
to different users using GRES. This works fine and several
users can work on a multi-GPU machine (--gres=gpu:N/--gpus=N).
* But sometimes I have requests to allow a group of students
to work simultaneously, interactively on a small partition,
where there are more users than GPUs. So I thought that maybe
MPS is a solution, but the docs say that MPS is a way
to run multiple jobs of *the same* user on a single GPU.
When another user requests a GPU via MPS, the job is enqueued
and waits for the first user's MPS server to finish.
So, this is not a solution for a multi-user, simultaneous/parallel
environment, right?
Is there a way to share a GPU between multiple users?
The requirement is, say:
* 16 users working interactively, simultaneously
* 4 GPUs partition
Kind Regards
--
Kamil Wilczek [https://keys.openpgp.org/]
[D415917E84B8DA5A60E853B6E676ED061316B69B]