[slurm-users] Sharing a GPU
Bas van der Vlies
bas.vandervlies at surf.nl
Mon Apr 4 07:20:08 UTC 2022
We have exactly the same request for our GPUs that are not A100s, and we
have developed a Lua plugin for our needs (the new Slurm version, 22.XX,
will also allow this). But for earlier versions:
* https://github.com/basvandervlies/surf_slurm_mps
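
For the numbers in the question below (16 users, 4 GPUs), the arithmetic
of an even split can be sketched as follows. This is only an illustration
and not taken from the plugin: the helper name mps_share is hypothetical,
and it assumes each GPU is configured with 100 MPS units in gres.conf and
that users request their share with --gres=mps:N. Whether jobs from
different users can actually run on the same GPU at the same time depends
on the setup (stock Slurm MPS serves one user per GPU at a time).

```python
import math


def mps_share(users: int, gpus: int, units_per_gpu: int = 100) -> tuple[int, int]:
    """Return (users_per_gpu, mps_units_per_user) for an even split.

    units_per_gpu is an assumption: the MPS count configured per GPU.
    """
    if users <= 0 or gpus <= 0:
        raise ValueError("users and gpus must be positive")
    # Worst case: the busiest GPU hosts ceil(users / gpus) users.
    users_per_gpu = math.ceil(users / gpus)
    # Round down so the shares on one GPU never exceed its configured count.
    units_per_user = units_per_gpu // users_per_gpu
    return users_per_gpu, units_per_user


users_per_gpu, units = mps_share(users=16, gpus=4)
print(f"{users_per_gpu} users per GPU, request --gres=mps:{units}")
```

With 16 users and 4 GPUs this gives 4 users per GPU and a request of
--gres=mps:25 per user.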
On 03/04/2022 23:19, Kamil Wilczek wrote:
> Hello!
>
> I am an administrator of a GPU cluster (Slurm version 19.05.5).
>
> Could someone help me a little bit and explain if a single
> GPU can be shared between multiple users? My experience and the
> documentation tell me that it is not possible. But even after
> some time Slurm is still a beast to me and I find myself
> struggling :)
>
> * I set up the cluster to assign GPUs on multi-GPU servers
> to different users using GRES. This works fine and several
> users can work on a multi-GPU machine (--gres=gpu:N/--gpus=N).
>
> * But sometimes I have requests to allow a group of students
> to work simultaneously and interactively on a small partition
> where there are more users than GPUs. So I thought that maybe
> MPS is a solution, but the docs say that MPS is a way
> to run multiple jobs of *the same* user on a single GPU.
> When another user requests a GPU via MPS, the job is enqueued
> and waits for the first user's MPS server to finish.
> So, this is not a solution for a multi-user, simultaneous/parallel
> environment, right?
>
> Is there a way to share a GPU between multiple users?
> The requirement is, say:
>
> * 16 users working interactively, simultaneously
> * 4 GPUs partition
>
> Kind Regards
--
Bas van der Vlies
| HPCV Supercomputing | Internal Services | SURF |
https://userinfo.surfsara.nl |
| Science Park 140 | 1098 XG Amsterdam | Phone: +31208001300 |
| bas.vandervlies at surf.nl