[slurm-users] How to share GPU resources? (MPS or another way?)
kota.tsuyuzaki.pc at hco.ntt.co.jp
Wed Oct 9 07:11:26 UTC 2019
> On 10/8/19 1:47 AM, Kota Tsuyuzaki wrote:
> > GPU is running as well as gres gpu:1. Moreover, the NVIDIA docs seem
> > to describe what I hit
> > (https://docs.nvidia.com/deploy/mps/index.html#topic_4_3). It seems that an MPS server is created per
> > user and runs exclusively, so I have my doubts about this direction...
> From the description provided by Nvidia:
> "The MPS control daemon is responsible for the startup and shutdown of MPS servers. The control daemon allows at
> most one MPS server to be active at a time. When an MPS client connects to the control daemon, the daemon launches an
> MPS server if there is no server active. The MPS server is launched with the same user id as that of the MPS client.
> If there is an MPS server already active and the user id of the server and client match, then the control daemon allows the
> client to proceed to connect to the server. If there is an MPS server already active, but the server and client were launched
> with different user ids, the control daemon requests the existing server to shutdown once all its clients have
> disconnected. Once the existing server has shutdown, the control daemon launches a new server with the same user id
> as that of the new user's client process. This is shown in the figure above where user Bob starts client C' before a server
> is available. Only once user Alice's clients exit is a server created for user Bob and client C'."
> It looks like GPU resources can only be shared by processes run by the same user? Maybe this is different for the Volta
> architecture, though, as GPU memory is no longer shared between simultaneous processes.
That matches my current understanding. If it is true, the Slurm MPS docs may be misleading about how MPS can be used. And if
there is another way (NVIDIA GRID? Something Slurm supports?), I'd like to look into it...
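For reference, the control-daemon rule quoted above from the NVIDIA docs can be modeled as a toy state machine. This is a minimal Python sketch under my own assumptions: the class and method names (ControlDaemon, connect, disconnect) are hypothetical, it is not NVIDIA's implementation, and it models only the user-id policy (at most one active server, same-uid clients attach, a client with a different uid waits until all of the current server's clients have disconnected).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ControlDaemon:
    """Toy model of the MPS control daemon's server-switching rule.

    Hypothetical sketch: ignores GPUs, memory and compute limits and only
    tracks which user id the single active MPS server runs as.
    """
    server_uid: Optional[int] = None                 # uid of the active server
    clients: set = field(default_factory=set)        # clients attached to it
    waiting: deque = field(default_factory=deque)    # blocked (client, uid) pairs

    def connect(self, client: str, uid: int) -> bool:
        """Return True if the client attaches now, False if it must wait."""
        # No server yet, or an idle server of another user: start one for this uid.
        if self.server_uid is None or (not self.clients and uid != self.server_uid):
            self.server_uid = uid
        if uid == self.server_uid:
            self.clients.add(client)
            return True
        # Different user: queue until the current server's clients are all gone.
        self.waiting.append((client, uid))
        return False

    def disconnect(self, client: str) -> None:
        self.clients.discard(client)
        if self.clients or not self.waiting:
            return
        # Last client left and another user is waiting: the server shuts down
        # and a new one is launched with the first waiting client's uid.
        _, next_uid = self.waiting[0]
        self.server_uid = next_uid
        still_waiting = deque()
        for c, u in self.waiting:
            if u == next_uid:
                self.clients.add(c)
            else:
                still_waiting.append((c, u))
        self.waiting = still_waiting
```

Replaying Alice's clients A and B plus Bob's client C' (the uids 1000 and 1001 are made up) shows C' blocked until both A and B have disconnected, which is the single-user-at-a-time behaviour discussed in this thread.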