[slurm-users] How to share GPU resources? (MPS or another way?)
Goetz, Patrick G
pgoetz at math.utexas.edu
Tue Oct 8 16:30:13 UTC 2019
On 10/8/19 1:47 AM, Kota Tsuyuzaki wrote:
> GPU is running as well as gres gpu:1. Moreover, the NVIDIA docs appear to describe what I hit
> (https://docs.nvidia.com/deploy/mps/index.html#topic_4_3). It seems that an mps-server is created for each user and that the
> server runs exclusively, so I have my doubts about this direction...
>
From the description provided by Nvidia:
"The MPS control daemon is responsible for the startup and shutdown of
MPS servers. The control daemon allows at most one MPS server to be
active at a time. When an MPS client connects to the control daemon, the
daemon launches an MPS server if there is no server active. The MPS
server is launched with the same user id as that of the MPS client.
If there is an MPS server already active and the user id of the server
and client match, then the control daemon allows the client to proceed
to connect to the server. If there is an MPS server already active, but
the server and client were launched with different user id’s, the
control daemon requests the existing server to shutdown once all its
clients have disconnected. Once the existing server has shutdown, the
control daemon launches a new server with the same user id as that of
the new user's client process. This is shown in the figure above where
user Bob starts client C' before a server is available. Only once user
Alice's clients exit is a server created for user Bob and client C'."
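To make the rule above concrete, here is a toy Python model of the decision logic as I read it from that paragraph. This is not NVIDIA code: the ControlDaemon class, its method names, and the uids are all made up for illustration. I am also assuming that a client whose uid matches the active server is always admitted, even while another user is waiting, which the docs do not spell out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ControlDaemon:
    """Toy model of the one-server-per-user rule (hypothetical, not NVIDIA code)."""
    server_uid: Optional[int] = None                               # uid of the active server, if any
    clients: List[str] = field(default_factory=list)               # clients attached to that server
    waiting: List[Tuple[int, str]] = field(default_factory=list)   # queued (uid, client) pairs

    def connect(self, uid: int, client: str) -> str:
        if self.server_uid is None:
            # No server active: launch one under the client's uid.
            self.server_uid = uid
            self.clients.append(client)
            return "connected"
        if self.server_uid == uid:
            # Same user as the active server: the client may proceed.
            # (Assumption: this holds even if another user is queued.)
            self.clients.append(client)
            return "connected"
        # Different user: the client waits until the current server drains.
        self.waiting.append((uid, client))
        return "waiting"

    def disconnect(self, client: str) -> None:
        self.clients.remove(client)
        if self.clients:
            return
        # Last client gone: the existing server shuts down.
        self.server_uid = None
        if self.waiting:
            # Launch a new server for the first waiting user and admit
            # every queued client that belongs to that user.
            next_uid = self.waiting[0][0]
            self.server_uid = next_uid
            self.clients = [c for u, c in self.waiting if u == next_uid]
            self.waiting = [(u, c) for u, c in self.waiting if u != next_uid]


ALICE, BOB = 1000, 1001                 # hypothetical uids
daemon = ControlDaemon()
print(daemon.connect(ALICE, "A"))       # connected
print(daemon.connect(ALICE, "B"))       # connected
print(daemon.connect(BOB, "C'"))        # waiting
daemon.disconnect("A")
daemon.disconnect("B")                  # Alice's server drains, Bob's starts
print(daemon.server_uid, daemon.clients)
```

In this model Bob's client C' only gets a server after both of Alice's clients have exited, which matches the Alice/Bob scenario in the quoted text.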
So it looks like, under MPS, GPU resources can only be shared by processes
run by the same user? Maybe this is different for the Volta architecture,
though, since there each MPS client gets its own GPU address space instead of
sharing one with the other clients.