[slurm-users] How to share GPU resources? (MPS or another way?)

Tue Oct 8 16:30:13 UTC 2019

On 10/8/19 1:47 AM, Kota Tsuyuzaki wrote:
> GPU is running as well as gres gpu:1. Moreover, the NVIDIA docs seem to describe what I hit
> (https://docs.nvidia.com/deploy/mps/index.html#topic_4_3). It seems that an mps-server is created for each user and that the
> server runs exclusively, so I have my doubts about this direction...
> 

From the description provided by NVIDIA:

"The MPS control daemon is responsible for the startup and shutdown of  
MPS servers. The control daemon allows at most one MPS server to be  
active at a time. When an MPS client connects to the control daemon, the  
daemon launches an MPS server if there is no server active. The MPS  
server is launched with the same user id as that of the MPS client.

If there is an MPS server already active and the user id of the server  
and client match, then the control daemon allows the client to proceed  
to connect to the server. If there is an MPS server already active, but  
the server and client were launched with different user id’s, the  
control daemon requests the existing server to shutdown once all its  
clients have disconnected. Once the existing server has shutdown, the  
control daemon launches a new server with the same user id as that of  
the new user's client process. This is shown in the figure above where  
user Bob starts client C' before a server is available. Only once user  
Alice's clients exit is a server created for user Bob and client C'."
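
To make the rule in that passage concrete, here is a rough Python sketch
of how I read it. This is only a toy model, not NVIDIA's code: the class
name MpsControlDaemon, its methods, and the user names are all made up
for illustration. I am also assuming that an idle server is replaced
right away when a different user connects, and that same-user clients
may keep joining while another user waits; the quoted text does not say
either way.

```python
from collections import deque


class MpsControlDaemon:
    """Toy model of the quoted server-selection rule (hypothetical, not NVIDIA code)."""

    def __init__(self):
        self.server_uid = None   # user id of the active MPS server, if any
        self.clients = 0         # clients connected to the active server
        self.waiting = deque()   # user ids of clients blocked on a server switch

    def connect(self, uid):
        """Return True if the client connects now, False if it has to wait."""
        # No server yet, or an idle server owned by someone else (assumption):
        # launch a server with the new client's user id.
        if self.server_uid is None or (self.server_uid != uid and self.clients == 0):
            self.server_uid = uid
        if self.server_uid == uid:
            self.clients += 1
            return True
        # Different user and the server still has clients: the client waits.
        self.waiting.append(uid)
        return False

    def disconnect(self):
        """One client of the active server exits."""
        self.clients -= 1
        if self.clients == 0 and self.waiting:
            # Last client gone: the old server shuts down and a new one is
            # launched for the first waiting user.
            self.server_uid = self.waiting[0]
            still_waiting = deque()
            for uid in self.waiting:
                if uid == self.server_uid:
                    self.clients += 1      # admitted to the new server
                else:
                    still_waiting.append(uid)
            self.waiting = still_waiting


daemon = MpsControlDaemon()
print(daemon.connect("alice"))   # True: server launched as alice
print(daemon.connect("bob"))     # False: bob waits for alice's client
daemon.disconnect()              # alice's only client exits
print(daemon.server_uid)         # bob
```

In this model Bob's client never runs alongside Alice's, which matches
the Alice/Bob example in the quoted text.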

So it looks like, with MPS, GPU resources can only be shared by
processes run by the same user?  Maybe this is different for the Volta
architecture, though, as each MPS client there gets its own GPU address
space rather than sharing one with the other simultaneous processes.