[slurm-users] Nvidia MPS with more than one GPU per node

EPF (Esben Peter Friis) EPF at novozymes.com
Thu Aug 12 09:53:24 UTC 2021


Hi all


I'm quite new to Slurm, and have set up an Ubuntu box with five A40 GPUs.

Allocating one or more GPUs with --gres=gpu:1 (or --gres=gpu:2) works great!


But we have a number of tasks that only use, say, 50% of the resources of one GPU. In that case, we would like to be able to submit 10 jobs with --gres=mps:50 and have them automatically allocated two to each GPU.
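
(For reference, each of these jobs is submitted roughly like this; the script and program names are just placeholders:)

#!/bin/bash
#SBATCH --gres=mps:50          # half of the ~100 MPS shares each GPU gets (500 split over 5 GPUs)
#SBATCH --cpus-per-task=1
srun ./my_half_gpu_program     # placeholder for the real workload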

But I run into exactly the same problem as Geoffrey described last year (see below): it works great for the two jobs allocated to the first GPU, but subsequent jobs are queued instead of starting on the next GPU.


I am running the Nvidia MPS server. It was started in the standard way, more or less like this (from memory, so details may differ slightly):
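
export CUDA_VISIBLE_DEVICES=0,1,2,3,4    # make all five GPUs visible to the daemon
nvidia-cuda-mps-control -d               # start the MPS control daemon

With that running, nvidia-smi looks ok: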


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A40                 Off  | 00000000:25:00.0 Off |                    0 |
|  0%   28C    P8    21W / 300W |     29MiB / 45634MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  A40                 Off  | 00000000:81:00.0 Off |                    0 |
|  0%   28C    P8    24W / 300W |     30MiB / 45634MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  A40                 Off  | 00000000:A1:00.0 Off |                    0 |
|  0%   26C    P8    29W / 300W |     30MiB / 45634MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  A40                 Off  | 00000000:C1:00.0 Off |                    0 |
|  0%   27C    P8    31W / 300W |     30MiB / 45634MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  A40                 Off  | 00000000:E1:00.0 Off |                    0 |
|  0%   26C    P8    23W / 300W |     30MiB / 45634MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     36939      C   nvidia-cuda-mps-server             27MiB |
|    1   N/A  N/A     36939      C   nvidia-cuda-mps-server             27MiB |
|    2   N/A  N/A     36939      C   nvidia-cuda-mps-server             27MiB |
|    3   N/A  N/A     36939      C   nvidia-cuda-mps-server             27MiB |
|    4   N/A  N/A     36939      C   nvidia-cuda-mps-server             27MiB |
+-----------------------------------------------------------------------------+


I was wondering if anyone managed to get this to work?



Cheers,


Esben



----------- gres.conf ---------------


##################################################################

# Slurm's Generic Resource (GRES) configuration file
# Define GPU devices
##################################################################
#AutoDetect=nvml
Name=gpu Type=A40 File=/dev/nvidia[0-4]
Name=mps Count=500 File=/dev/nvidia[0-4]
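
(I believe the mps line above is equivalent to the per-device form used in the gres.html examples, i.e. the 500 shares should end up split as 100 per GPU either way:)

Name=mps Count=100 File=/dev/nvidia0
Name=mps Count=100 File=/dev/nvidia1
Name=mps Count=100 File=/dev/nvidia2
Name=mps Count=100 File=/dev/nvidia3
Name=mps Count=100 File=/dev/nvidia4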



------------ slurm.conf ---------------


SlurmctldHost=ai
NodeName=ai Boards=1 SocketsPerBoard=2 CoresPerSocket=48 ThreadsPerCore=2 Gres=gpu:A40:5,mps:500 Feature=ht,gpu,mps

PartitionName=debug Nodes=ai Default=YES MaxTime=INFINITE State=UP AllowGroups=ALL AllowAccounts=ALL

SlurmdUser=root
ClusterName=cluster

SelectType=select/cons_tres
SelectTypeParameters=CR_Core
JobAcctGatherType=jobacct_gather/cgroup

## GRES
GresTypes=gpu,mps
DebugFlags=CPU_Bind,gres
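
(For completeness, a couple of standard sanity checks I use to confirm what slurmctld has registered for the node:)

scontrol show node ai | grep -i gres     # should list gpu:A40:5 and mps:500
sinfo -N -o "%N %G"                      # GRES per node, another quick check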





> --------------------------

> Ransom, Geoffrey M. Thu, 09 Jan 2020 10:53:10 -0800


BLUF:
     Is the Nvidia MPS service required for the MPS gres to function in slurm
with multiple GPUs in a single machine? (jobs using MPS don't need to span
GPUs, just use a part of a GPU in a machine with multiple GPUs)
     Is there more detailed documentation available on how MPS should be set up
and how it functions?

I'm playing with mps on a test machine and the documentation at
https://slurm.schedmd.com/gres.html seems a bit vague. It implies it can be
used across multiple GPUs, but then states that only one GPU per node may be
configured for use with MPS.

When I test MPS in Slurm without the NVIDIA MPS service (I am just starting to
read up on the NVIDIA MPS service now), it does seem to only use one GPU.

In gres.conf
     NodeName=testmachine1 Name=gpu File=/dev/nvidia[0-1]
     NodeName=testmachine1 Name=mps count=200 File=/dev/nvidia[0-1]

In slurm.conf
     NodeName=testmachine1 Gres=gpu:2,mps:200 Sockets=1 CoresPerSocket=6

An array job submitted with "--gres=mps:50" will put two job steps on the first
GPU, but doesn't use the second GPU for MPS jobs.

Is the Nvidia MPS service required for the MPS gres to function in slurm?
Is there more detailed documentation available on how MPS should be set up and
how it functions?

We have a mixed set of work (shared GPU using 1 CPU core and a small percentage
of one GPU versus dedicated GPU jobs using a whole number of GPUs and CPUs) on
machines with 4 GPUs and it would be nice to have them co-exist instead of
splitting the machines into two separate partitions for the two styles of jobs.

Thanks.
