[slurm-users] Nvidia MPS with more than one GPU per node
EPF (Esben Peter Friis)
EPF at novozymes.com
Thu Aug 12 09:53:24 UTC 2021
Hi all
I'm quite new to Slurm, and have set up an Ubuntu box with 5 A40 GPUs.
Allocating one or more GPUs with --gres=gpu:1 (or --gres=gpu:2) works great!
But we have a number of tasks that only use e.g. 50% of the resources of one GPU. In that case,
we would like to be able to submit 10 jobs with --gres=mps:50 and have them automatically allocated
two per GPU.
But I run into exactly the same problem as Geoffrey described last year (see below):
The process works great for the two jobs allocated to the first GPU,
but subsequent jobs are queued instead of starting on the next GPU.
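For reference, this is roughly how the jobs are submitted (the script and application names below are
just placeholders, not our real ones):
----------- submit script (sketch) ---------------
#!/bin/bash
#SBATCH --job-name=half-gpu
#SBATCH --ntasks=1
#SBATCH --gres=mps:50          # request ~50% of one GPU's MPS share
srun ./my_cuda_app             # placeholder for the actual CUDA task
---------------------------------------------------
submitted ten times, e.g. with:  for i in $(seq 10); do sbatch half_gpu.sh; done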
I am running the Nvidia MPS server, and nvidia-smi looks ok:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03 Driver Version: 460.91.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 A40 Off | 00000000:25:00.0 Off | 0 |
| 0% 28C P8 21W / 300W | 29MiB / 45634MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 A40 Off | 00000000:81:00.0 Off | 0 |
| 0% 28C P8 24W / 300W | 30MiB / 45634MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 A40 Off | 00000000:A1:00.0 Off | 0 |
| 0% 26C P8 29W / 300W | 30MiB / 45634MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 A40 Off | 00000000:C1:00.0 Off | 0 |
| 0% 27C P8 31W / 300W | 30MiB / 45634MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 A40 Off | 00000000:E1:00.0 Off | 0 |
| 0% 26C P8 23W / 300W | 30MiB / 45634MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 36939 C nvidia-cuda-mps-server 27MiB |
| 1 N/A N/A 36939 C nvidia-cuda-mps-server 27MiB |
| 2 N/A N/A 36939 C nvidia-cuda-mps-server 27MiB |
| 3 N/A N/A 36939 C nvidia-cuda-mps-server 27MiB |
| 4 N/A N/A 36939 C nvidia-cuda-mps-server 27MiB |
+-----------------------------------------------------------------------------+
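For completeness: the MPS control daemon was started in the usual way, roughly as below. The pipe and
log directories are just the ones commonly used in NVIDIA's MPS documentation; adjust for your setup.
----------- MPS daemon start (sketch) ---------------
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
nvidia-cuda-mps-control -d     # start the control daemon in background mode
------------------------------------------------------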
I was wondering if anyone has managed to get this to work?
Cheers,
Esben
----------- gres.conf ---------------
##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Define GPU devices
##################################################################
#AutoDetect=nvml
Name=gpu Type=A40 File=/dev/nvidia[0-4]
Name=mps Count=500 File=/dev/nvidia[0-4]
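(The GRES definitions that slurmd picks up from this file can be double-checked on the node with
"slurmd -G", run as root; the exact output varies by Slurm version.)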
------------ slurm.conf ---------------
SlurmctldHost=ai
NodeName=ai Boards=1 SocketsPerBoard=2 CoresPerSocket=48 ThreadsPerCore=2 Gres=gpu:A40:5,mps:500 Feature=ht,gpu,mps
PartitionName=debug Nodes=ai Default=YES MaxTime=INFINITE State=UP AllowGroups=ALL AllowAccounts=ALL
SlurmdUser=root
ClusterName=cluster
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
JobAcctGatherType=jobacct_gather/cgroup
## GRES
GresTypes=gpu,mps
DebugFlags=CPU_Bind,gres
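This is how I check the node state and submit a test job (the wrapped command and <jobid> are
placeholders):
----------- quick test (sketch) ---------------
# what the controller thinks the node offers
scontrol show node ai | grep -i gres

# a half-GPU test job; replace the wrapped command with a real CUDA task
sbatch -n1 --gres=mps:50 --wrap="./my_cuda_app"

# check which GPU a given mps job was bound to (replace <jobid>)
scontrol -d show job <jobid> | grep -i gres
------------------------------------------------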
> --------------------------
> Ransom, Geoffrey M. Thu, 09 Jan 2020 10:53:10 -0800
BLUF:
Is the Nvidia MPS service required for the MPS gres to function in slurm
with multiple GPUs in a single machine? (jobs using MPS don't need to span
GPUs, just use a part of a GPU in a machine with multiple GPUs)
Is there more detailed documentation available on how MPS should be set up
and how it functions?
I'm playing with mps on a test machine and the documentation at
https://slurm.schedmd.com/gres.html seems a bit vague. It implies it can be
used across multiple GPUs, but then states that only one GPU per node may be
configured for use with MPS.
When I test mps in slurm without the NVIDIA MPS service (I am just starting to
read up on the NVIDIA MPS service now) it does seem to only use one GPU.
In gres.conf
NodeName=testmachine1 Name=gpu File=/dev/nvidia[0-1]
NodeName=testmachine1 Name=mps count=200 File=/dev/nvidia[0-1]
In slurm.conf
NodeName=testmachine1 Gres=gpu:2,mps:200 Sockets=1 CoresPerSocket=6
An array job submitted with "--gres=mps:50" will put two job steps on the first
GPU, but doesn't use the second GPU for mps jobs.
Is the Nvidia MPS service required for the MPS gres to function in slurm?
Is there more detailed documentation available on how MPS should be set up and
how it functions?
We have a mixed set of work (shared GPU using 1 CPU core and a small percentage
of one GPU versus dedicated GPU jobs using a whole number of GPUs and CPUs) on
machines with 4 GPUs and it would be nice to have them co-exist instead of
splitting the machines into two separate partitions for the two styles of jobs.
Thanks.