[slurm-users] GRES with Docker problem
허웅
hoewoonggood at naver.com
Tue Jan 1 18:25:41 MST 2019
Hi,

I'm using Slurm with GRES (4 GPUs), and I want jobs to be spread evenly across the GPUs. That part works, but the GPU binding is not respected when I use Docker inside an allocation.

For example, if I run the command below four times in different ttys, I get exactly what I want: as you can see, every allocation ends up on a different GPU (all the Bus-Ids differ; the quick check after output #4 shows the same thing from inside the allocation).
#1
$ srun --gres=gpu:1 --gres-flags=enforce-binding --cpus-per-task=8 --mem=20G --pty bash
$ nvidia-smi
Wed Jan 2 01:02:00 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 00000000:14:00.0 Off | 0 |
| N/A 30C P0 27W / 250W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
#2
$ srun --gres=gpu:1 --gres-flags=enforce-binding --cpus-per-task=8 --mem=20G --pty bash
$ nvidia-smi
Wed Jan 2 01:02:39 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 00000000:15:00.0 Off | 0 |
| N/A 32C P0 26W / 250W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
#3
$ srun --gres=gpu:1 --gres-flags=enforce-binding --cpus-per-task=8 --mem=20G --pty bash
$ nvidia-smi
Wed Jan 2 00:36:22 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 00000000:39:00.0 Off | 0 |
| N/A 30C P0 27W / 250W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
#4
$ srun --gres=gpu:1 --gres-flags=enforce-binding --cpus-per-task=8 --mem=20G --pty bash
$ nvidia-smi
Wed Jan 2 01:03:50 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 00000000:3A:00.0 Off | 0 |
| N/A 29C P0 27W / 250W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
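If it helps, the binding can also be checked from inside each allocation; a quick sketch (the variable and the cgroup path are illustrative, the uid/job/step components depend on the node and job):

# the gres/gpu plugin normally exports the index of the allocated GPU
$ echo $CUDA_VISIBLE_DEVICES
# the device cgroup set up by task/cgroup lists which /dev/nvidia* nodes the step may open
$ cat /sys/fs/cgroup/devices/slurm/uid_0/job_472/step_0/devices.list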
scontrol show job also looks right: each job is bound to a different GPU index (GRES_IDX).
$ scontrol show job=472 --details
JobId=472 JobName=bash
UserId=root(0) GroupId=root(0) MCS_label=N/A
Priority=4294901759 Nice=0 Account=(null) QOS=(null)
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
DerivedExitCode=0:0
RunTime=00:29:12 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2019-01-02T00:35:37 EligibleTime=2019-01-02T00:35:37
StartTime=2019-01-02T00:35:37 EndTime=Unknown Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=all AllocNode:Sid=...:30423
ReqNodeList=(null) ExcNodeList=(null)
NodeList=...
BatchHost=...
NumNodes=1 NumCPUs=8 NumTasks=1 CPUs/Task=8 ReqB:S:C:T=0:0:*:*
TRES=cpu=8,mem=20G,node=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
Nodes=... CPU_IDs=0-7 Mem=20480 GRES_IDX=gpu(IDX:0)
MinCPUsNode=8 MinMemoryNode=20G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
Gres=gpu:1 Reservation=(null)
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=bash
WorkDir=/etc/slurm
Power=
GresEnforceBind=Yes
$ scontrol show job=473 --details
JobId=473 JobName=bash
UserId=root(0) GroupId=root(0) MCS_label=N/A
Priority=4294901758 Nice=0 Account=(null) QOS=(null)
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
DerivedExitCode=0:0
RunTime=00:30:10 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2019-01-02T00:36:14 EligibleTime=2019-01-02T00:36:14
StartTime=2019-01-02T00:36:14 EndTime=Unknown Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=all AllocNode:Sid=...:31738
ReqNodeList=(null) ExcNodeList=(null)
NodeList=...
BatchHost=...
NumNodes=1 NumCPUs=8 NumTasks=1 CPUs/Task=8 ReqB:S:C:T=0:0:*:*
TRES=cpu=8,mem=20G,node=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
Nodes=... CPU_IDs=8-15 Mem=20480 GRES_IDX=gpu(IDX:1)
MinCPUsNode=8 MinMemoryNode=20G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
Gres=gpu:1 Reservation=(null)
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=bash
WorkDir=/root
Power=
GresEnforceBind=Yes
...
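To avoid pasting every job, the same thing can be seen in one shot; a small sketch:

$ scontrol show job --details | grep -E 'JobId=|GRES_IDX'
# expected: one GRES_IDX line per job, with IDX:0 through IDX:3 spread over the four jobs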
But here is the problem: when I run a Docker container inside the allocation, the Slurm GRES restriction is not applied to it (see the sketch after the container output below).
$ srun --gres=gpu:1 --gres-flags=enforce-binding --cpus-per-task=8 --mem=20G --pty bash
$ nvidia-smi
Wed Jan 2 01:02:00 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 00000000:14:00.0 Off | 0 |
| N/A 30C P0 27W / 250W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
$ docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
Wed Jan 2 01:10:35 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 00000000:14:00.0 Off | 0 |
| N/A 30C P0 27W / 250W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... On | 00000000:15:00.0 Off | 0 |
| N/A 32C P0 26W / 250W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P100-PCIE... On | 00000000:39:00.0 Off | 0 |
| N/A 30C P0 27W / 250W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P100-PCIE... On | 00000000:3A:00.0 Off | 0 |
| N/A 28C P0 27W / 250W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
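My guess is that the container processes are started by the Docker daemon, outside the job's cgroup, so Slurm's device restriction never applies to them. A workaround I am considering (untested sketch; it assumes CUDA_VISIBLE_DEVICES is exported inside the allocation, and the index Slurm exports may not match the host-global index once the device cgroup is in effect, in which case the GPU UUID would be needed instead):

$ srun --gres=gpu:1 --gres-flags=enforce-binding --cpus-per-task=8 --mem=20G --pty bash
# forward the job's GPU into the container instead of the image default (all GPUs)
$ docker run --runtime=nvidia --rm \
    -e NVIDIA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES \
    nvidia/cuda nvidia-smi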
Here are my configs (slurm.conf and gres.conf).

slurm.conf:
ControlMachine=...
ControlAddr=...
MailProg=/bin/mail
MpiDefault=none
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
SlurmdSpoolDir=/var/spool/slurmd
SlurmdUser=root
StateSaveLocation=/var/spool
SwitchType=switch/none
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
AuthType=auth/munge
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
GresTypes=gpu
AccountingStorageType=accounting_storage/filetxt
JobCompType=jobcomp/filetxt
JobAcctGatherType=jobacct_gather/cgroup
ClusterName=...
SlurmctldDebug=7
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=7
SlurmdLogFile=/var/log/slurmd.log
# COMPUTE NODES
NodeName=... NodeHostName=... Gres=gpu:4 CPUs=32 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=128432 State=UNKNOWN
PartitionName=all Nodes=... Default=YES MaxTime=INFINITE State=UP

gres.conf:
Name=gpu Type=tesla File=/dev/nvidia0 CPUs=0-7
Name=gpu Type=tesla File=/dev/nvidia1 CPUs=8-15
Name=gpu Type=tesla File=/dev/nvidia2 CPUs=16-23
Name=gpu Type=tesla File=/dev/nvidia3 CPUs=24-31
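(Not pasted above, but since the bare srun sessions do see only one GPU, the task/cgroup device constraint must be active; a minimal cgroup.conf for that would look roughly like this, as a sketch rather than an exact copy of mine:)

# cgroup.conf (sketch)
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes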
What is going wrong here? Is forwarding the allocated GPU into the container (as in the sketch above) the right approach, or is there a supported way to make Docker respect the GPU that Slurm assigned to the job?