[slurm-users] disable-bindings disables counting of gres resources
Peter Steinbach
steinbac at mpi-cbg.de
Mon Mar 25 09:30:34 UTC 2019
Dear all,
Using these config files,
https://github.com/psteinb/docker-centos7-slurm/blob/7bdb89161febacfd2dbbcb3c5684336fb73d7608/gres.conf
https://github.com/psteinb/docker-centos7-slurm/blob/7bdb89161febacfd2dbbcb3c5684336fb73d7608/slurm.conf
I observed a weird behavior of the '--gres-flags=disable-binding'
option. With the above .conf files, I created a local Slurm cluster with
3 compute nodes (2 GPUs and 4 cores each).
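For readers who don't want to click through, the node and GRES
definitions behind this setup look roughly like the following. This is
only a sketch reconstructed from the sinfo/scontrol output further down
this mail; the exact lines (and in particular the /dev/nvidia* device
paths) may differ from the linked files.

# slurm.conf (node/partition part, sketch only)
NodeName=g[1-3] NodeAddr=127.0.0.1 CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=4000 Gres=gpu:titanxp:2
PartitionName=gpu Nodes=g[1-3] Default=YES MaxTime=5-00:00:00 State=UP

# gres.conf (sketch only; device paths assumed)
NodeName=g[1-3] Name=gpu Type=titanxp File=/dev/nvidia[0-1]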
# sinfo -N -l
Mon Mar 25 09:20:59 2019
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
g1           1      gpu*  idle    4 1:4:1   4000        0      1   (null) none
g2           1      gpu*  idle    4 1:4:1   4000        0      1   (null) none
g3           1      gpu*  idle    4 1:4:1   4000        0      1   (null) none
I first submitted 3 jobs that consume all available GPUs:
# sbatch --gres=gpu:2 --wrap="env && sleep 600" -o block_2gpus_%A.out --mem=500
Submitted batch job 2
# sbatch --gres=gpu:2 --wrap="env && sleep 600" -o block_2gpus_%A.out --mem=500
Submitted batch job 3
# sbatch --gres=gpu:2 --wrap="env && sleep 600" -o block_2gpus_%A.out --mem=500
Submitted batch job 4
# squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
    5       gpu wrap root  R 0:04     1 g1
    6       gpu wrap root  R 0:01     1 g2
    7       gpu wrap root  R 0:01     1 g3
Funnily enough, if I submit a job that requests only one GPU and add
--gres-flags=disable-binding, it actually starts running:
# sbatch --gres=gpu:1 --wrap="env && sleep 30" -o use_1gpu_%A.out --mem=500 --gres-flags=disable-binding
Submitted batch job 9
[root@ernie /]# squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
    5       gpu wrap root  R 1:44     1 g1
    6       gpu wrap root  R 1:41     1 g2
    7       gpu wrap root  R 1:41     1 g3
    9       gpu wrap root  R 0:02     1 g1
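For reference, the per-job CPU and GRES indices quoted in the scontrol
dump at the end of this mail can be inspected with something along these
lines (illustrative command only, job IDs as examples):

# scontrol -d show job 5 | grep -E 'CPU_IDs|GRES'
# scontrol -d show job 10 | grep -E 'CPU_IDs|GRES'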
I am not sure what to think of this. I consider this behavior far from
ideal, as our users report that their jobs die due to insufficient GPU
memory being available. That is hardly surprising, since the GPU jobs
that are already running are using the GPUs (as they should).
I am a bit lost here. Slurm is clever enough NOT to set
CUDA_VISIBLE_DEVICES for the job submitted with
'--gres-flags=disable-binding', but that doesn't help our users.
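Since the wrapped jobs only run 'env', a quick way to check this is to
grep the captured job environments (output file names as in the sbatch
calls above; purely illustrative):

# grep CUDA_VISIBLE_DEVICES block_2gpus_*.out use_1gpu_*.out

In line with the above, the output file of the disable-binding job
should not contain a CUDA_VISIBLE_DEVICES line.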
Personally, I believe this is a bug, but I would love to get feedback
from other Slurm users/developers.
Thanks in advance -
P
# scontrol show Nodes g1
NodeName=g1 CoresPerSocket=4
CPUAlloc=1 CPUTot=4 CPULoad=N/A
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=gpu:titanxp:2
NodeAddr=127.0.0.1 NodeHostName=localhost Port=0
RealMemory=4000 AllocMem=500 FreeMem=N/A Sockets=1 Boards=1
State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=gpu
BootTime=2019-03-18T10:14:18 SlurmdStartTime=2019-03-25T09:20:57
CfgTRES=cpu=4,mem=4000M,billing=4
AllocTRES=cpu=1,mem=500M
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
JobId=5 JobName=wrap
UserId=root(0) GroupId=root(0) MCS_label=N/A
Priority=4294901756 Nice=0 Account=(null) QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
DerivedExitCode=0:0
RunTime=00:06:30 TimeLimit=5-00:00:00 TimeMin=N/A
SubmitTime=2019-03-25T09:23:13 EligibleTime=2019-03-25T09:23:13
AccrueTime=Unknown
StartTime=2019-03-25T09:23:13 EndTime=2019-03-30T09:23:13 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2019-03-25T09:23:13
Partition=gpu AllocNode:Sid=ernie:1
ReqNodeList=(null) ExcNodeList=(null)
NodeList=g1
BatchHost=localhost
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=500M,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
Nodes=g1 CPU_IDs=0 Mem=500 GRES_IDX=gpu(IDX:0-1)
MinCPUsNode=1 MinMemoryNode=500M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=(null)
WorkDir=/
StdErr=//block_2gpus_5.out
StdIn=/dev/null
StdOut=//block_2gpus_5.out
Power=
TresPerNode=gpu:2
JobId=10 JobName=wrap
UserId=root(0) GroupId=root(0) MCS_label=N/A
Priority=4294901751 Nice=0 Account=(null) QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
DerivedExitCode=0:0
RunTime=00:00:07 TimeLimit=5-00:00:00 TimeMin=N/A
SubmitTime=2019-03-25T09:29:12 EligibleTime=2019-03-25T09:29:12
AccrueTime=Unknown
StartTime=2019-03-25T09:29:12 EndTime=2019-03-30T09:29:12 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2019-03-25T09:29:12
Partition=gpu AllocNode:Sid=ernie:1
ReqNodeList=(null) ExcNodeList=(null)
NodeList=g1
BatchHost=localhost
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=500M,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
Nodes=g1 CPU_IDs=1 Mem=500 GRES_IDX=gpu(IDX:)
MinCPUsNode=1 MinMemoryNode=500M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=(null)
WorkDir=/
StdErr=//use_1gpu_10.out
StdIn=/dev/null
StdOut=//use_1gpu_10.out
Power=
GresEnforceBind=No
TresPerNode=gpu:1