[slurm-users] disable-bindings disables counting of gres resources
Quirin Lohr
quirin.lohr at in.tum.de
Fri Apr 5 13:31:56 UTC 2019
Same problem here: a job submitted with --gres-flags=disable-binding is
assigned a node, but then the job step fails because all GPUs on that
node are already in use. Log messages:
[2019-04-05T15:29:05.216] error: gres/gpu: job 92453 node node5
overallocated resources by 1, (9 > 8)
[2019-04-05T15:29:05.216] Gres topology sub-optimal for job 92453
[2019-04-05T15:29:05.217] sched: _slurm_rpc_allocate_resources
JobId=92453 NodeList=node5 usec=497
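
For context, the kind of submission that triggers this on our side looks
roughly like the sketch below (illustrative only: the exact user command is
not reproduced here, and node5 has eight GPUs):

  # a 1-GPU job with binding disabled, landing on a node whose GPUs are all in use
  sbatch --gres=gpu:1 --gres-flags=disable-binding --wrap="nvidia-smi" -o repro_%A.out
  # what the controller reports for that node afterwards
  scontrol show node node5
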
On 25.03.19 at 10:30, Peter Steinbach wrote:
> Dear all,
>
> Using these config files,
>
> https://github.com/psteinb/docker-centos7-slurm/blob/7bdb89161febacfd2dbbcb3c5684336fb73d7608/gres.conf
>
>
> https://github.com/psteinb/docker-centos7-slurm/blob/7bdb89161febacfd2dbbcb3c5684336fb73d7608/slurm.conf
>
>
> I observed a weird behavior of the '--gres-flags=disable-binding'
> option. With the above .conf files, I created a local slurm cluster with
> 3 compute nodes (2 GPUs and 4 cores each).
>
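> The linked files are authoritative; as a rough, illustrative sketch (names,
> device paths, and core ranges below are examples, not copied from the
> repository), the node and GRES definitions for such a cluster would look
> something like:
>
>   # slurm.conf (sketch)
>   GresTypes=gpu
>   NodeName=g[1-3] CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=4000 Gres=gpu:titanxp:2
>   PartitionName=gpu Nodes=g[1-3] Default=YES State=UP
>
>   # gres.conf (sketch)
>   NodeName=g[1-3] Name=gpu Type=titanxp File=/dev/nvidia[0-1] Cores=0-3
>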
> # sinfo -N -l
> Mon Mar 25 09:20:59 2019
> NODELIST  NODES  PARTITION  STATE  CPUS  S:C:T  MEMORY  TMP_DISK  WEIGHT  AVAIL_FE  REASON
> g1        1      gpu*       idle   4     1:4:1  4000    0         1       (null)    none
> g2        1      gpu*       idle   4     1:4:1  4000    0         1       (null)    none
> g3        1      gpu*       idle   4     1:4:1  4000    0         1       (null)    none
>
> I first submitted 3 jobs that consume all available GPUs:
>
> # sbatch --gres=gpu:2 --wrap="env && sleep 600" -o block_2gpus_%A.out --mem=500
> Submitted batch job 2
> # sbatch --gres=gpu:2 --wrap="env && sleep 600" -o block_2gpus_%A.out --mem=500
> Submitted batch job 3
> # sbatch --gres=gpu:2 --wrap="env && sleep 600" -o block_2gpus_%A.out --mem=500
> Submitted batch job 4
> # squeue
>   JOBID PARTITION  NAME  USER  ST  TIME  NODES  NODELIST(REASON)
>       5       gpu  wrap  root   R  0:04      1  g1
>       6       gpu  wrap  root   R  0:01      1  g2
>       7       gpu  wrap  root   R  0:01      1  g3
>
> Funnily enough, if I send a job requesting only one GPU and add
> --gres-flags=disable-binding, it actually starts running:
>
> # sbatch --gres=gpu:1 --wrap="env && sleep 30" -o use_1gpu_%A.out --mem=500 --gres-flags=disable-binding
> Submitted batch job 9
> [root@ernie /]# squeue
>   JOBID PARTITION  NAME  USER  ST  TIME  NODES  NODELIST(REASON)
>       5       gpu  wrap  root   R  1:44      1  g1
>       6       gpu  wrap  root   R  1:41      1  g2
>       7       gpu  wrap  root   R  1:41      1  g3
>       9       gpu  wrap  root   R  0:02      1  g1
>
> I am not sure what to think of this. I consider this behavior not ideal,
> as our users reported that their jobs die due to insufficient available
> GPU memory. That is to be expected, since the GPU jobs that are already
> present are using the GPUs (as they should).
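> As an illustration (not part of the original submissions), a job that lands
> on an already-occupied GPU can see the memory pressure directly, e.g. by
> running inside the job step:
>
>   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
>
> The query fields used here are standard nvidia-smi options.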
>
> I am a bit lost here. Slurm is at least clever enough to NOT set
> CUDA_VISIBLE_DEVICES for the job that has '--gres-flags=disable-binding',
> but that doesn't help our users.
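> A minimal way to observe this from inside a job, sketched here rather than
> taken from the submissions above:
>
>   sbatch --gres=gpu:1 --gres-flags=disable-binding --mem=500 \
>          --wrap='echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"'
>
> With the flag, the variable stays unset; without it, slurm's gres/gpu
> plugin normally exports the indices of the allocated GPUs there.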
>
> Personally, I believe this is a bug, but I would love to get feedback
> from other slurm users/developers.
>
> Thanks in advance -
> P
>
> # scontrol show Nodes g1
> NodeName=g1 CoresPerSocket=4
> CPUAlloc=1 CPUTot=4 CPULoad=N/A
> AvailableFeatures=(null)
> ActiveFeatures=(null)
> Gres=gpu:titanxp:2
> NodeAddr=127.0.0.1 NodeHostName=localhost Port=0
> RealMemory=4000 AllocMem=500 FreeMem=N/A Sockets=1 Boards=1
> State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
> Partitions=gpu
> BootTime=2019-03-18T10:14:18 SlurmdStartTime=2019-03-25T09:20:57
> CfgTRES=cpu=4,mem=4000M,billing=4
> AllocTRES=cpu=1,mem=500M
> CapWatts=n/a
> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> JobId=5 JobName=wrap
> UserId=root(0) GroupId=root(0) MCS_label=N/A
> Priority=4294901756 Nice=0 Account=(null) QOS=normal
> JobState=RUNNING Reason=None Dependency=(null)
> Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
> DerivedExitCode=0:0
> RunTime=00:06:30 TimeLimit=5-00:00:00 TimeMin=N/A
> SubmitTime=2019-03-25T09:23:13 EligibleTime=2019-03-25T09:23:13
> AccrueTime=Unknown
> StartTime=2019-03-25T09:23:13 EndTime=2019-03-30T09:23:13 Deadline=N/A
> PreemptTime=None SuspendTime=None SecsPreSuspend=0
> LastSchedEval=2019-03-25T09:23:13
> Partition=gpu AllocNode:Sid=ernie:1
> ReqNodeList=(null) ExcNodeList=(null)
> NodeList=g1
> BatchHost=localhost
> NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
> TRES=cpu=1,mem=500M,node=1,billing=1
> Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
> Nodes=g1 CPU_IDs=0 Mem=500 GRES_IDX=gpu(IDX:0-1)
> MinCPUsNode=1 MinMemoryNode=500M MinTmpDiskNode=0
> Features=(null) DelayBoot=00:00:00
> OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
> Command=(null)
> WorkDir=/
> StdErr=//block_2gpus_5.out
> StdIn=/dev/null
> StdOut=//block_2gpus_5.out
> Power=
> TresPerNode=gpu:2
>
> JobId=10 JobName=wrap
> UserId=root(0) GroupId=root(0) MCS_label=N/A
> Priority=4294901751 Nice=0 Account=(null) QOS=normal
> JobState=RUNNING Reason=None Dependency=(null)
> Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
> DerivedExitCode=0:0
> RunTime=00:00:07 TimeLimit=5-00:00:00 TimeMin=N/A
> SubmitTime=2019-03-25T09:29:12 EligibleTime=2019-03-25T09:29:12
> AccrueTime=Unknown
> StartTime=2019-03-25T09:29:12 EndTime=2019-03-30T09:29:12 Deadline=N/A
> PreemptTime=None SuspendTime=None SecsPreSuspend=0
> LastSchedEval=2019-03-25T09:29:12
> Partition=gpu AllocNode:Sid=ernie:1
> ReqNodeList=(null) ExcNodeList=(null)
> NodeList=g1
> BatchHost=localhost
> NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
> TRES=cpu=1,mem=500M,node=1,billing=1
> Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
> Nodes=g1 CPU_IDs=1 Mem=500 GRES_IDX=gpu(IDX:)
> MinCPUsNode=1 MinMemoryNode=500M MinTmpDiskNode=0
> Features=(null) DelayBoot=00:00:00
> OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
> Command=(null)
> WorkDir=/
> StdErr=//use_1gpu_10.out
> StdIn=/dev/null
> StdOut=//use_1gpu_10.out
> Power=
> GresEnforceBind=No
> TresPerNode=gpu:1
>
--
Quirin Lohr
System Administration
Technische Universität München
Fakultät für Informatik
Lehrstuhl für Bildverarbeitung und Mustererkennung
Boltzmannstrasse 3
85748 Garching
Tel. +49 89 289 17769
Fax +49 89 289 17757
quirin.lohr at in.tum.de
www.vision.in.tum.de