[slurm-users] disable-binding disables counting of gres resources
Peter Steinbach
steinbac at mpi-cbg.de
Fri Mar 29 12:27:52 UTC 2019
Just to follow up, I filed a medium-severity bug report with SchedMD on this:
https://bugs.schedmd.com/show_bug.cgi?id=6763
Best,
Peter
On 3/25/19 10:30 AM, Peter Steinbach wrote:
> Dear all,
>
> Using these config files,
>
> https://github.com/psteinb/docker-centos7-slurm/blob/7bdb89161febacfd2dbbcb3c5684336fb73d7608/gres.conf
>
>
> https://github.com/psteinb/docker-centos7-slurm/blob/7bdb89161febacfd2dbbcb3c5684336fb73d7608/slurm.conf
>
>
> I observed some odd behavior of the '--gres-flags=disable-binding'
> option. With the above .conf files, I created a local Slurm cluster with
> 3 compute nodes (2 GPUs and 4 cores each).
>
> # sinfo -N -l
> Mon Mar 25 09:20:59 2019
> NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
> g1             1      gpu*        idle    4    1:4:1   4000        0      1   (null) none
> g2             1      gpu*        idle    4    1:4:1   4000        0      1   (null) none
> g3             1      gpu*        idle    4    1:4:1   4000        0      1   (null) none
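>
> (For reference, the linked gres.conf and slurm.conf boil down to roughly
> the following definitions. This is only a sketch reconstructed from the
> sinfo/scontrol output in this mail; in particular the device file paths
> in gres.conf are an assumption, the linked files are authoritative.)
>
> # slurm.conf (excerpt)
> GresTypes=gpu
> NodeName=g[1-3] CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=4000 Gres=gpu:titanxp:2
> PartitionName=gpu Nodes=g[1-3] Default=YES State=UP
>
> # gres.conf (excerpt, device paths assumed)
> NodeName=g[1-3] Name=gpu Type=titanxp File=/dev/nvidia0
> NodeName=g[1-3] Name=gpu Type=titanxp File=/dev/nvidia1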
>
> I first submitted 3 jobs that consume all available GPUs:
>
> # sbatch --gres=gpu:2 --wrap="env && sleep 600" -o block_2gpus_%A.out
> --mem=500
> Submitted batch job 2
> # sbatch --gres=gpu:2 --wrap="env && sleep 600" -o block_2gpus_%A.out
> --mem=500
> Submitted batch job 3
> # sbatch --gres=gpu:2 --wrap="env && sleep 600" -o block_2gpus_%A.out
> --mem=500
> Submitted batch job 4
> # squeue
>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>       5       gpu     wrap     root  R       0:04      1 g1
>       6       gpu     wrap     root  R       0:01      1 g2
>       7       gpu     wrap     root  R       0:01      1 g3
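>
> (To double-check which GRES each running job actually requested, a
> squeue format string along these lines should work; if I am not
> mistaken, %b prints the requested generic resources:
>
> # squeue -o "%.7i %.9P %.8j %.8u %.2t %.10M %.6D %R %b"
> )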
>
> Funnily enough, if I submit a job requesting only one GPU and add
> --gres-flags=disable-binding, it actually starts running.
>
> # sbatch --gres=gpu:1 --wrap="env && sleep 30" -o use_1gpu_%A.out
> --mem=500 --gres-flags=disable-binding
> Submitted batch job 9
> [root@ernie /]# squeue
>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>       5       gpu     wrap     root  R       1:44      1 g1
>       6       gpu     wrap     root  R       1:41      1 g2
>       7       gpu     wrap     root  R       1:41      1 g3
>       9       gpu     wrap     root  R       0:02      1 g1
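>
> (The per-node view should confirm this as well; scontrol's detail flag
> also reports the GRES currently accounted as in use, e.g.
>
> # scontrol -d show node g1
>
> which, next to the configured Gres= line, should also list the GRES
> currently in use.)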
>
> I am not sure what to think of this. I consider this behavior far from
> ideal: our users reported that their jobs die due to insufficient GPU
> memory being available. That is hardly surprising, since the already
> running GPU jobs are using the GPUs (as they should).
>
> I am a bit lost here. Slurm is clever enough NOT to set
> CUDA_VISIBLE_DEVICES for the job that has '--gres-flags=disable-binding',
> but that doesn't help our users.
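>
> (Since the wrapped jobs simply run 'env', this is easy to verify from
> the job output files, e.g. something like:
>
> # grep CUDA_VISIBLE_DEVICES block_2gpus_*.out use_1gpu_*.out
>
> which should show the variable for the regular GPU jobs and no match for
> the disable-binding job.)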
>
> Personally, I believe this is a bug, but I would love to get feedback
> from other slurm users/developers.
>
> Thanks in advance -
> P
>
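> (Below are the node record and the two job records. The per-node
> GRES_IDX assignment only shows up with scontrol's detail flag, e.g.:
>
> # scontrol -d show job 5
> # scontrol -d show job 10
>
> The relevant lines are the Nodes=... GRES_IDX=... and GresEnforceBind=
> entries.)
>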
> # scontrol show Nodes g1
> NodeName=g1 CoresPerSocket=4
> CPUAlloc=1 CPUTot=4 CPULoad=N/A
> AvailableFeatures=(null)
> ActiveFeatures=(null)
> Gres=gpu:titanxp:2
> NodeAddr=127.0.0.1 NodeHostName=localhost Port=0
> RealMemory=4000 AllocMem=500 FreeMem=N/A Sockets=1 Boards=1
> State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
> Partitions=gpu
> BootTime=2019-03-18T10:14:18 SlurmdStartTime=2019-03-25T09:20:57
> CfgTRES=cpu=4,mem=4000M,billing=4
> AllocTRES=cpu=1,mem=500M
> CapWatts=n/a
> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> JobId=5 JobName=wrap
> UserId=root(0) GroupId=root(0) MCS_label=N/A
> Priority=4294901756 Nice=0 Account=(null) QOS=normal
> JobState=RUNNING Reason=None Dependency=(null)
> Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
> DerivedExitCode=0:0
> RunTime=00:06:30 TimeLimit=5-00:00:00 TimeMin=N/A
> SubmitTime=2019-03-25T09:23:13 EligibleTime=2019-03-25T09:23:13
> AccrueTime=Unknown
> StartTime=2019-03-25T09:23:13 EndTime=2019-03-30T09:23:13 Deadline=N/A
> PreemptTime=None SuspendTime=None SecsPreSuspend=0
> LastSchedEval=2019-03-25T09:23:13
> Partition=gpu AllocNode:Sid=ernie:1
> ReqNodeList=(null) ExcNodeList=(null)
> NodeList=g1
> BatchHost=localhost
> NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
> TRES=cpu=1,mem=500M,node=1,billing=1
> Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
> Nodes=g1 CPU_IDs=0 Mem=500 GRES_IDX=gpu(IDX:0-1)
> MinCPUsNode=1 MinMemoryNode=500M MinTmpDiskNode=0
> Features=(null) DelayBoot=00:00:00
> OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
> Command=(null)
> WorkDir=/
> StdErr=//block_2gpus_5.out
> StdIn=/dev/null
> StdOut=//block_2gpus_5.out
> Power=
> TresPerNode=gpu:2
>
> JobId=10 JobName=wrap
> UserId=root(0) GroupId=root(0) MCS_label=N/A
> Priority=4294901751 Nice=0 Account=(null) QOS=normal
> JobState=RUNNING Reason=None Dependency=(null)
> Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
> DerivedExitCode=0:0
> RunTime=00:00:07 TimeLimit=5-00:00:00 TimeMin=N/A
> SubmitTime=2019-03-25T09:29:12 EligibleTime=2019-03-25T09:29:12
> AccrueTime=Unknown
> StartTime=2019-03-25T09:29:12 EndTime=2019-03-30T09:29:12 Deadline=N/A
> PreemptTime=None SuspendTime=None SecsPreSuspend=0
> LastSchedEval=2019-03-25T09:29:12
> Partition=gpu AllocNode:Sid=ernie:1
> ReqNodeList=(null) ExcNodeList=(null)
> NodeList=g1
> BatchHost=localhost
> NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
> TRES=cpu=1,mem=500M,node=1,billing=1
> Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
> Nodes=g1 CPU_IDs=1 Mem=500 GRES_IDX=gpu(IDX:)
> MinCPUsNode=1 MinMemoryNode=500M MinTmpDiskNode=0
> Features=(null) DelayBoot=00:00:00
> OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
> Command=(null)
> WorkDir=/
> StdErr=//use_1gpu_10.out
> StdIn=/dev/null
> StdOut=//use_1gpu_10.out
> Power=
> GresEnforceBind=No
> TresPerNode=gpu:1
>
--
Peter Steinbach, Dr. rer. nat.
Scientific Software Engineer, Scientific Computing Facility
Max Planck Institute of Molecular Cell Biology and Genetics
Pfotenhauerstr. 108
01307 Dresden
Germany
phone +49 351 210 2882
fax +49 351 210 1689
twitter psteinb_
www.mpi-cbg.de