[slurm-users] Re: How to bind GPUs with CPU cores

William Zhang zhangyuchao1 at hotmail.com
Fri Oct 14 11:04:58 UTC 2022


Hi Ward,
I tried --gres-flags=enforce-binding, but it doesn't work.
The first job requests 1 GPU and 6 CPUs. It gets CPU IDs 0-5 and GPU ID 0.
The second job requests 1 GPU and 6 CPUs. It gets CPU IDs 6-11 (and GPU ID 1), but I want CPU IDs 16-21.
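The mapping in the question quoted below appears to give each GPU its own block of 16 cores, starting at 16 × the GPU index. A minimal Python sketch of that rule follows. It assumes exactly this hypothetical 16-cores-per-GPU layout and that a job should take the lowest cores of each GPU's block; `desired_cpu_ids` and `to_ranges` are illustrative helpers, not part of Slurm.

```python
# Hypothetical layout inferred from the question: GPU i "owns" the 16
# consecutive cores 16*i .. 16*i+15, and a job should take the lowest
# cores of each allocated GPU's block.
CORES_PER_GPU = 16


def desired_cpu_ids(gpu_ids, cpus_per_gpu):
    """Return the CPU IDs a job would get under the hypothetical layout."""
    if cpus_per_gpu > CORES_PER_GPU:
        raise ValueError("more CPUs per GPU than one core block holds")
    cpus = []
    for gpu in sorted(gpu_ids):
        start = gpu * CORES_PER_GPU
        cpus.extend(range(start, start + cpus_per_gpu))
    return cpus


def to_ranges(ids):
    """Compress a sorted list of IDs into Slurm-style '48-53,64-69' notation."""
    parts = []
    i = 0
    while i < len(ids):
        j = i
        # Extend j while the next ID is consecutive.
        while j + 1 < len(ids) and ids[j + 1] == ids[j] + 1:
            j += 1
        parts.append(str(ids[i]) if i == j else f"{ids[i]}-{ids[j]}")
        i = j + 1
    return ",".join(parts)


if __name__ == "__main__":
    # (GPU IDs, CPUs per GPU) for the four jobs in the question.
    for gpus, per_gpu in [([0], 6), ([1], 8), ([2], 6), ([3, 4], 6)]:
        print(gpus, to_ranges(desired_cpu_ids(gpus, per_gpu)))
```

This prints 0-5, 16-23, 32-37 and 48-53,64-69 for the four example jobs. Under the same assumption, the second job above (GPU 1, 6 CPUs) would get 16-21, which is what was expected.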

[zhangyc at ln01 numa]$ sbatch -w g0036 -p gpu_c128 --gpus=1 -n 6 --gres-flags=enforce-binding ./run.sh
Submitted batch job 198106
[zhangyc at ln01 numa]$
[zhangyc at ln01 numa]$ scontrol show job 198106 -d
JobId=198106 JobName=run.sh
   UserId=zhangyc(1004) GroupId=zhangyc(1004) MCS_label=N/A
   Priority=4294704732 Nice=0 Account=zhangyc QOS=normal WCKey=*
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:05 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2022-10-14T18:38:52 EligibleTime=2022-10-14T18:38:52
   AccrueTime=2022-10-14T18:38:52
   StartTime=2022-10-14T18:38:56 EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-10-14T18:38:56
   Partition=gpu_c128 AllocNode:Sid=ln01:26986
   ReqNodeList=g0036 ExcNodeList=(null)
   NodeList=g0036
   BatchHost=g0036
   NumNodes=1 NumCPUs=6 NumTasks=6 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=6,mem=60000M,node=1,billing=6,gres/gpu=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   JOB_GRES=gpu:1
     Nodes=g0036 CPU_IDs=0-5 Mem=60000 GRES=gpu:1(IDX:0)
   MinCPUsNode=1 MinMemoryNode=60000M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=./run.sh
   WorkDir=/data/run01/zhangyc/numa
   StdErr=/data/run01/zhangyc/numa/slurm-198106.out
   StdIn=/dev/null
   StdOut=/data/run01/zhangyc/numa/slurm-198106.out
   Power=
   GresEnforceBind=Yes
   CpusPerTres=gpu:6
   TresPerJob=gpu:1
   NtasksPerTRES:0


[zhangyc at ln01 numa]$ sbatch -w g0036 -p gpu_c128 --gpus=1 -n 6 --gres-flags=enforce-binding ./run.sh
Submitted batch job 198107
[zhangyc at ln01 numa]$ scontrol show job 198107 -d
JobId=198107 JobName=run.sh
   UserId=zhangyc(1004) GroupId=zhangyc(1004) MCS_label=N/A
   Priority=4294704731 Nice=0 Account=zhangyc QOS=normal WCKey=*
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:04 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2022-10-14T18:39:05 EligibleTime=2022-10-14T18:39:05
   AccrueTime=2022-10-14T18:39:05
   StartTime=2022-10-14T18:39:05 EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-10-14T18:39:05
   Partition=gpu_c128 AllocNode:Sid=ln01:26986
   ReqNodeList=g0036 ExcNodeList=(null)
   NodeList=g0036
   BatchHost=g0036
   NumNodes=1 NumCPUs=6 NumTasks=6 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=6,mem=60000M,node=1,billing=6,gres/gpu=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   JOB_GRES=gpu:1
     Nodes=g0036 CPU_IDs=6-11 Mem=60000 GRES=gpu:1(IDX:1)
   MinCPUsNode=1 MinMemoryNode=60000M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=./run.sh
   WorkDir=/data/run01/zhangyc/numa
   StdErr=/data/run01/zhangyc/numa/slurm-198107.out
   StdIn=/dev/null
   StdOut=/data/run01/zhangyc/numa/slurm-198107.out
   Power=
   GresEnforceBind=Yes
   CpusPerTres=gpu:6
   TresPerJob=gpu:1
   NtasksPerTRES:0
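To check an allocation against the hoped-for layout without reading it by eye, a small parser can extract the CPU IDs and GPU index from the `Nodes=... CPU_IDs=... GRES=gpu:N(IDX:...)` detail line shown above. This is a sketch under several assumptions: the detail-line format is the one seen in this thread, each GPU has the hypothetical 16-core block described in the question, and Slurm's CPU_IDs and IDX numbers line up with that core numbering. In practice, `scontrol show job -d` reports Slurm's abstract CPU IDs, which may differ from the OS CPU IDs. The helper names are illustrative.

```python
import re

# Matches the per-node detail line of `scontrol show job -d`, e.g.
#   Nodes=g0036 CPU_IDs=6-11 Mem=60000 GRES=gpu:1(IDX:1)
DETAIL_RE = re.compile(
    r"Nodes=(?P<node>\S+)\s+CPU_IDs=(?P<cpus>\S+).*?GRES=[^(\s]*\(IDX:(?P<gpus>[^)]+)\)"
)


def expand(spec):
    """Expand Slurm range notation such as '0-5,16' into a list of ints."""
    ids = []
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        ids.extend(range(int(lo), int(hi or lo) + 1))
    return ids


def parse_detail(text):
    """Return (node, cpu_ids, gpu_indices) from scontrol detail output."""
    m = DETAIL_RE.search(text)
    if m is None:
        raise ValueError("no 'Nodes=... CPU_IDs=... GRES=...(IDX:...)' line found")
    return m["node"], expand(m["cpus"]), expand(m["gpus"])


def off_block(cpus, gpus, cores_per_gpu=16):
    """CPUs outside every allocated GPU's (hypothetical) core block."""
    allowed = {
        c for g in gpus for c in range(g * cores_per_gpu, (g + 1) * cores_per_gpu)
    }
    return [c for c in cpus if c not in allowed]


if __name__ == "__main__":
    line = "Nodes=g0036 CPU_IDs=6-11 Mem=60000 GRES=gpu:1(IDX:1)"
    node, cpus, gpus = parse_detail(line)
    print(node, cpus, gpus, "outside block:", off_block(cpus, gpus))
```

For job 198107 it flags all of 6-11, because GPU 1's assumed block is 16-31. For job 198106 (CPU_IDs=0-5, IDX:0) it flags nothing.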



________________________________
From: slurm-users on behalf of Ward Poelmans
Sent: Friday, October 14, 2022 17:58
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] How to bind GPUs with CPU cores

Hi William,

On 14/10/2022 11:41, William Zhang wrote:

> How can we achieve this?
> For example:
> A job requests 6 CPUs and 1 GPU. It runs on GPU ID 0 with CPU IDs 0-5.
> A second job requests 8 CPUs and 1 GPU. If it runs on GPU ID 1, we want CPU IDs 16-23.
> A third job requests 6 CPUs and 1 GPU. If it runs on GPU ID 2, we want CPU IDs 32-37.
> The next job requests 12 CPUs and 2 GPUs. If it runs on GPU IDs 3-4, we want CPU IDs 48-53,64-69.
>
>
> Can we implement this?

Have a look at the --gres-flags=enforce-binding option of sbatch.

Ward
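The enforce-binding flag can only restrict a job to the cores that gres.conf associates with each GPU through its `Cores=` field. If a GPU is associated with a large range of cores (for example, a whole socket), cores 6-11 can still count as bound to GPU 1, which may explain job 198107. The sketch below generates one gres.conf line per GPU for the hypothetical layout from the question. The node name comes from this thread, but the GPU count (8), the device paths /dev/nvidiaN and the 16-cores-per-GPU split are assumptions. Replace them with the real topology; `nvidia-smi topo -m` prints each GPU's CPU affinity. Also note that `Cores=` expects Slurm's logical core indices, which may not match the OS CPU numbering.

```python
# Hypothetical node layout, inferred from the question: GPU i sits next to
# cores 16*i .. 16*i+15. NODE is from the thread; NUM_GPUS, the /dev/nvidiaN
# paths and CORES_PER_GPU are assumptions to adjust to the real hardware.
NODE = "g0036"
NUM_GPUS = 8
CORES_PER_GPU = 16


def gres_conf_lines(node=NODE, num_gpus=NUM_GPUS, cores_per_gpu=CORES_PER_GPU):
    """Build one gres.conf line per GPU, tying each device to its core block."""
    lines = []
    for gpu in range(num_gpus):
        first = gpu * cores_per_gpu
        last = first + cores_per_gpu - 1
        lines.append(
            f"NodeName={node} Name=gpu File=/dev/nvidia{gpu} Cores={first}-{last}"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(gres_conf_lines()))
```

Running it prints lines such as `NodeName=g0036 Name=gpu File=/dev/nvidia0 Cores=0-15` and `NodeName=g0036 Name=gpu File=/dev/nvidia1 Cores=16-31`.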
