<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<span style="font-size:12pt" class="ContentPasted0 ContentPasted1">Hi Ward,</span>
<div style="font-size:12pt" class="ContentPasted0">I tried --gres-flags=enforce-binding, but it doesn't work.</div>
<div style="font-size:12pt" class="ContentPasted0">The first job requests 1 GPU and 6 CPUs. It gets CPU IDs 0-5 and GPU ID 0.</div>
<div style="font-size:12pt" class="ContentPasted0">The second job also requests 1 GPU and 6 CPUs. It gets CPU IDs 6-11 and GPU ID 1, but I expected CPU IDs 16-21.
<br class="ContentPasted0">
</div>
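<div style="font-size:12pt" class="ContentPasted0">To make the mismatch concrete, here is a small Python sketch (an illustration only, not part of Slurm) that reads the CPU_IDs and GRES IDX fields from a "scontrol show job -d" detail line and checks whether the allocated CPUs fall inside the core block I expect for that GPU. It assumes 16 consecutive cores per GPU, which is my own assumption based on the layout I asked for, and the names expand and binding_ok are made up for this example.</div>
<pre>
```python
import re

# Assumption (not confirmed by the node's real topology): each GPU owns a
# contiguous block of 16 cores, so GPU 1 would map to cores 16-31.
CORES_PER_GPU = 16


def expand(spec):
    """Expand a Slurm index list such as '0-5' or '48-53,64-69' into a set of ints."""
    ids = set()
    for part in spec.split(","):
        first, _, last = part.partition("-")
        # A single index such as '7' has no '-', so 'last' is empty.
        ids.update(range(int(first), int(last or first) + 1))
    return ids


def binding_ok(detail_line, cores_per_gpu=CORES_PER_GPU):
    """Return True if every allocated CPU lies in the core block of an allocated GPU."""
    cpus = expand(re.search(r"CPU_IDs=(\S+)", detail_line).group(1))
    gpus = expand(re.search(r"\(IDX:([\d,-]+)\)", detail_line).group(1))
    allowed = set()
    for gpu in gpus:
        allowed.update(range(gpu * cores_per_gpu, (gpu + 1) * cores_per_gpu))
    return cpus.issubset(allowed)


# The two detail lines reported by scontrol for jobs 198106 and 198107.
print(binding_ok("Nodes=g0036 CPU_IDs=0-5 Mem=60000 GRES=gpu:1(IDX:0)"))   # True
print(binding_ok("Nodes=g0036 CPU_IDs=6-11 Mem=60000 GRES=gpu:1(IDX:1)"))  # False
```
</pre>
<div style="font-size:12pt" class="ContentPasted0">With that assumption, the first job passes the check and the second one does not.</div>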
<div style="font-size:12pt"><br class="ContentPasted0">
</div>
<div style="font-size:12pt" class="ContentPasted0">[zhangyc@ln01 numa]$ sbatch -w g0036 -p gpu_c128 --gpus=1 -n 6 --gres-flags=enforce-binding ./run.sh
<div class="ContentPasted0">Submitted batch job 198106</div>
<div class="ContentPasted0">[zhangyc@ln01 numa]$ </div>
<div class="ContentPasted0">[zhangyc@ln01 numa]$ scontrol show job 198106 -d</div>
<div class="ContentPasted0">JobId=198106 JobName=run.sh</div>
<div class="ContentPasted0"> UserId=zhangyc(1004) GroupId=zhangyc(1004) MCS_label=N/A</div>
<div class="ContentPasted0"> Priority=4294704732 Nice=0 Account=zhangyc QOS=normal WCKey=*</div>
<div class="ContentPasted0"> JobState=RUNNING Reason=None Dependency=(null)</div>
<div class="ContentPasted0"> Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0</div>
<div class="ContentPasted0"> DerivedExitCode=0:0</div>
<div class="ContentPasted0"> RunTime=00:00:05 TimeLimit=UNLIMITED TimeMin=N/A</div>
<div class="ContentPasted0"> SubmitTime=2022-10-14T18:38:52 EligibleTime=2022-10-14T18:38:52</div>
<div class="ContentPasted0"> AccrueTime=2022-10-14T18:38:52</div>
<div class="ContentPasted0"> StartTime=2022-10-14T18:38:56 EndTime=Unknown Deadline=N/A</div>
<div class="ContentPasted0"> SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-10-14T18:38:56</div>
<div class="ContentPasted0"> Partition=gpu_c128 AllocNode:Sid=ln01:26986</div>
<div class="ContentPasted0"> ReqNodeList=g0036 ExcNodeList=(null)</div>
<div class="ContentPasted0"> NodeList=g0036</div>
<div class="ContentPasted0"> BatchHost=g0036</div>
<div class="ContentPasted0"> NumNodes=1 NumCPUs=6 NumTasks=6 CPUs/Task=1 ReqB:S:C:T=0:0:*:*</div>
<div class="ContentPasted0"> TRES=cpu=6,mem=60000M,node=1,billing=6,gres/gpu=1</div>
<div class="ContentPasted0"> Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*</div>
<div class="ContentPasted0"> JOB_GRES=gpu:1</div>
<div class="ContentPasted0"> Nodes=g0036 CPU_IDs=0-5 Mem=60000 GRES=gpu:1(IDX:0)</div>
<div class="ContentPasted0"> MinCPUsNode=1 MinMemoryNode=60000M MinTmpDiskNode=0</div>
<div class="ContentPasted0"> Features=(null) DelayBoot=00:00:00</div>
<div class="ContentPasted0"> OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)</div>
<div class="ContentPasted0"> Command=./run.sh</div>
<div class="ContentPasted0"> WorkDir=/data/run01/zhangyc/numa</div>
<div class="ContentPasted0"> StdErr=/data/run01/zhangyc/numa/slurm-198106.out</div>
<div class="ContentPasted0"> StdIn=/dev/null</div>
<div class="ContentPasted0"> StdOut=/data/run01/zhangyc/numa/slurm-198106.out</div>
<div class="ContentPasted0"> Power=</div>
<div class="ContentPasted0"> GresEnforceBind=Yes</div>
<div class="ContentPasted0"> CpusPerTres=gpu:6</div>
<div class="ContentPasted0"> TresPerJob=gpu:1</div>
NtasksPerTRES:0</div>
<div style="font-size:12pt"><br class="ContentPasted0">
</div>
<div style="font-size:12pt"><br class="ContentPasted0">
</div>
<div style="font-size:12pt" class="ContentPasted0">[zhangyc@ln01 numa]$ sbatch -w g0036 -p gpu_c128 --gpus=1 -n 6 --gres-flags=enforce-binding ./run.sh
<div class="ContentPasted0">Submitted batch job 198107</div>
<div class="ContentPasted0">[zhangyc@ln01 numa]$ scontrol show job 198107 -d</div>
<div class="ContentPasted0">JobId=198107 JobName=run.sh</div>
<div class="ContentPasted0"> UserId=zhangyc(1004) GroupId=zhangyc(1004) MCS_label=N/A</div>
<div class="ContentPasted0"> Priority=4294704731 Nice=0 Account=zhangyc QOS=normal WCKey=*</div>
<div class="ContentPasted0"> JobState=RUNNING Reason=None Dependency=(null)</div>
<div class="ContentPasted0"> Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0</div>
<div class="ContentPasted0"> DerivedExitCode=0:0</div>
<div class="ContentPasted0"> RunTime=00:00:04 TimeLimit=UNLIMITED TimeMin=N/A</div>
<div class="ContentPasted0"> SubmitTime=2022-10-14T18:39:05 EligibleTime=2022-10-14T18:39:05</div>
<div class="ContentPasted0"> AccrueTime=2022-10-14T18:39:05</div>
<div class="ContentPasted0"> StartTime=2022-10-14T18:39:05 EndTime=Unknown Deadline=N/A</div>
<div class="ContentPasted0"> SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-10-14T18:39:05</div>
<div class="ContentPasted0"> Partition=gpu_c128 AllocNode:Sid=ln01:26986</div>
<div class="ContentPasted0"> ReqNodeList=g0036 ExcNodeList=(null)</div>
<div class="ContentPasted0"> NodeList=g0036</div>
<div class="ContentPasted0"> BatchHost=g0036</div>
<div class="ContentPasted0"> NumNodes=1 NumCPUs=6 NumTasks=6 CPUs/Task=1 ReqB:S:C:T=0:0:*:*</div>
<div class="ContentPasted0"> TRES=cpu=6,mem=60000M,node=1,billing=6,gres/gpu=1</div>
<div class="ContentPasted0"> Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*</div>
<div class="ContentPasted0"> JOB_GRES=gpu:1</div>
<div class="ContentPasted0"> Nodes=g0036 CPU_IDs=6-11 Mem=60000 GRES=gpu:1(IDX:1)</div>
<div class="ContentPasted0"> MinCPUsNode=1 MinMemoryNode=60000M MinTmpDiskNode=0</div>
<div class="ContentPasted0"> Features=(null) DelayBoot=00:00:00</div>
<div class="ContentPasted0"> OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)</div>
<div class="ContentPasted0"> Command=./run.sh</div>
<div class="ContentPasted0"> WorkDir=/data/run01/zhangyc/numa</div>
<div class="ContentPasted0"> StdErr=/data/run01/zhangyc/numa/slurm-198107.out</div>
<div class="ContentPasted0"> StdIn=/dev/null</div>
<div class="ContentPasted0"> StdOut=/data/run01/zhangyc/numa/slurm-198107.out</div>
<div class="ContentPasted0"> Power=</div>
<div class="ContentPasted0"> GresEnforceBind=Yes</div>
<div class="ContentPasted0"> CpusPerTres=gpu:6</div>
<div class="ContentPasted0"> TresPerJob=gpu:1</div>
NtasksPerTRES:0</div>
<br>
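<div style="font-size:12pt" class="ContentPasted0">As far as I understand from the sbatch documentation, enforce-binding only restricts a job to the cores that gres.conf lists for the allocated GPU, so the Cores= entries for g0036 may be what needs checking. Here is a sketch of what I would expect them to look like, assuming 16 cores per GPU and /dev/nvidiaN device paths. Both are assumptions that I have not verified against the node.</div>
<pre>
```text
# Hypothetical gres.conf entries for g0036 (core ranges and device paths are assumed)
NodeName=g0036 Name=gpu File=/dev/nvidia0 Cores=0-15
NodeName=g0036 Name=gpu File=/dev/nvidia1 Cores=16-31
NodeName=g0036 Name=gpu File=/dev/nvidia2 Cores=32-47
NodeName=g0036 Name=gpu File=/dev/nvidia3 Cores=48-63
NodeName=g0036 Name=gpu File=/dev/nvidia4 Cores=64-79
# Any remaining GPUs would follow the same pattern.
```
</pre>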
</div>
<div>
<div><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0);" class="elementToProof">
<br>
<hr tabindex="-1" style="display:inline-block; width:98%;">
<b>From:</b> slurm-users on behalf of Ward Poelmans<br>
<b>Sent:</b> Friday, 14 October 2022 17:58<br>
<b>To:</b> slurm-users@lists.schedmd.com<br>
<b>Subject:</b> Re: [slurm-users] How to bind GPUs with CPU cores
<div><br>
</div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">Hi William,<br>
<br>
On 14/10/2022 11:41, William Zhang wrote:<br>
<br>
> How can we implement the following?<br>
> For example:<br>
> A job requires 6 CPUs with 1 GPU, and it runs on GPU ID 0, CPU IDs 0-5.<br>
> The second job requires 8 CPUs with 1 GPU. If it runs on GPU ID 1, we want the CPU IDs to be 16-23.<br>
> The third job requires 6 CPUs with 1 GPU. If it runs on GPU ID 2, we want the CPU IDs to be 32-37.<br>
> The next job requires 12 CPUs with 2 GPUs. If it runs on GPU IDs 3-4, we want the CPU IDs to be 48-53,64-69.<br>
> <br>
> <br>
> Can this be done?<br>
<br>
Have a look at the --gres-flags=enforce-binding option of sbatch.<br>
<br>
Ward<br>
<br>
</div>
</span></font></div>
</div>
</body>
</html>