<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:courier new,monospace">Hi Sushil,</div><div class="gmail_default" style="font-family:courier new,monospace"><br></div><div class="gmail_default" style="font-family:courier new,monospace">Try changing the NodeName specification to:</div><div class="gmail_default" style="font-family:courier new,monospace"><br></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div class="gmail_default" style=""><font face="monospace">NodeName=localhost CPUs=96 State=UNKNOWN Gres=gpu<b style=""><font color="#ff0000">:8</font></b></font></div></blockquote><div class="gmail_default" style="font-family:courier new,monospace"><span style="font-family:Arial,Helvetica,sans-serif"><b><font color="#ff0000"><br></font></b></span></div><div class="gmail_default" style="font-family:courier new,monospace">Also, so that jobs are confined to their allocated resources: </div><div class="gmail_default" style="font-family:courier new,monospace"><br></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div class="gmail_default" style=""><font face="monospace">TaskPlugin=task/cgroup</font></div></blockquote><div class="gmail_default" style="font-family:courier new,monospace"><br></div><div class="gmail_default" style="font-family:courier new,monospace">Best,</div><div class="gmail_default" style="font-family:courier new,monospace"><br></div><div class="gmail_default" style="font-family:courier new,monospace">Steve</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 6, 2022 at 9:56 AM Sushil Mishra <<a href="mailto:sushilbioinfo@gmail.com">sushilbioinfo@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Dear SLURM users,</div><div><br></div><div>I am very new to Slurm and need some help configuring it on a single-node machine. This machine has 8 NVIDIA GPUs and a 96-core CPU. 
The vendor has set up a "LocalQ" partition, but it somehow runs all the calculations on GPU 0. If I submit 4 independent jobs at a time, it starts running all four calculations on GPU 0. I want Slurm to assign a specific GPU to each job (by setting the CUDA_VISIBLE_DEVICES variable) before it starts running, and to hold the rest of the jobs in the queue until a GPU becomes available. <br></div><div><br></div><div>slurm.conf looks like:</div><div><b>$ cat /etc/slurm-llnl/slurm.conf <br></b></div><div>ClusterName=localcluster<br>SlurmctldHost=localhost<br>MpiDefault=none<br>ProctrackType=proctrack/linuxproc<br>ReturnToService=2<br>SlurmctldPidFile=/var/run/slurmctld.pid<br>SlurmctldPort=6817<br>SlurmdPidFile=/var/run/slurmd.pid<br>SlurmdPort=6818<br>SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd<br>SlurmUser=slurm<br>StateSaveLocation=/var/lib/slurm-llnl/slurmctld<br>SwitchType=switch/none<br>TaskPlugin=task/none<br>#<br>GresTypes=gpu<br>#SlurmdDebug=debug2<br><br># TIMERS<br>InactiveLimit=0<br>KillWait=30<br>MinJobAge=300<br>SlurmctldTimeout=120<br>SlurmdTimeout=300<br>Waittime=0<br># SCHEDULING<br>SchedulerType=sched/backfill<br>SelectType=select/cons_tres<br>SelectTypeParameters=CR_Core<br>#<br>#AccountingStoragePort=<br>AccountingStorageType=accounting_storage/none<br>JobCompType=jobcomp/none<br>JobAcctGatherFrequency=30<br>JobAcctGatherType=jobacct_gather/none<br>SlurmctldDebug=info<br>SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log<br>SlurmdDebug=info<br>SlurmdLogFile=/var/log/slurm-llnl/slurmd.log<br>#<br># COMPUTE NODES<br>NodeName=localhost CPUs=96 State=UNKNOWN Gres=gpu<br>#NodeName=mannose NodeAddr=130.74.2.86 CPUs=1 State=UNKNOWN<br><br># Partitions list<br>PartitionName=LocalQ Nodes=ALL Default=YES MaxTime=7-00:00:00 State=UP<br>#PartitionName=gpu_short  MaxCPUsPerNode=32 DefMemPerNode=65556 DefCpuPerGPU=8 DefMemPerGPU=65556 MaxMemPerNode=532000 MaxTime=01-00:00:00 State=UP Nodes=localhost  Default=YES<br></div><div><br></div><div>and:</div><div><b>$ cat 
/etc/slurm-llnl/gres.conf</b></div>#detect GPUs<br>AutoDetect=nvml<br># GPU gres<br>NodeName=localhost Name=gpu File=/dev/nvidia0<br>NodeName=localhost Name=gpu File=/dev/nvidia1<br>NodeName=localhost Name=gpu File=/dev/nvidia2<br>NodeName=localhost Name=gpu File=/dev/nvidia3<br>NodeName=localhost Name=gpu File=/dev/nvidia4<br>NodeName=localhost Name=gpu File=/dev/nvidia5<br>NodeName=localhost Name=gpu File=/dev/nvidia6<br>NodeName=localhost Name=gpu File=/dev/nvidia7<br><br><div>Best,<br></div><div>Sushil</div><div></div><div><br></div></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><span style="color:rgb(34,34,34);font-family:"courier new",monospace">________________________________________________________________</span><br style="color:rgb(34,34,34);font-family:"courier new",monospace"><span style="color:rgb(34,34,34);font-family:"courier new",monospace"> Steve Cousins          <span>In</span><span>terim Director/</span>Supercomputer Engineer</span><br style="color:rgb(34,34,34);font-family:"courier new",monospace"><span style="color:rgb(34,34,34);font-family:"courier new",monospace"> Advanced Computing Group            University of Maine System</span><br style="color:rgb(34,34,34);font-family:"courier new",monospace"><span style="color:rgb(34,34,34);font-family:"courier new",monospace"> 244 Neville Hall (UMS Data Center)              (207) 581-3574</span><br style="color:rgb(34,34,34);font-family:"courier new",monospace"><span style="color:rgb(34,34,34);font-family:"courier new",monospace"> Orono ME 04469                      steve.cousins at <a href="http://maine.edu/" style="color:rgb(17,85,204)" target="_blank">maine.edu</a></span><br style="color:rgb(34,34,34)"></div></div></div></div>
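[Editor's note] For readers finding this thread later, the combined changes can be sketched as a minimal config fragment. This is an assumption-laden sketch, not a drop-in file: it assumes 8 GPUs visible as /dev/nvidia0-7, a Slurm build with NVML support, and a cgroup.conf alongside it (the cgroup.conf lines below are not from the thread; `task/cgroup` only hides unallocated GPUs from jobs when device constraining is enabled there).

```
# slurm.conf (relevant lines only)
TaskPlugin=task/cgroup
GresTypes=gpu
SelectType=select/cons_tres
NodeName=localhost CPUs=96 State=UNKNOWN Gres=gpu:8
PartitionName=LocalQ Nodes=ALL Default=YES MaxTime=7-00:00:00 State=UP

# cgroup.conf (assumed; required for task/cgroup to constrain devices)
ConstrainCores=yes
ConstrainDevices=yes
```

With this in place, each job must request its GPU explicitly, e.g. `sbatch --gres=gpu:1 job.sh`; Slurm then sets CUDA_VISIBLE_DEVICES inside the job to the allocated device and queues further jobs until a GPU becomes free.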