<div dir="ltr"><div>Howdy, and thanks for the warm welcome,</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 24 Feb 2023 at 07:31, Doug Meyer <<a href="mailto:dameyer99@gmail.com">dameyer99@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi,</div><div><br></div><div>Did you configure your node definition with the outputs of slurmd -C? Ignore boards. Don't know if it is still true but several years ago declaring boards made things difficult. <br></div><div><br></div></div></blockquote><div><br></div><div>$ slurmd -C<br>NodeName=shavak-DIT400TR-55L CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=95311<br>UpTime=0-00:47:51</div><div>$ grep NodeName /etc/slurm-llnl/slurm.conf<br>NodeName=shavak-DIT400TR-55L CPUs=64 RealMemory=95311 Gres=gpu:1<br></div><div><br></div><div>There is a difference. I, too, discarded the Boards and sockets in slurmd.conf . Is that the problem?</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div></div><div>Also, if you have hyperthreaded AMD or Intel processors your partition declaration should be overscribe:2</div><div><br></div></div></blockquote><div><br></div><div>Yes I do, It's actually 16 X 2 cores with hyperthreading, but the BIOS is set to show them as 64 cores.</div><div><br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div></div><div>Start with a very simple job with a script containing sleep 100 or something else without any runtime issues.</div><div><br></div></div></blockquote><div><br></div><div>I ran t<a href="https://github.com/hariseldon99/buparamshavak/blob/main/shavak_root/usr/local/share/examples/mpi_runs_inf/mpi_count.c">his MPI hello world thing </a>with <a href="https://github.com/hariseldon99/buparamshavak/blob/main/shavak_root/usr/local/share/examples/mpi_runs_inf/mpi_count_normal.sbatch">this sbatch script.</a> Should be the same thing as your suggestion, basically.</div><div>Should I switch to 'srun' in the batch file?</div><div><br></div><div>AR</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div></div><div>When I started with slurm I built the sbatch one small step at a time. Nodes, cores. memory, partition, mail, etc <br></div><div><br></div><div>It sounds like your config is very close but your problem may be in the submit script.</div><div><br></div><div>Best of luck and welcome to slurm. It is very powerful with a huge community.</div><div><br></div><div>Doug<br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 23, 2023 at 6:58 AM Analabha Roy <<a href="mailto:hariseldon99@gmail.com" target="_blank">hariseldon99@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi folks,<div><br></div><div>I have a single-node "cluster" running Ubuntu 20.04 LTS with the distribution packages for slurm (slurm-wlm 19.05.5)</div><div>Slurm only ran one job in the node at a time with the default configuration, leaving all other jobs pending.</div><div><div>This happened even if that one job only requested like a few cores (the node has 64 cores, and slurm.conf is configged accordingly).</div><div></div></div><div><br></div><div>in slurm conf, SelectType is set to select/cons_res, and SelectTypeParameters to CR_Core. NodeName is set with CPUs=64. Path to file is referenced below.<br></div><div><br></div><div>So I set OverSubscribe=FORCE in the partition config and restarted the daemons. </div><div><br></div><div>Multiple jobs are now run concurrently, but when Slurm is oversubscribed, it is <b>truly</b> <b>oversubscribed</b>. That is to say, it runs so many jobs that there are more processes running than cores/threads.</div><div>How should I config slurm so that it runs multiple jobs at once per node, but ensures that it doesn't run more processes than there are cores? Is there some TRES magic for this that I can't seem to figure out?</div><div><br></div><div>My slurm.conf is here on github: <a href="https://github.com/hariseldon99/buparamshavak/blob/main/shavak_root/etc/slurm-llnl/slurm.conf" target="_blank">https://github.com/hariseldon99/buparamshavak/blob/main/shavak_root/etc/slurm-llnl/slurm.conf</a></div><div>The only gres I've set is for the GPU: <a href="https://github.com/hariseldon99/buparamshavak/blob/main/shavak_root/etc/slurm-llnl/gres.conf" target="_blank">https://github.com/hariseldon99/buparamshavak/blob/main/shavak_root/etc/slurm-llnl/gres.conf</a></div><div><div><br></div><div>Thanks for your attention,</div><div>Regards,</div><div>AR</div>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div>Analabha Roy<br></div><div>Assistant Professor</div><div><a href="http://www.buruniv.ac.in/academics/department/physics" target="_blank">Department of Physics</a></div><div><a href="http://www.buruniv.ac.in/" target="_blank">The University of Burdwan</a></div><div>Golapbag Campus, Barddhaman 713104</div><div>West Bengal, India</div><div>Emails: <a href="mailto:daneel@utexas.edu" target="_blank">daneel@utexas.edu</a>, <a href="mailto:aroy@phys.buruniv.ac.in" target="_blank">aroy@phys.buruniv.ac.in</a>, <a href="mailto:hariseldon99@gmail.com" target="_blank">hariseldon99@gmail.com</a><br><div><font face="tahoma, sans-serif">Webpage: <a href="http://www.ph.utexas.edu/~daneel/" target="_blank">http://www.ph.utexas.edu/~daneel/</a></font></div></div></div></div></div></div></div></div>
</blockquote></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Analabha Roy<br></div><div>Assistant Professor</div><div><a href="http://www.buruniv.ac.in/academics/department/physics" target="_blank">Department of Physics</a></div><div><a href="http://www.buruniv.ac.in/" target="_blank">The University of Burdwan</a></div><div>Golapbag Campus, Barddhaman 713104</div><div>West Bengal, India</div><div>Emails: <a href="mailto:daneel@utexas.edu" target="_blank">daneel@utexas.edu</a>, <a href="mailto:aroy@phys.buruniv.ac.in" target="_blank">aroy@phys.buruniv.ac.in</a>, <a href="mailto:hariseldon99@gmail.com" target="_blank">hariseldon99@gmail.com</a><br><div><font face="tahoma, sans-serif">Webpage: <a href="http://www.ph.utexas.edu/~daneel/" target="_blank">http://www.ph.utexas.edu/~daneel/</a></font></div></div></div></div></div></div></div>