<div dir="auto"><div>On dependencies, please think about the timing. Currently one loop takes ~70 minutes; say any job waits a time T in the queue. If you split the slow part so that it runs serially, one loop takes ~190 minutes + 2T. The time for N iterations would then be ~190N + 2NT, versus 70N + T now. </div><div><br></div><div data-smartmail="gmail_signature">---<br>Professor Laurence Marks (Laurie)<br><a href="http://www.numis.northwestern.edu">www.numis.northwestern.edu</a><br>"Research is to see what everybody else has seen, and to think what nobody else has thought" Albert Szent-Györgyi</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Dec 20, 2023, 14:40 Renfro, Michael <<a href="mailto:Renfro@tntech.edu">Renfro@tntech.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="m_544987036716666291WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Is this Northwestern’s Quest HPC or another one? I know at least a few of the people involved with Quest, and I wouldn’t have thought they’d be in dire need of coaching.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">And to follow on with Davide’s point, this really sounds like a case for submitting multiple jobs with dependencies between them, as per [1, 2, 3].<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[1] <a href="https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=1795" target="_blank" rel="noreferrer">
https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=1795</a><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[2] <a href="https://bioinformaticsworkbook.org/Appendix/HPC/SLURM/submitting-dependency-jobs-using-slurm.html#gsc.tab=0" target="_blank" rel="noreferrer">
https://bioinformaticsworkbook.org/Appendix/HPC/SLURM/submitting-dependency-jobs-using-slurm.html#gsc.tab=0</a><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[3] <a href="https://slurm.schedmd.com/sbatch.html#OPT_dependency" target="_blank" rel="noreferrer">
https://slurm.schedmd.com/sbatch.html#OPT_dependency</a><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><u></u> <u></u></span></p>
<div id="m_544987036716666291mail-editor-reference-message-container">
<div>
<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;font-family:'Aptos',sans-serif;color:black">From:
</span></b><span style="font-size:12.0pt;font-family:'Aptos',sans-serif;color:black">slurm-users <<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank" rel="noreferrer">slurm-users-bounces@lists.schedmd.com</a>> on behalf of Laurence Marks <<a href="mailto:laurence.marks@gmail.com" target="_blank" rel="noreferrer">laurence.marks@gmail.com</a>><br>
<b>Date: </b>Wednesday, December 20, 2023 at 1:40 PM<br>
<b>To: </b>Slurm User Community List <<a href="mailto:slurm-users@lists.schedmd.com" target="_blank" rel="noreferrer">slurm-users@lists.schedmd.com</a>><br>
<b>Subject: </b>Re: [slurm-users] Reproducible irreproducible problem (timeout?)<u></u><u></u></span></p>
</div>
<div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Verdana',sans-serif;color:black">It is a University "supercomputer", not a national facility. Hence they are not that expert, which is why I am asking here. I am pretty certain that it is some
form of communication issue, but beyond that it is not clear.<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Verdana',sans-serif;color:black"><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:'Verdana',sans-serif;color:black">If I get suggestions such as "why don't they look for ABC in XYZ" then I may persuade them to look at specifics. They will need the coaching, alas.<u></u><u></u></span></p>
</div>
</div>
<p class="MsoNormal"><span style="font-size:11.0pt"><u></u> <u></u></span></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt">On Wed, Dec 20, 2023 at 1:25 PM Gerhard Strangar <</span><a href="mailto:g.s@arcor.de" target="_blank" rel="noreferrer"><span style="font-size:11.0pt">g.s@arcor.de</span></a><span style="font-size:11.0pt">> wrote:<u></u><u></u></span></p>
</div>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt">
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:11.0pt">Laurence Marks wrote:<br>
<br>
> After some (irreproducible) time, often one of the three slow tasks hangs.<br>
> A symptom is that if I try and ssh into the main node of the subtask (which<br>
> is running 128 mpi on the 4 nodes) I get "Authentication failed".<br>
<br>
How about asking an admin to check why it hangs?<u></u><u></u></span></p>
</blockquote>
</div>
<p class="MsoNormal"><span style="font-size:11.0pt"><br clear="all">
<u></u><u></u></span></p>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt"><u></u> <u></u></span></p>
</div>
<p class="MsoNormal"><span class="m_544987036716666291gmailsignatureprefix"><span style="font-size:11.0pt">--
</span></span><span style="font-size:11.0pt"><u></u><u></u></span></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt">Emeritus Professor Laurence Marks (Laurie)
<u></u><u></u></span></p>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt">Northwestern University<u></u><u></u></span></p>
<div>
<p class="MsoNormal"><a href="http://www.numis.northwestern.edu/" target="_blank" rel="noreferrer"><span style="font-size:11.0pt">Webpage</span></a><span style="font-size:11.0pt"> and </span><a href="http://scholar.google.com/citations?user=zmHhI9gAAAAJ&hl=en" target="_blank" rel="noreferrer"><span style="font-size:11.0pt">Google
Scholar link</span></a><span style="font-size:11.0pt"><u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt">"Research is to see what everybody else has seen, and to think what nobody else has thought", Albert Szent-Györgyi<u></u><u></u></span></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote></div>
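<div>A rough sketch of the timing comparison at the top of this message, in Python. The function names and the sample (N, T) values are made up for illustration. The 70 and 190 minute figures and the two extra queue waits per loop are the ones quoted there, and I am assuming every loop pays those queue waits again.</div>

```python
def total_minutes_single_job(n_loops, queue_wait, loop_minutes=70):
    """Current setup: one job runs every loop, so the queue wait is paid once."""
    return loop_minutes * n_loops + queue_wait


def total_minutes_split_jobs(n_loops, queue_wait, loop_minutes=190, waits_per_loop=2):
    """Slow part split into serial jobs (assumption: each loop pays
    waits_per_loop queue waits on top of its longer run time)."""
    return n_loops * (loop_minutes + waits_per_loop * queue_wait)


if __name__ == "__main__":
    # Hypothetical loop counts and queue waits (minutes), for illustration only.
    for n_loops, queue_wait in [(1, 0), (10, 30), (20, 60)]:
        single = total_minutes_single_job(n_loops, queue_wait)
        split = total_minutes_split_jobs(n_loops, queue_wait)
        print(f"N={n_loops:3d} T={queue_wait:3d} single={single:6d} split={split:6d}")
```

<div>Even with an empty queue (T = 0) the split version costs 190 minutes per loop against 70, and any queue wait widens the gap because it is paid on every loop rather than once.</div>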