<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body>
<p>Mike, <br>
</p>
<p>You didn't include your entire sbatch script, so it's hard to say
what's going wrong when we only have a single line to work with.
Based on what you have told us, I'm guessing you are specifying a
per-node memory requirement greater than 128000 MB, the RealMemory
of your smaller nodes. When you specify a nodelist, Slurm will
assign your job to all of those nodes, not a subset that matches
the other job specifications (--mem, --mem-per-cpu, --ntasks,
etc.):</p>
<p>
<blockquote type="cite">
<dl compact="compact">
<dt><b>-w</b>, <b>--nodelist</b>=&lt;<i>node name list</i>&gt;</dt>
<dd>
Request a specific list of hosts.
The job will contain <i>all</i> of these hosts and possibly
additional hosts
as needed to satisfy resource requirements.
</dd>
</dl>
</blockquote>
<br>
</p>
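<p>To make that guess concrete, here is a minimal Python sketch of the
arithmetic. It is only an illustration, not Slurm code, and it makes
no Slurm calls. The RealMemory figures come from the node definitions
in your message; the nodelist, the --mem value, and the helper name
nodes_too_small are hypothetical, since I haven't seen your script.</p>
<pre>
```python
# Illustrative model only: this is not Slurm code and makes no Slurm calls.
# RealMemory values (MB) come from the node definitions quoted in this thread.
REAL_MEMORY_MB = {}
REAL_MEMORY_MB.update({f"compute{i:03d}": 128000 for i in range(1, 7)})   # compute[001-006]
REAL_MEMORY_MB.update({f"compute{i:03d}": 384000 for i in range(7, 15)})  # compute[007-014]


def nodes_too_small(nodelist, mem_per_node_mb):
    """Return the nodes in nodelist whose RealMemory is below the per-node request."""
    return [n for n in nodelist if mem_per_node_mb > REAL_MEMORY_MB[n]]


# Hypothetical job: --nodelist=compute[005-008] together with --mem=200000
requested_nodes = [f"compute{i:03d}" for i in range(5, 9)]
print(nodes_too_small(requested_nodes, 200000))
```
</pre>
<p>Because every host in the nodelist has to be part of the job, a single
node that shows up in this list would be enough to keep the request
from being satisfied, assuming my guess about the memory setting is
right.</p>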
<pre class="moz-signature" cols="72">Prentice </pre>
<div class="moz-cite-prefix">On 6/7/21 7:46 PM, Yap, Mike wrote:<br>
</div>
<blockquote type="cite"
cite="mid:SY2PR01MB2540C25B915D7F4E3E2CA80BD7389@SY2PR01MB2540.ausprd01.prod.outlook.com">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">Hi All<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Can anyone advise on
what could cause the error message below when submitting
a job?<o:p></o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US">sbatch: error: memory
allocation failure<o:p></o:p></span></b></p>
<p class="MsoNormal"><span lang="EN-US">The same script used to
work perfectly fine until I included <b>#SBATCH
--nodelist=(compute[015-046])</b> (once that line is removed, it
works as it should).<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The issues<o:p></o:p></span></p>
<ol style="margin-top:0cm" type="1" start="1">
<li class="MsoListParagraph"
style="margin-left:0cm;mso-list:l0 level1 lfo1"><span
lang="EN-US">For the current setup, I have specific
resources available for each compute node
<o:p></o:p></span></li>
<ol style="margin-top:0cm" type="a" start="1">
<li class="MsoListParagraph"
style="margin-left:0cm;mso-list:l0 level2 lfo1"><span
lang="EN-US">(NodeName=compute[007-014] Procs=36
CoresPerSocket=18 RealMemory=384000 ThreadsPerCore=1
Boards=1 SocketsPerBoard=2) – newer model<o:p></o:p></span></li>
<li class="MsoListParagraph"
style="margin-left:0cm;mso-list:l0 level2 lfo1"><span
lang="EN-US">(NodeName=compute[001-006] Procs=16
CoresPerSocket=18 RealMemory=128000 ThreadsPerCore=1
Boards=1 SocketsPerBoard=2)<o:p></o:p></span></li>
</ol>
<li class="MsoListParagraph"
style="margin-left:0cm;mso-list:l0 level1 lfo1"><span
lang="EN-US">I have the same resources shared between
multiple queues (working fine)<o:p></o:p></span></li>
<li class="MsoListParagraph"
style="margin-left:0cm;mso-list:l0 level1 lfo1"><span
lang="EN-US">When running a parallel job, the exact same
job runs when assigned to a single node category (i.e.
exclusively on 1a or 1b)<o:p></o:p></span></li>
<li class="MsoListParagraph"
style="margin-left:0cm;mso-list:l0 level1 lfo1"><span
lang="EN-US">When running the exact same job across both
1a and 1b, the job runs on the 1b nodes but there is no
activity on 1a
<o:p></o:p></span></li>
</ol>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Any suggestions?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Thanks<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Mike<o:p></o:p></span></p>
</div>
</blockquote>
</body>
</html>