<html>


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>


</head>


<body dir="ltr">


<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">


<p>Hello Björn-Helge.</p>


<p><br>


</p>


<p>Thanks for reminding me of /sys/fs for checking OOM issues. I had already lost sight of that again.</p>


<p>In this case, there are more steps involved (one for each srun call). I'm not sure whether cgroup handles each step separately, or only per node. If the latter ... why do I have to specify --mem at all in every single srun step call? That seems somewhat illogical, imho. It would semantically mean: "Please tell me the resources you need so that I can find reasonable slots to run your task. But don't worry! On a node, I don't care much anyway. Do as you like, as long as the node's total memory consumption stays below the threshold ...!" ;)</p>


<p><br>


</p>


<p>Anyway. I will see soon. The user I am currently supporting with this runs quite memory-consuming stuff (bioinformatics)


<span>😁</span><br>


</p>


<div><br>


</div>


<div>Thank you again!</div>


<div>Cheers, Martin<br>


</div>


<div><br>


</div>


<br>


<div style="color: rgb(0, 0, 0);">


<div>


<hr tabindex="-1" style="display:inline-block; width:98%">


<div id="x_divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> slurm-users <slurm-users-bounces@lists.schedmd.com> on behalf of Bjørn-Helge Mevik <b.h.mevik@usit.uio.no><br>


<b>Sent:</b> Thursday, 19 January 2023 08:23<br>


<b>To:</b> slurm-users@schedmd.com<br>


<b>Subject:</b> Re: [slurm-users] srun jobfarming hassle question</font>


<div> </div>


</div>


</div>


<font size="2"><span style="font-size:10pt;">


<div class="PlainText">"Ohlerich, Martin" <Martin.Ohlerich@lrz.de> writes:<br>


<br>


> Hello Björn-Helge.<br>


><br>


><br>


> Sigh ...<br>


><br>


> First of all, of course, many thanks! This indeed helped a lot!<br>


<br>


Good!<br>


<br>


> b) This only works if I have to specify --mem for a task. Although<br>


> manageable, I wonder why one needs to be that restrictive. In<br>


> principle, in the use case outlined, one task could use a bit less<br>


> memory, and the other may require a bit more than half of the node's<br>


> available memory. (So clearly this isn't always predictable.) I only<br>


> hope that in such cases the second task does not die from OOM ... (I<br>


> will know soon, I guess.)<br>


<br>


As I understand it, Slurm (at least cgroups) will only kill a step if it<br>


uses more memory *in total* on a node than the job was allocated on that<br>


node.  So if a job has 10 GiB allocated on a node, and a step runs two<br>


tasks there, one task could use 9 GiB and the other 1 GiB without the<br>


step being killed.<br>


<br>
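As a small worked example of that accounting (a toy model in Python, not Slurm code; the function name step_exceeds_allocation is made up for illustration, and it assumes the limit applies to the summed usage of all tasks of the step on the node, as described above):<br>
<br>
```python
GIB = 1024 ** 3  # bytes per GiB


def step_exceeds_allocation(task_usage_bytes, allocated_bytes):
    """Return True if the tasks' combined usage on a node is above the memory allocated there."""
    return sum(task_usage_bytes) > allocated_bytes


allocated = 10 * GIB

# One task uses 9 GiB, the other 1 GiB: 10 GiB in total, not above the limit.
print(step_exceeds_allocation([9 * GIB, 1 * GIB], allocated))  # False

# One task uses 9 GiB, the other 2 GiB: 11 GiB in total, above the limit.
print(step_exceeds_allocation([9 * GIB, 2 * GIB], allocated))  # True
```
<br>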


You can inspect the memory limits that are in effect in cgroups (v1) in<br>


/sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid> (usual location, at<br>


least).<br>


<br>
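A minimal sketch for reading those limits from Python, assuming the cgroup v1 layout above and that the step cgroups sit in step_* subdirectories of the job directory (an assumption; check your own nodes). The memory.* names are the standard cgroup v1 memory controller files; read_memory_stats and job_report are hypothetical helper names:<br>
<br>
```python
from pathlib import Path

# Assumed cgroup v1 location, following the path quoted above; sites may differ.
CGROUP_ROOT = Path("/sys/fs/cgroup/memory/slurm")

# Standard cgroup v1 memory controller files (values are plain integers).
FILES = (
    "memory.limit_in_bytes",      # limit in effect for this cgroup
    "memory.usage_in_bytes",      # current usage
    "memory.max_usage_in_bytes",  # peak usage so far
    "memory.failcnt",             # number of times the limit was hit
)


def read_memory_stats(cgroup_dir):
    """Return {file name: integer value} for the memory files present in cgroup_dir."""
    stats = {}
    for name in FILES:
        path = Path(cgroup_dir) / name
        if path.is_file():
            stats[name] = int(path.read_text().strip())
    return stats


def job_report(uid, jobid, root=CGROUP_ROOT):
    """Collect memory stats for the job cgroup and for each step_* cgroup below it."""
    job_dir = Path(root) / f"uid_{uid}" / f"job_{jobid}"
    report = {"job": read_memory_stats(job_dir)}
    if job_dir.is_dir():
        # Assumption: each step has its own step_* subdirectory.
        for step_dir in sorted(job_dir.glob("step_*")):
            report[step_dir.name] = read_memory_stats(step_dir)
    return report
```
<br>
On a compute node inside a job, job_report(os.getuid(), os.environ["SLURM_JOB_ID"]) should list the job-level limit next to the per-step values; a memory.failcnt above zero means the limit was hit at least once.<br>
<br>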


-- <br>


Regards,<br>


Bjørn-Helge Mevik, dr. scient,<br>


Department for Research Computing, University of Oslo<br>


<br>


</div>


</span></font></div>


</div>


</body>


</html>