<div dir="ltr">Ahmet,<div><br></div><div>Thank you for taking the time to respond to my question. </div><div><br></div><div>Yes, the --mem=1GBB is a typo. It's correct in my script, I just fat-fingered it in the email. :-)</div><div><br></div><div>BTW, the exact version I am using is 19.05.<b>2.</b></div><div><br></div><div>Regarding your response, it seems that that might be more than what I need. I simply want to enforce the memory limits as specified by the user at job submission time. This seems to have been the behavior in previous versions of Slurm. What I want is what is described in the 19.05 release notes:</div><div><br></div><div><i><font color="#0000ff">RELEASE NOTES FOR SLURM VERSION 19.05<br>28 May 2019<br></font></i></div><div><i><font color="#0000ff"><br></font></i></div><div><i><font color="#0000ff">NOTE: slurmd and slurmctld will now fatal if two incompatible mechanisms for<br> enforcing memory limits are set. This makes incompatible the use of<br> task/cgroup memory limit enforcing (Constrain[RAM|Swap]Space=yes) with<br> JobAcctGatherParams=OverMemoryKill, which could cause problems when a<br> task is killed by one of them while the other is at the same time<br> managing that task. The NoOverMemoryKill setting has been deprecated in<br> favor of OverMemoryKill, since now the default is *NOT* to have any<br> memory enforcement mechanism.<br><br>NOTE: MemLimitEnforce parameter has been removed and the functionality that<br> was provided with it has been merged into a JobAcctGatherParams. It<br> may be enabled by setting JobAcctGatherParams=OverMemoryKill, so now<br> job and steps killing by OOM is enabled from the same place.<br></font></i></div><div><i><font color="#0000ff"> </font></i><br></div><div><br></div><div><br></div><div>So, is it really necessary to do what you suggested to get that functionality?</div><div><br></div><div>If someone could post just a simple slurm.conf file that forces the memory limits to be honored (and kills the job if they are exceeded), then I could extract what I need from that.</div><div><br></div><div>Again, thanks for the assistance.</div><div><br></div><div>Mike</div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 24, 2019 at 11:27 PM mercan <<a href="mailto:ahmet.mercan@uhem.itu.edu.tr">ahmet.mercan@uhem.itu.edu.tr</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi;<br>

You should set

SelectType=select/cons_res

plus one of these:

SelectTypeParameters=CR_Memory
SelectTypeParameters=CR_Core_Memory
SelectTypeParameters=CR_CPU_Memory
SelectTypeParameters=CR_Socket_Memory

to enable memory allocation tracking, as described in the documentation:

https://slurm.schedmd.com/cons_res_share.html

Also, the line:

#SBATCH --mem=1GBB

contains "1GBB". Is it the same in the job script?


Regards,

Ahmet M.

On 24.10.2019 23:00, Mike Mosley wrote:
> Hello,
>
> We are testing Slurm 19.05 on Linux RHEL 7.5+ with the intent to migrate
> to it from Torque/Moab in the near future.
>
> One of the things our users are used to is that when their jobs exceed
> the amount of memory they requested, the job is terminated by the
> scheduler. We realize that Slurm prefers to use cgroups to contain
> rather than kill the jobs, but initially we need to have the kill
> option in place to transition our users.
>
> So, looking at the documentation, it appears that in 19.05, the
> following needs to be set to accomplish this:
>
> JobAcctGatherParams = OverMemoryKill
>
>
> Other possibly relevant settings we made:
>
> JobAcctGatherType = jobacct_gather/linux
>
> ProctrackType = proctrack/linuxproc
>
>
> We have avoided configuring any cgroup parameters for the time being.
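>
> (So, taken together, the memory-related fragment of our slurm.conf amounts to
> just the three lines below; this is only a condensed restatement of the
> settings above, with nothing cgroup-related configured:)
>
> JobAcctGatherType=jobacct_gather/linux
> JobAcctGatherParams=OverMemoryKill
> ProctrackType=proctrack/linuxproc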
>
> Unfortunately, when we submit a job with the following:
>
> #SBATCH --nodes=1
> #SBATCH --ntasks-per-node=1
> #SBATCH --mem=1GBB
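>
> (The remainder of the script is just a small memory hog used for testing.
> The actual body isn't reproduced here; an illustrative stand-in would be a
> shell loop that keeps doubling a string until something kills it:)
>
> x=x
> while true; do x=$x$x; done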
>
>
> We see the RSS of the job steadily increase beyond the 1GB limit, and it is
> never killed. Interestingly enough, the proc information shows the
> ulimit (hard and soft) for the process set to around 1GB.
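>
> (Checked with something along these lines on the compute node, where <PID>
> is a placeholder for the process ID of the job step, not a value from our
> logs:)
>
> grep -iE 'address space|resident set' /proc/<PID>/limits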
>
> We have tried various settings without any success. Can anyone point
> out what we are doing wrong?
>
> Thanks,
>
> Mike
>
> --
> J. Michael Mosley
> University Research Computing
> The University of North Carolina at Charlotte
> 9201 University City Blvd
> Charlotte, NC 28223
> 704.687.7065    mmosley@uncc.edu

--
J. Michael Mosley
University Research Computing
The University of North Carolina at Charlotte
9201 University City Blvd
Charlotte, NC 28223
704.687.7065    mmosley@uncc.edu