<div dir="ltr">Ahmet,<div><br></div><div>Thank you for taking the time to respond to my question.</div><div><br></div><div>Yes, the --mem=1GBB is a typo. It's correct in my script; I just fat-fingered it in the email. :-)</div><div><br></div><div>BTW, the exact version I am using is 19.05.<b>2.</b></div><div><br></div><div>Regarding your response, that seems like more than I need. I simply want to enforce the memory limits as specified by the user at job submission time. This seems to have been the behavior in previous versions of Slurm. What I want is what is described in the 19.05 release notes:</div><div><br></div><div><i><font color="#0000ff">RELEASE NOTES FOR SLURM VERSION 19.05<br>28 May 2019<br></font></i></div><div><i><font color="#0000ff"><br></font></i></div><div><i><font color="#0000ff">NOTE: slurmd and slurmctld will now fatal if two incompatible mechanisms for<br>      enforcing memory limits are set. This makes incompatible the use of<br>      task/cgroup memory limit enforcing (Constrain[RAM|Swap]Space=yes) with<br>      JobAcctGatherParams=OverMemoryKill, which could cause problems when a<br>      task is killed by one of them while the other is at the same time<br>      managing that task. The NoOverMemoryKill setting has been deprecated in<br>      favor of OverMemoryKill, since now the default is *NOT* to have any<br>      memory enforcement mechanism.<br><br>NOTE: MemLimitEnforce parameter has been removed and the functionality that<br>      was provided with it has been merged into a JobAcctGatherParams. 
It<br>      may be enabled by setting JobAcctGatherParams=OverMemoryKill, so now<br>      job and steps killing by OOM is enabled from the same place.<br></font></i></div><div><i><font color="#0000ff"> </font></i><br></div><div><br></div><div><br></div><div>So, is it really necessary to do what you suggested to get that functionality?</div><div><br></div><div>If someone could post just a simple slurm.conf file that forces the memory limits to be honored (and kills the job if they are exceeded), then I could extract what I need from that.</div><div><br></div><div>Again, thanks for the assistance.</div><div><br></div><div>Mike</div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 24, 2019 at 11:27 PM mercan <<a href="mailto:ahmet.mercan@uhem.itu.edu.tr">ahmet.mercan@uhem.itu.edu.tr</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi;<br>

<br>

You should set<br>

<br>

SelectType=select/cons_res<br>

<br>

plus one of these:<br>

<br>

SelectTypeParameters=CR_Memory<br>

SelectTypeParameters=CR_Core_Memory<br>

SelectTypeParameters=CR_CPU_Memory<br>

SelectTypeParameters=CR_Socket_Memory<br>

<br>

to enable memory allocation tracking, according to the documentation:<br>

<br>

<a href="https://slurm.schedmd.com/cons_res_share.html" rel="noreferrer" target="_blank">https://slurm.schedmd.com/cons_res_share.html</a><br>
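<br>
As a rough, untested sketch, the settings discussed in this thread might be combined in slurm.conf along these lines. CR_Core_Memory is just one of the four options listed above, and the 30-second polling interval is an assumption to tune for your site. Per the 19.05 release notes, OverMemoryKill must not be combined with ConstrainRAMSpace=yes or ConstrainSwapSpace=yes in cgroup.conf:<br>
<pre>
# Treat memory as a consumable resource so that --mem becomes a tracked limit
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# Sample job memory use periodically and kill jobs or steps over their request
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=task=30
JobAcctGatherParams=OverMemoryKill
</pre>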

<br>

Also, the line:<br>

<br>

#SBATCH --mem=1GBB<br>

<br>

contains "1GBB". Is it the same in your job script?<br>

<br>

<br>

Regards;<br>

<br>

Ahmet M.<br>

<br>

<br>

On 24.10.2019 at 23:00, Mike Mosley wrote:<br>

> Hello,<br>

><br>

> We are testing Slurm 19.05 on Linux RHEL 7.5+ with the intent to migrate <br>

> to it from Torque/Moab in the near future.<br>

><br>

> One of the things our users are used to is that when their jobs exceed <br>

> the amount of memory they requested, the job is terminated by the <br>

> scheduler.   We realize that Slurm prefers to use cgroups to contain <br>

> rather than kill the jobs but initially we need to have the kill <br>

> option in place to transition our users.<br>

><br>

> So, looking at the documentation, it appears that in 19.05, the <br>

> following needs to be set to accomplish this:<br>

><br>

> JobAcctGatherParams = OverMemoryKill<br>

><br>

><br>

> Other possibly relevant settings we made:<br>

><br>

> JobAcctGatherType = jobacct_gather/linux<br>

><br>

> ProctrackType = proctrack/linuxproc<br>

><br>

><br>

> We have avoided configuring any cgroup parameters for the time being.<br>

><br>

> Unfortunately, when we submit a job with the following:<br>

><br>

> #SBATCH --nodes=1<br>

><br>

> #SBATCH --ntasks-per-node=1<br>

><br>

> #SBATCH --mem=1GBB<br>

><br>

><br>

> We see the RSS of the job steadily increase beyond the 1GB limit and it is <br>

> never killed.    Interestingly enough, the proc information shows the <br>

> ulimit (hard and soft) for the process set to around 1GB.<br>

><br>

> We have tried various settings without any success.   Can anyone point <br>

> out what we are doing wrong?<br>

><br>

> Thanks,<br>

><br>

> Mike<br>

><br>

> -- <br>

> */J. Michael Mosley/*<br>

> University Research Computing<br>

> The University of North Carolina at Charlotte<br>

> 9201 University City Blvd<br>

> Charlotte, NC  28223<br>

> 704.687.7065  j<a href="mailto:mmosley@uncc.edu" target="_blank">mmosley@uncc.edu</a><br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div style="font-size:12.8px"><div dir="ltr"><div><span style="font-family:"times new roman",serif"><b><i>J. Michael Mosley</i></b><br>University Research Computing<br>The University of North Carolina at Charlotte<br>9201 University City Blvd<br>Charlotte, NC  28223<br><u>704.687.7065 </u>    <u> j<i><a href="mailto:mmosley@uncc.edu" target="_blank">mmosley@uncc.edu</a></i></u></span></div></div></div></div></div>