[slurm-users] [EXT] --mem is not limiting the job's memory
Boris Yazlovitsky
borisyaz at gmail.com
Fri Jun 23 02:49:33 UTC 2023
it's still not constraining memory...
a memhog job continues to memhog:
boris at rod:~/scripts$ sacct --starttime=2023-05-01 \
    --format=jobid,user,start,elapsed,reqmem,maxrss,maxvmsize,nodelist,state,exitcode \
    -j 199
JobID             User               Start    Elapsed     ReqMem     MaxRSS  MaxVMSize   NodeList      State ExitCode
------------ --------- ------------------- ---------- ---------- ---------- ---------- ---------- ---------- --------
199              boris 2023-06-23T02:42:30   00:01:21         1M                          milhouse  COMPLETED      0:0
199.batch              2023-06-23T02:42:30   00:01:21            104857988K 104858064K   milhouse  COMPLETED      0:0
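For scale, the MaxRSS above works out to roughly 100 GiB - a quick sketch of the conversion (the 104857988K figure is copied from the sacct output):

```shell
# sacct reports MaxRSS in KiB; convert to GiB to see how far past the
# requested limit the memhog job actually went.
kib=104857988
echo "MaxRSS ~= $((kib / 1024 / 1024)) GiB"   # ~100 GiB
```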
One thing I noticed is that the machines I'm working on do not have
libcgroup and libcgroup-dev installed - but Slurm has its own cgroup
implementation? The slurmd processes do use the /usr/lib/slurm/*cgroup.so
objects. I will try to recompile Slurm with those libcgroup packages
present.
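Before recompiling, it may also be worth checking which cgroup hierarchy the nodes are actually running, since Slurm's cgroup plugins behave differently under v1 and v2. A minimal sketch, assuming a standard Linux mount at /sys/fs/cgroup:

```shell
# Report whether /sys/fs/cgroup is a unified cgroup v2 mount ("cgroup2fs")
# or a v1-style tmpfs holding per-controller mounts ("tmpfs"); Slurm's
# cgroup plugin selection depends on which one the node booted with.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)
echo "cgroup filesystem: $fstype"
```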
On Thu, Jun 22, 2023 at 6:04 PM Ozeryan, Vladimir <
Vladimir.Ozeryan at jhuapl.edu> wrote:
> No worries,
>
> No, we don’t have any OS level settings, only “allowed_devices.conf” which
> just has /dev/random, /dev/tty and stuff like that.
>
>
>
> But I think this could be the culprit - check the cgroup.conf man page for
> AllowedRAMSpace=100
>
>
>
> I would just leave these four:
>
> CgroupAutomount=yes
> ConstrainCores=yes
> ConstrainDevices=yes
> ConstrainRAMSpace=yes
>
>
>
> Vlad.
>
>
>
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf Of
> *Boris Yazlovitsky
> *Sent:* Thursday, June 22, 2023 5:40 PM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] [EXT] --mem is not limiting the job's memory
>
>
>
> *APL external email warning: *Verify sender
> slurm-users-bounces at lists.schedmd.com before clicking links or attachments
>
>
>
> Thank you, Vlad - looks like we have the same "yes" settings.
>
> Do you remember if you had to make any settings on the OS level or in the
> kernel to make it work?
>
>
>
> -b
>
>
>
> On Thu, Jun 22, 2023 at 5:31 PM Ozeryan, Vladimir <
> Vladimir.Ozeryan at jhuapl.edu> wrote:
>
> Hello,
>
>
>
> We have the following configured and it seems to be working ok.
>
>
>
> CgroupAutomount=yes
> ConstrainCores=yes
> ConstrainDevices=yes
> ConstrainRAMSpace=yes
>
> Vlad.
>
>
>
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf Of
> *Boris Yazlovitsky
> *Sent:* Thursday, June 22, 2023 4:50 PM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] [EXT] --mem is not limiting the job's memory
>
>
>
>
> Hello Vladimir, thank you for your response.
>
>
>
> This is the cgroup.conf file:
>
> CgroupAutomount=yes
> ConstrainCores=yes
> ConstrainDevices=yes
> ConstrainRAMSpace=yes
> ConstrainSwapSpace=yes
> MaxRAMPercent=90
> AllowedSwapSpace=0
> AllowedRAMSpace=100
> MemorySwappiness=0
> MaxSwapPercent=0
>
>
>
> /etc/default/grub:
>
> GRUB_DEFAULT=0
> GRUB_TIMEOUT_STYLE=hidden
> GRUB_TIMEOUT=0
> GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
> GRUB_CMDLINE_LINUX_DEFAULT=""
> GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0 cgroup_enable=memory swapaccount=1"
>
>
>
> What other cgroup settings need to be set?
>
>
>
> && thank you!
>
> -b
>
>
>
> On Thu, Jun 22, 2023 at 4:02 PM Ozeryan, Vladimir <
> Vladimir.Ozeryan at jhuapl.edu> wrote:
>
> --mem=5G should allocate 5G of memory per node.
>
> Are your cgroups configured?
>
>
>
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf Of
> *Boris Yazlovitsky
> *Sent:* Thursday, June 22, 2023 3:28 PM
> *To:* slurm-users at lists.schedmd.com
> *Subject:* [EXT] [slurm-users] --mem is not limiting the job's memory
>
>
>
>
> Running Slurm 22.03.02 on an Ubuntu 22.04 server.
>
> Jobs submitted with --mem=5g are able to allocate an unlimited amount of
> memory.
>
>
>
> How can I limit, at the job-submission level, how much memory a job can grab?
>
>
>
> thanks, and best regards!
> Boris
>
>
>
>