[slurm-users] [EXT] --mem is not limiting the job's memory
Boris Yazlovitsky
borisyaz at gmail.com
Fri Jul 21 20:11:57 UTC 2023
Thanks to all who responded!
Setting SelectTypeParameters=CR_CPU_Memory did the trick.
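
For anyone finding this thread later, a minimal sketch of the relevant slurm.conf lines (the select plugin name is an assumption - use whichever select plugin your cluster already runs):

SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory

Changing the select plugin or its parameters generally requires restarting slurmctld and the slurmd daemons before memory is actually tracked as a consumable resource.
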
On Fri, Jun 23, 2023 at 3:21 AM Shunran Zhang <
szhang at ngs.gen-info.osaka-u.ac.jp> wrote:
> Hi
>
> Would you mind checking your job scheduling settings in slurm.conf?
>
> Namely SelectTypeParameters=CR_CPU_Memory or the like.
>
> Also, you may want to use systemd-cgtop to at least confirm jobs are
> indeed running in cgroups.
>
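> A quick way to check both from a node (a sketch - it assumes scontrol and systemd-cgtop are available there):
>
> scontrol show config | grep -E 'SelectType|TaskPlugin|ProctrackType'
> systemd-cgtop
>
> If SelectTypeParameters comes back as CR_CPU or CR_Core without the _Memory part, memory is not treated as a consumable resource and --mem is not enforced.
>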
> Sincerely,
> S. Zhang
>
> On Fri, Jun 23, 2023, 12:07 Boris Yazlovitsky <borisyaz at gmail.com> wrote:
>
>> it's still not constraining memory...
>>
>> a memhog job continues to memhog:
>>
>> boris@rod:~/scripts$ sacct --starttime=2023-05-01 \
>>     --format=jobid,user,start,elapsed,reqmem,maxrss,maxvmsize,nodelist,state,exit -j 199
>> JobID         User    Start                Elapsed   ReqMem  MaxRSS      MaxVMSize   NodeList   State      ExitCode
>> ------------  ------  -------------------  --------  ------  ----------  ----------  ---------  ---------  --------
>> 199           boris   2023-06-23T02:42:30  00:01:21  1M                              milhouse   COMPLETED  0:0
>> 199.batch             2023-06-23T02:42:30  00:01:21          104857988K  104858064K  milhouse   COMPLETED  0:0
>>
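>> For what it's worth, a hypothetical way to reproduce this without memhog - request a small allocation and try to grab well past it (the python one-liner is just an example allocator):
>>
>> sbatch --mem=1G --wrap="python3 -c 'import time; x = bytearray(5 * 1024**3); time.sleep(30)'"
>>
>> With working memory enforcement the step should be OOM-killed rather than ending COMPLETED with a ~100G MaxRSS like job 199 above.
>>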
>> One thing I noticed is that the machines I'm working on do not have
>> libcgroup and libcgroup-dev installed - but Slurm has its own cgroup
>> implementation, doesn't it? The slurmd processes do use the
>> /usr/lib/slurm/*cgroup.so objects. I will try to recompile Slurm with
>> those libcgroup packages present.
>>
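>> Before recompiling, it may be worth checking whether slurmd has the cgroup plugins and is configured to use them - a sketch, with the plugin directory path taken from above:
>>
>> ls /usr/lib/slurm/ | grep cgroup
>> scontrol show config | grep -iE 'ProctrackType|TaskPlugin|JobAcctGatherType'
>>
>> If proctrack/cgroup and task/cgroup show up there, Slurm's own cgroup support is already built in and libcgroup is probably not the missing piece.
>>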
>> On Thu, Jun 22, 2023 at 6:04 PM Ozeryan, Vladimir <
>> Vladimir.Ozeryan at jhuapl.edu> wrote:
>>
>>> No worries,
>>>
>>> No, we don’t have any OS level settings, only “allowed_devices.conf”
>>> which just has /dev/random, /dev/tty and stuff like that.
>>>
>>>
>>>
>>> But I think this could be the culprit - check the man page for cgroup.conf:
>>> AllowedRAMSpace=100
>>>
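>>> To see what a node is actually using, a quick check (the path is an assumption - adjust to wherever your cgroup.conf lives):
>>>
>>> grep -E 'AllowedRAMSpace|MaxRAMPercent|ConstrainRAMSpace' /etc/slurm/cgroup.conf
>>>
>>> As I read the man page, AllowedRAMSpace is a percentage of the job's allocated memory, while MaxRAMPercent is an upper bound relative to the node's total RAM.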
>>>
>>>
>>> I would just leave these four:
>>>
>>> CgroupAutomount=yes
>>> ConstrainCores=yes
>>> ConstrainDevices=yes
>>> ConstrainRAMSpace=yes
>>>
>>>
>>>
>>> Vlad.
>>>
>>>
>>>
>>> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Boris Yazlovitsky
>>> Sent: Thursday, June 22, 2023 5:40 PM
>>> To: Slurm User Community List <slurm-users at lists.schedmd.com>
>>> Subject: Re: [slurm-users] [EXT] --mem is not limiting the job's memory
>>>
>>>
>>> Thank you, Vlad - it looks like we have the same "yes" settings.
>>>
>>> Do you remember whether you had to change any OS-level or kernel settings
>>> to make it work?
>>>
>>>
>>>
>>> -b
>>>
>>>
>>>
>>> On Thu, Jun 22, 2023 at 5:31 PM Ozeryan, Vladimir <
>>> Vladimir.Ozeryan at jhuapl.edu> wrote:
>>>
>>> Hello,
>>>
>>>
>>>
>>> We have the following configured and it seems to be working ok.
>>>
>>>
>>>
>>> CgroupAutomount=yes
>>> ConstrainCores=yes
>>> ConstrainDevices=yes
>>> ConstrainRAMSpace=yes
>>>
>>> Vlad.
>>>
>>>
>>>
>>> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Boris Yazlovitsky
>>> Sent: Thursday, June 22, 2023 4:50 PM
>>> To: Slurm User Community List <slurm-users at lists.schedmd.com>
>>> Subject: Re: [slurm-users] [EXT] --mem is not limiting the job's memory
>>>
>>>
>>> Hello Vladimir, thank you for your response.
>>>
>>>
>>>
>>> This is the cgroup.conf file:
>>>
>>> CgroupAutomount=yes
>>> ConstrainCores=yes
>>> ConstrainDevices=yes
>>> ConstrainRAMSpace=yes
>>> ConstrainSwapSpace=yes
>>> MaxRAMPercent=90
>>> AllowedSwapSpace=0
>>> AllowedRAMSpace=100
>>> MemorySwappiness=0
>>> MaxSwapPercent=0
>>>
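>>> Changes to cgroup.conf are read by slurmd, so after editing it the compute-node daemons generally need a restart - a sketch, assuming systemd manages them:
>>>
>>> sudo systemctl restart slurmd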
>>>
>>>
>>> /etc/default/grub:
>>>
>>> GRUB_DEFAULT=0
>>> GRUB_TIMEOUT_STYLE=hidden
>>> GRUB_TIMEOUT=0
>>> GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
>>> GRUB_CMDLINE_LINUX_DEFAULT=""
>>> GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0 cgroup_enable=memory
>>> swapaccount=1"
>>>
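>>> The new kernel command line only takes effect after regenerating grub and rebooting - a sketch, assuming Ubuntu's grub tooling:
>>>
>>> sudo update-grub
>>> sudo reboot
>>> cat /proc/cmdline   # afterwards, confirm cgroup_enable=memory swapaccount=1 are present
>>>
>>> Note that these two parameters matter for the cgroup v1 memory controller; Ubuntu 22.04 defaults to cgroup v2, where they are generally not needed.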
>>>
>>>
>>> What other cgroup settings need to be set?
>>>
>>>
>>>
>>> && thank you!
>>>
>>> -b
>>>
>>>
>>>
>>> On Thu, Jun 22, 2023 at 4:02 PM Ozeryan, Vladimir <
>>> Vladimir.Ozeryan at jhuapl.edu> wrote:
>>>
>>> --mem=5G should allocate 5G of memory per node.
>>>
>>> Are your cgroups configured?
>>>
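>>> By "configured" I mean roughly this combination - a sketch, not a drop-in config:
>>>
>>> # slurm.conf
>>> ProctrackType=proctrack/cgroup
>>> TaskPlugin=task/cgroup,task/affinity
>>>
>>> # cgroup.conf
>>> CgroupAutomount=yes
>>> ConstrainRAMSpace=yes
>>>
>>> plus a SelectTypeParameters value that treats memory as a consumable resource, e.g. CR_Core_Memory or CR_CPU_Memory.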
>>>
>>>
>>> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Boris Yazlovitsky
>>> Sent: Thursday, June 22, 2023 3:28 PM
>>> To: slurm-users at lists.schedmd.com
>>> Subject: [EXT] [slurm-users] --mem is not limiting the job's memory
>>>
>>>
>>> Running Slurm 22.03.02 on an Ubuntu 22.04 server.
>>>
>>> Jobs submitted with --mem=5g are able to allocate an unlimited amount of
>>> memory.
>>>
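>>> For concreteness, a hypothetical submission of the kind being tested - the script name is made up:
>>>
>>> #!/bin/bash
>>> #SBATCH --mem=5G
>>> #SBATCH --cpus-per-task=1
>>> srun ./memhog_test
>>>
>>> The job runs to completion even though its resident memory climbs far past 5G.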
>>>
>>>
>>> How can I limit, at the job-submission level, how much memory a job can grab?
>>>
>>>
>>>
>>> thanks, and best regards!
>>> Boris
>>>
>>>
>>>
>>>