[slurm-users] [EXT] --mem is not limiting the job's memory
Shunran Zhang
szhang at ngs.gen-info.osaka-u.ac.jp
Fri Jun 23 07:18:38 UTC 2023
Hi,
Would you mind checking your job scheduling settings in slurm.conf?
Namely SelectTypeParameters=CR_CPU_Memory or the like.
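For example, a minimal fragment (a sketch only; the select plugin shown here is select/cons_tres, use whatever plugin your site already runs):

SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory

Without one of the *_Memory variants, memory is not treated as a consumable resource and per-job memory requests are generally not enforced.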
Also, you may want to use systemd-cgtop to at least confirm jobs are indeed
running in cgroups.
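Something like the following, run on the compute node while the job is active, should show a slurm cgroup hierarchy accumulating memory (standard systemd-cgtop flags; one batch-mode sample ordered by memory usage):

# one snapshot, ordered by memory usage
systemd-cgtop -m -b -n 1 | grep -i slurm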
Sincerely,
S. Zhang
On Fri, Jun 23, 2023, 12:07 Boris Yazlovitsky <borisyaz at gmail.com> wrote:
> it's still not constraining memory...
>
> a memhog job continues to memhog:
>
> boris at rod:~/scripts$ sacct --starttime=2023-05-01
> --format=jobid,user,start,elapsed,reqmem,maxrss,maxvmsize,nodelist,state,exit
> -j 199
> JobID        User   Start               Elapsed  ReqMem  MaxRSS      MaxVMSize   NodeList  State      ExitCode
> ------------ ------ ------------------- -------- ------- ----------- ----------- --------- ---------- --------
> 199          boris  2023-06-23T02:42:30 00:01:21 1M                              milhouse  COMPLETED  0:0
> 199.batch           2023-06-23T02:42:30 00:01:21         104857988K  104858064K  milhouse  COMPLETED  0:0
>
> One thing I noticed is that the machines I'm working on do not have
> libcgroup and libcgroup-dev installed - but Slurm does have its own cgroup
> implementation? The slurmd processes do utilize /usr/lib/slurm/*cgroup.so
> objects. I will try to recompile Slurm with those libcgroup packages
> present.
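>
> For reference, these show which cgroup plugins exist and which ones slurmd
> is configured to use (paths from my Ubuntu install; adjust as needed):
>
> ls /usr/lib/slurm/ | grep -i cgroup
> scontrol show config | grep -iE 'proctracktype|taskplugin|jobacctgather'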
>
> On Thu, Jun 22, 2023 at 6:04 PM Ozeryan, Vladimir <
> Vladimir.Ozeryan at jhuapl.edu> wrote:
>
>> No worries,
>>
>> No, we don’t have any OS level settings, only “allowed_devices.conf”
>> which just has /dev/random, /dev/tty and stuff like that.
>>
>>
>>
>> But I think this could be the culprit; check the man page for cgroup.conf:
>> AllowedRAMSpace=100
>>
>>
>>
>> I would just leave these four:
>>
>> CgroupAutomount=yes
>> ConstrainCores=yes
>> ConstrainDevices=yes
>> ConstrainRAMSpace=yes
>>
>>
>>
>> Vlad.
>>
>>
>>
>> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Boris Yazlovitsky
>> Sent: Thursday, June 22, 2023 5:40 PM
>> To: Slurm User Community List <slurm-users at lists.schedmd.com>
>> Subject: Re: [slurm-users] [EXT] --mem is not limiting the job's memory
>>
>>
>>
>>
>>
>>
>> Thank you Vlad - looks like we have the same 'yes' settings.
>>
>> Do you remember if you had to make any settings on the OS level or in the
>> kernel to make it work?
>>
>>
>>
>> -b
>>
>>
>>
>> On Thu, Jun 22, 2023 at 5:31 PM Ozeryan, Vladimir <
>> Vladimir.Ozeryan at jhuapl.edu> wrote:
>>
>> Hello,
>>
>>
>>
>> We have the following configured and it seems to be working ok.
>>
>>
>>
>> CgroupAutomount=yes
>> ConstrainCores=yes
>> ConstrainDevices=yes
>> ConstrainRAMSpace=yes
>>
>> Vlad.
>>
>>
>>
>> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Boris Yazlovitsky
>> Sent: Thursday, June 22, 2023 4:50 PM
>> To: Slurm User Community List <slurm-users at lists.schedmd.com>
>> Subject: Re: [slurm-users] [EXT] --mem is not limiting the job's memory
>>
>>
>>
>>
>>
>>
>> Hello Vladimir, thank you for your response.
>>
>>
>>
>> this is the cgroup.conf file:
>>
>> CgroupAutomount=yes
>> ConstrainCores=yes
>> ConstrainDevices=yes
>> ConstrainRAMSpace=yes
>> ConstrainSwapSpace=yes
>> MaxRAMPercent=90
>> AllowedSwapSpace=0
>> AllowedRAMSpace=100
>> MemorySwappiness=0
>> MaxSwapPercent=0
>>
>>
>>
>> /etc/default/grub:
>>
>> GRUB_DEFAULT=0
>> GRUB_TIMEOUT_STYLE=hidden
>> GRUB_TIMEOUT=0
>> GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
>> GRUB_CMDLINE_LINUX_DEFAULT=""
>> GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0 cgroup_enable=memory
>> swapaccount=1"
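>>
>> These flags only take effect after regenerating the grub config and
>> rebooting; for reference, on Ubuntu that is roughly:
>>
>> sudo update-grub
>> sudo reboot
>> # after reboot, confirm the flags are on the running kernel:
>> grep -o 'cgroup_enable=memory swapaccount=1' /proc/cmdline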
>>
>>
>>
>> what other cgroup settings need to be set?
>>
>>
>>
>> && thank you!
>>
>> -b
>>
>>
>>
>> On Thu, Jun 22, 2023 at 4:02 PM Ozeryan, Vladimir <
>> Vladimir.Ozeryan at jhuapl.edu> wrote:
>>
>> --mem=5G should allocate 5G of memory per node.
>>
>> Are your cgroups configured?
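>>
>> The cgroup plugins only kick in if slurm.conf points at them; a typical
>> fragment (illustrative, not copied from a real config) looks like:
>>
>> ProctrackType=proctrack/cgroup
>> TaskPlugin=task/cgroup,task/affinity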
>>
>>
>>
>> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Boris Yazlovitsky
>> Sent: Thursday, June 22, 2023 3:28 PM
>> To: slurm-users at lists.schedmd.com
>> Subject: [EXT] [slurm-users] --mem is not limiting the job's memory
>>
>>
>>
>>
>>
>>
>> Running Slurm 22.03.02 on an Ubuntu 22.04 server.
>>
>> Jobs submitted with --mem=5g are able to allocate an unlimited amount of
>> memory.
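>>
>> For reference, the kind of job I am testing with is roughly this
>> (memhog is from the numactl tools; the exact invocation is just an example,
>> sized to far exceed the request):
>>
>> #!/bin/bash
>> #SBATCH --mem=5g
>> #SBATCH --time=00:05:00
>> memhog 100g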
>>
>>
>>
>> How can I limit, at the job submission level, how much memory a job can grab?
>>
>>
>>
>> thanks, and best regards!
>> Boris
>>
>>
>>
>>