[slurm-users] How do I impose a limit on the memory requested by a job?

Doug Meyer dameyer99 at gmail.com
Thu Mar 14 12:36:50 UTC 2019


We also run diskless.  In slurm.conf we round the node memory down so that
Slurm does not have the full physical memory to budget with, and we set a
default memory-per-job value equal to the declared memory divided by the
number of threads per node. If users don't declare a memory limit we are
fine. If they declare more, we are fine too. Mostly.  We had to turn off
memory enforcement because job memory usage is very uneven during runtime,
but with the above we have seldom had problems.
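
A minimal slurm.conf sketch of that approach, assuming David's figures from
the quoted message below (40 CPUs per node, RealMemory=190000, roughly
172000 MB usable). The node names and values are illustrative only and are
not taken from Doug's cluster:

```conf
# Declare less memory than is physically installed, leaving headroom for
# the diskless root filesystem and the OS (190000 MB rounded down to 172000 MB).
NodeName=red[460-464] CPUs=40 RealMemory=172000

# Default request per CPU = declared memory / CPUs per node
#                         = 172000 / 40 = 4300 MB
DefMemPerCPU=4300
```

With these numbers, a job that requests all 40 CPUs and no explicit memory
is given 40 x 4300 = 172000 MB, which is exactly the declared node memory.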

Doug

On Thu, Mar 14, 2019 at 3:57 AM david baker <djbaker12 at gmail.com> wrote:

> Hello Paul,
>
> Thank you for your advice. That all makes sense. We're running diskless
> compute nodes, so the usable memory is less than the total memory. I have
> therefore added a memory check to my job_submit.lua -- see below.
>
> Best regards,
> David
>
> -- Check memory/node is valid
>     if job_desc.min_mem_per_cpu == 9223372036854775808 then
>       job_desc.min_mem_per_cpu = 4300
>     end
>
>     local memory = job_desc.min_mem_per_cpu * job_desc.min_cpus
>
>     if memory > 172000 then
>       slurm.log_user("You cannot request more than 172000 Mbytes per node")
>       slurm.log_user("memory is: %u",memory)
>       return slurm.ERROR
>     end
>
>
> On Tue, Mar 12, 2019 at 4:48 PM Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>
>> Slurm should automatically block or reject jobs that can't run on that
>> partition in terms of memory usage for a single node.  So you shouldn't
>> need to do anything.  If you need something less than the max memory per
>> node then you will need to enforce some limits.  We do this via a
>> job_submit.lua script.  That would be my recommended method.
>>
>>
>> -Paul Edmon-
>>
>>
>> On 3/12/19 12:31 PM, David Baker wrote:
>>
>> Hello,
>>
>>
>> I have set up a serial queue to run small jobs in the cluster. Actually,
>> I route jobs to this queue using the job_submit.lua script. Any 1 node job
>> using up to 20 cpus is routed to this queue, unless a user submits
>> their job with an exclusive flag.
>>
>>
>> The partition is shared and so I defined memory to be a resource. I've
>> set default memory/cpu to be 4300 Mbytes. There are 40 cpus installed in
>> the nodes and the usable memory is circa 172000 Mbytes -- hence my default
>> mem/cpu.
>>
>>
>> The compute nodes are defined with RealMemory=190000, by the way.
>>
>>
>> I am curious to understand how I can impose a memory limit on the jobs
>> that are submitted to this partition. It doesn't make any sense to request
>> more than the total usable memory on the nodes. So could anyone please
>> advise me how to ensure that users cannot request more than the usable
>> memory on the nodes.
>>
>>
>> Best regards,
>>
>> David
>>
>>
>> PartitionName=serial nodes=red[460-464] Shared=Yes MaxCPUsPerNode=40
>> DefaultTime=02:00:00 MaxTime=60:00:00 QOS=serial
>> SelectTypeParameters=CR_Core_Memory *DefMemPerCPU=4300* State=UP
>> AllowGroups=jfAccessToIridis5 PriorityJobFactor=10 PreemptMode=off
>>
>>
>>
>>