[slurm-users] How do I impose a limit the memory requested by a job?
djbaker12 at gmail.com
Thu Mar 14 10:54:20 UTC 2019
Thank you for your advice. That all makes sense. We're running diskless
compute nodes, so the usable memory is less than the total installed memory.
I have therefore added a memory check to my job_submit.lua -- see below.
-- Check memory/node is valid
if job_desc.min_mem_per_cpu == 9223372036854775808 then
   -- no per-CPU memory requested: apply the partition default
   job_desc.min_mem_per_cpu = 4300
end

memory = job_desc.min_mem_per_cpu * job_desc.min_cpus

if memory > 172000 then
   slurm.log_user("You cannot request more than 172000 Mbytes per node")
   slurm.log_user("memory is: %u", memory)
   return slurm.ERROR
end
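
For reference, the fragment above sits inside the slurm_job_submit callback
of the job_submit/lua plugin. A rough sketch of the complete job_submit.lua
(the 4300 MB default and 172000 MB cap are our site-specific numbers, so
adjust for your own nodes) would be:

  function slurm_job_submit(job_desc, part_list, submit_uid)
     -- No per-CPU memory requested: apply the partition default
     if job_desc.min_mem_per_cpu == 9223372036854775808 then
        job_desc.min_mem_per_cpu = 4300
     end
     -- Reject jobs asking for more memory than a node can actually provide
     local memory = job_desc.min_mem_per_cpu * job_desc.min_cpus
     if memory > 172000 then
        slurm.log_user("You cannot request more than 172000 Mbytes per node")
        return slurm.ERROR
     end
     return slurm.SUCCESS
  end

  function slurm_job_modify(job_desc, job_rec, part_list, submit_uid)
     return slurm.SUCCESS
  end

With that in place, something like "sbatch --mem-per-cpu=5000 -n 40 ..."
should be rejected at submit time with the message above, since 5000 * 40
exceeds the 172000 MB cap.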
On Tue, Mar 12, 2019 at 4:48 PM Paul Edmon <pedmon at cfa.harvard.edu> wrote:
> Slurm should automatically block or reject jobs that can't run on that
> partition in terms of memory usage for a single node. So you shouldn't
> need to do anything. If you need something less than the max memory per
> node then you will need to enforce some limits. We do this via a
> job_submit.lua script. That would be my recommended method.
> -Paul Edmon-
> On 3/12/19 12:31 PM, David Baker wrote:
> I have set up a serial queue to run small jobs in the cluster. Actually, I
> route jobs to this queue using the job_submit.lua script. Any 1 node job
> using up to 20 cpus is routed to this queue, unless a user submits
> their job with an exclusive flag.
> The partition is shared and so I defined memory to be a resource. I've set
> default memory/cpu to be 4300 Mbytes. There are 40 cpus installed in the
> nodes and the usable memory is circa 172000 Mbytes -- hence my default memory/cpu value.
> The compute nodes are defined with RealMemory=190000, by the way.
> I am curious to understand how I can impose a memory limit on the jobs
> that are submitted to this partition. It doesn't make any sense to request
> more than the total usable memory on the nodes. So could anyone please
> advise me how to ensure that users cannot request more than the usable
> memory on the nodes?
> Best regards,
> PartitionName=serial nodes=red[460-464] Shared=Yes MaxCPUsPerNode=40
> DefaultTime=02:00:00 MaxTime=60:00:00 QOS=serial
> SelectTypeParameters=CR_Core_Memory DefMemPerCPU=4300 State=UP
> AllowGroups=jfAccessToIridis5 PriorityJobFactor=10 PreemptMode=off
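
(As an aside, a hard cap could presumably also be put directly on the
partition line in slurm.conf -- e.g. adding MaxMemPerNode=172000, together
with EnforcePartLimits=ALL -- so that over-sized requests are rejected at
submission time without any lua involved. The job_submit.lua check above is
the route taken here, though.)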