[slurm-users] possible to set memory slack space before killing jobs?

Loris Bennett loris.bennett at fu-berlin.de
Thu Dec 6 00:05:51 MST 2018


Eli V <eliventer at gmail.com> writes:

> We run our cluster with the select parameter CR_Core_Memory and always
> require users to set the memory needed when submitting a job, to avoid
> swapping our nodes to uselessness. However, since slurmd is pretty
> vigilant about killing jobs that exceed their request, we end up with
> jobs requesting more memory than needed, leaving our nodes' CPUs
> underutilized.
>
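For reference, the setup described above corresponds roughly to the
following (a minimal sketch assuming the cons_res select plugin; the exact
slurm.conf details vary by site and Slurm version):

    # slurm.conf: allocate by core and by memory, so per-job memory
    # requests are tracked and enforced
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory

    # every job then has to state its memory requirement at submission, e.g.
    sbatch --mem=32G job.sh
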
> What would be really nice would be if I could set a percentage of memory
> slack, so that a job wouldn't be killed until it exceeded its requested
> memory by the given percentage, essentially allowing an admin- or perhaps
> user-controlled amount of memory overcommit.
>
> So, for example, a node with 256GB of RAM can currently run 8 jobs
> requesting 32GB of RAM each, even if they average only 28GB per job. With
> 15% overcommit it could instead run 9 jobs requesting 28GB of RAM each,
> without having to worry about the occasional higher-memory job being
> killed.
>
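To make the arithmetic in that example explicit (figures rounded):

    8 x 32GB   = 256GB    current case: node is full at 8 jobs
    9 x 28GB   = 252GB    9 jobs fit if each stays at its 28GB request
    28GB x 1.15 = 32.2GB  per-job kill threshold with 15% slack
    9 x 32.2GB = 290GB    worst case, only if every job overshoots at once

so the node is genuinely overcommitted only when several jobs exceed their
requests at the same time.
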
> Anyone have some thoughts/ideas about this? Seems like it should be
> relatively straightforward to implement, though of course using it
> effectively will require some tuning.

It is not clear to me that this is a good idea.  I think it is important
to inform users about the memory usage of their jobs, so that they can
estimate their requirements as accurately as possible.  If, as a user, I
find my job runs successfully even if I underestimate the memory needed,
there is no real incentive for me to be more accurate in future.  In
fact, I may be rewarded for requesting too little RAM, since jobs
requesting fewer resources may tend to start earlier.  

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de


