[slurm-users] possible to set memory slack space before killing jobs?
Loris Bennett
loris.bennett at fu-berlin.de
Thu Dec 6 00:05:51 MST 2018
Eli V <eliventer at gmail.com> writes:
> We run our cluster with the select parameter CR_Core_Memory and always
> require users to specify the memory needed when submitting a job, to
> avoid swapping our nodes into uselessness. However, since slurmd is
> pretty vigilant about killing jobs that exceed their request, we end up
> with jobs requesting more memory than they actually need, leaving our
> nodes' CPUs underutilized.
>
> What would be really nice is the ability to set a percentage of memory
> slack, so that a job wouldn't be killed until it exceeded its requested
> memory by that percentage, essentially allowing an admin-controlled, or
> perhaps user-controlled, amount of memory overcommit.
>
> So, for example, a node with 256GB of RAM can currently run 8 jobs
> requesting 32GB of RAM each, even if they only average 28GB per job.
> With 15% overcommit allowed, it could instead run 9 jobs requesting
> 28GB each (252GB in total), without having to worry about the
> occasional higher-memory job being killed.
>
> Anyone have some thoughts/ideas about this? Seems like it should be
> relatively straightforward to implement, though of course using it
> effectively will require some tuning.
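For context, the setup described above presumably corresponds to something
like the following, with every job submitted with an explicit --mem request
(the values are purely illustrative, and the exact enforcement behaviour
will also depend on the JobAcctGather/cgroup configuration):

    # slurm.conf (illustrative)
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory

    # job submission with an explicit memory request
    sbatch --mem=28G --cpus-per-task=1 job.sh
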
It is not clear to me that this is a good idea. I think it is important
to inform users about the memory usage of their jobs, so that they can
estimate their requirements as accurately as possible. If, as a user, I
find my job runs successfully even if I underestimate the memory needed,
there is no real incentive for me to be more accurate in future. In
fact, I may be rewarded for requesting too little RAM, since jobs
requesting fewer resources may tend to start earlier.
Cheers,
Loris
--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin
Email loris.bennett at fu-berlin.de