[slurm-users] Memory oversubscription and scheduling

Cory Holcomb cory.holcomb at broadcom.com
Mon May 7 07:58:38 MDT 2018


Thank you for the reply. I was beginning to wonder if my message had been seen.

While I understand how batch systems work, consider the case where a system
daemon develops a memory leak and consumes memory outside of any job's
allocation.

Not checking the memory actually in use on the node before dispatch seems
like a good way to black-hole a bunch of jobs.
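
For what it's worth, one way to leave headroom for system daemons is to
reserve memory at the node level. The sketch below is only illustrative;
the node names and sizes are made up, and it assumes memory is already
tracked as a consumable resource:

# slurm.conf (illustrative values only)
# Advertise less than the physical RAM so the scheduler never hands out
# the last chunk, and reserve memory for slurmd and other system daemons.
NodeName=node[01-10] CPUs=32 RealMemory=126000 MemSpecLimit=2048

# Track memory as a consumable resource so allocations are counted at all.
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory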

On Sat, May 5, 2018 at 7:21 AM, Chris Samuel <chris at csamuel.org> wrote:

> On Thursday, 26 April 2018 3:28:19 AM AEST Cory Holcomb wrote:
>
> > It appears that I have a configuration that only takes into account the
> > allocated memory before dispatching.
>
> With batch systems the idea is for the users to set constraints for
> their jobs so the scheduler can backfill other jobs onto nodes knowing
> how much memory they can rely on.
>
> So really the emphasis is on making the users set good resource
> requests (such as memory) and for the batch system to terminate jobs
> that exceed them (or to arrange for the kernel to constrain them to
> that amount via cgroups).
>
> Hope that helps!
> Chris
> --
>  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>
>
>
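
As a rough illustration of the cgroup enforcement mentioned above, the
pieces involved look something like the sketch below; the values are
assumptions for illustration, not recommendations:

# slurm.conf
TaskPlugin=task/cgroup

# cgroup.conf
ConstrainRAMSpace=yes      # confine each job to the memory it requested
ConstrainSwapSpace=yes
AllowedRAMSpace=100        # percent of requested memory the job may use

A job that requests memory explicitly, e.g.

  sbatch --mem=4G --time=01:00:00 job.sh

is then held to roughly 4 GB by the kernel instead of being free to push
the node into swap or the OOM killer.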

