[slurm-users] Jobs waiting while plenty of cpu and memory available
Andy Georges
andy.georges at ugent.be
Wed Jul 10 07:57:22 UTC 2019
Hi,
> So here's something funny. One user submitted a job that requested 60 CPUs and 400000M of memory. Our largest nodes in that partition have 72 CPUs and 256G of memory. So when a user requests 400G of RAM, what would be good behavior? I would like to see Slurm reject the job as impossible to run. Instead, Slurm keeps slowly growing the priority of that job (because of fairshare), and the job effectively disables all the nodes that are trying to free up memory for it, i.e. every node that has enough CPUs, not just one. This is a combination of *multiple* bad behaviors: blocking a node that can never satisfy the request is bad, and blocking *all* nodes that have enough CPUs, even though none of them can ever satisfy the request, is extra bad.
EnforcePartLimits=YES
would be your friend :)
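For reference, that is a slurm.conf option (ANY and ALL are accepted values too; check the slurm.conf man page for your version for the exact semantics):

    # slurm.conf -- have slurmctld reject jobs that exceed partition limits
    # at submission time instead of leaving them pending forever
    EnforcePartLimits=YES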
We also use a submission filter that checks resource requests and adjusts some
things if needed :)
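Something along these lines would do it as a job_submit.lua plugin. This is only a minimal sketch, not our production filter: the 256000 MB cap is an assumption for your largest node, the per-CPU flag bit test is from memory, and you should double-check the job_submit plugin docs for your Slurm version before relying on the field names:

    -- /etc/slurm/job_submit.lua (minimal sketch, assumptions noted above)
    local MAX_MEM_MB = 256000   -- assumed size of the largest node, in MB
    -- Slurm marks "memory per CPU" requests by setting the top bit of
    -- pn_min_memory (assumption: 2^63); unset requests are NO_VAL64 and
    -- also land above this threshold, so both cases are skipped below.
    local MEM_PER_CPU_FLAG = 2^63

    function slurm_job_submit(job_desc, part_list, submit_uid)
        local mem = job_desc.pn_min_memory
        -- Only check plain per-node memory requests.
        if mem ~= nil and mem < MEM_PER_CPU_FLAG and mem > MAX_MEM_MB then
            slurm.log_user("Requested " .. mem .. " MB per node, but the largest node has " .. MAX_MEM_MB .. " MB; rejecting.")
            return slurm.ERROR
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end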
Regards,
-- Andy