[slurm-users] Large job starvation on cloud cluster

Chris Samuel chris at csamuel.org
Thu Feb 28 15:51:49 UTC 2019


On 28/2/19 7:29 am, Michael Gutteridge wrote:

> 2221670 largenode sleeper.       me PD                 N/A      1 (null)               (AssocGrpCpuLimit)

That says the job exceeds a policy limit you have set, so it is not 
permitted to start. It looks like an association at or above that level 
in the hierarchy has a limit on the number of cores, and this job would 
exceed it.

You'll probably need to go poking around with sacctmgr to see what that 
limit might be.
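
As a rough sketch of that poking around, something like the following 
lists every association that carries a group CPU limit. It assumes the 
limit is stored as a cpu entry in GrpTRES and that your sacctmgr accepts 
-n and -P for header-less, pipe-separated output. The cpu_limited_assocs 
helper is a hypothetical name of mine, not part of Slurm.

```shell
# Hypothetical helper: keep only associations whose GrpTRES field
# (4th pipe-separated column) contains a cpu=<number> entry.
cpu_limited_assocs() {
    awk -F'|' '$4 ~ /(^|,)cpu=[0-9]+/ { print }'
}

# Only query the accounting database when sacctmgr is available.
if command -v sacctmgr >/dev/null 2>&1; then
    # Columns: Cluster | Account | User | GrpTRES
    sacctmgr -nP show assoc format=Cluster,Account,User,GrpTRES \
        | cpu_limited_assocs
fi
```

The query is deliberately not restricted to one user, since the limit may 
sit on a parent account (rows with an empty User column) rather than on 
the user's own association.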

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA


