[slurm-users] Large job starvation on cloud cluster
Chris Samuel
chris at csamuel.org
Thu Feb 28 15:51:49 UTC 2019
On 28/2/19 7:29 am, Michael Gutteridge wrote:
> 2221670 largenode sleeper. me PD N/A 1 (null) (AssocGrpCpuLimit)
That reason says the job would exceed a policy limit you have set, so
it is not permitted to start. It looks like you've got a limit on the
number of cores an association can use, set either at that level of the
hierarchy or somewhere above it, and this job would take you over it.
You'll probably need to go poking around with sacctmgr to see what that
limit might be.
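
As a rough sketch of that poking around, and assuming the limit is
recorded as a cpu entry in GrpTRES (on your site it may live somewhere
else), something like the following lists every association that
carries a group CPU limit. The function names parse_cpu_limits and
query_associations are made up for illustration; the sacctmgr call
only uses the -n (no header) and -P (pipe-separated) options with a
format= list.

```python
import subprocess

# Columns requested from sacctmgr; GrpTRES holding the cpu limit is an assumption.
FIELDS = ["Cluster", "Account", "User", "GrpTRES"]


def parse_cpu_limits(text):
    """Return (cluster, account, user, cpu_limit) for rows with a GrpTRES cpu entry.

    `text` is pipe-separated, header-less output as produced by
    `sacctmgr -nP show assoc format=Cluster,Account,User,GrpTRES`.
    An empty user field means the limit sits on the account itself.
    """
    limits = []
    for line in text.splitlines():
        parts = line.split("|")
        if len(parts) != len(FIELDS):
            continue  # skip blank or unexpected lines
        cluster, account, user, grp_tres = parts
        # GrpTRES is a comma-separated list such as "cpu=300,mem=1000G"
        for item in grp_tres.split(","):
            name, _, value = item.partition("=")
            if name == "cpu" and value.isdigit():
                limits.append((cluster, account, user, int(value)))
    return limits


def query_associations():
    """Run sacctmgr and return its raw output (needs Slurm installed)."""
    cmd = ["sacctmgr", "-nP", "show", "assoc", "format=" + ",".join(FIELDS)]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

On a node with the Slurm client tools you would call
parse_cpu_limits(query_associations()) and then look for a row at your
own association or at one of its parent accounts.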
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA