[slurm-users] Debugging accounting/QOS policy errors

Stradling, Alden Reid (ars9ac) ars9ac at virginia.edu
Fri Jun 22 14:16:49 MDT 2018

I just spent another fun hour figuring out why I got the classic:

Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

I dug it out with the tried-and-true sacctmgr show associations, scontrol show partition, etc., but are there better tools for seeing exactly how the config, partition, QOS, and association each contribute to whether a batch submission runs or not?
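For anyone scripting this, the same lookups can at least be collected in one place. Below is a rough sketch in Python. It assumes sacctmgr and scontrol are on your PATH. The helper names (parse_parsable, parse_scontrol, collect_limits) are made up for illustration. The format= field names are ones I'd expect to work, but check them against the sacctmgr man page for your version. The script only gathers and parses the output; it doesn't decide which limit bites.

```python
import subprocess

def parse_parsable(text):
    """Parse `sacctmgr -P` output (header row, '|'-separated) into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split("|")
    return [dict(zip(header, ln.split("|"))) for ln in lines[1:]]

def parse_scontrol(text):
    """Parse `scontrol show partition NAME` key=value output into a dict."""
    fields = {}
    for token in text.split():
        if "=" in token:
            # Split at the first '=' only, so values like TRES=cpu=40,mem=100G survive.
            key, value = token.split("=", 1)
            fields[key] = value
    return fields

def run(cmd):
    """Run a Slurm client command and return its stdout (raises if it fails)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def collect_limits(user, partition):
    """Gather association, QOS and partition settings relevant to one user/partition.

    Hypothetical helper; the format= field lists are assumptions to verify locally.
    """
    assoc = parse_parsable(run([
        "sacctmgr", "-P", "show", "associations", "where", f"user={user}",
        "format=Cluster,Account,User,Partition,QOS,GrpTRES,MaxTRES,MaxWall,MaxJobs,MaxSubmit",
    ]))
    qos = parse_parsable(run([
        "sacctmgr", "-P", "show", "qos",
        "format=Name,Flags,MaxWall,MaxTRESPU,MaxJobsPU,MaxSubmitPU",
    ]))
    part = parse_scontrol(run(["scontrol", "show", "partition", partition]))
    return {"associations": assoc, "qos": qos, "partition": part}
```

Call it as collect_limits("someuser", "somepartition"), substituting your own user and partition names (these are placeholders). Then compare the MaxWall, MaxTRES and MaxSubmit values from each source side by side.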

I remember seeing something about SLURM 18 having a better error message, but I imagine someone in this august group has gotten tired of debugging this by hand and written something amazing to show the attributes for any combination of allocation, partition and user...
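Part of what makes this painful is that the same limit can be set in several places, and only one of them wins. Below is a small sketch of that lookup. It assumes the precedence order I recall from the Slurm resource limits documentation, so verify it for your version. In that order, the first source that sets a limit wins: partition QOS, job QOS, user association, account association, root association, partition. The job QOS moves ahead of the partition QOS when it has the OverPartQOS flag. The source names and effective_limit() are hypothetical. The account level is collapsed to a single entry, even though Slurm walks up the account hierarchy.

```python
# Assumed precedence (first source that sets the limit wins); check the Slurm
# resource limits documentation for your version before relying on it.
DEFAULT_ORDER = [
    "partition_qos", "job_qos", "user_assoc",
    "account_assoc", "root_assoc", "partition",
]

def effective_limit(sources, over_part_qos=False):
    """Return (source, value) for the first source that sets the limit, else (None, None).

    `sources` maps hypothetical source names from DEFAULT_ORDER to raw limit strings;
    None, "" (unset in sacctmgr output) and "UNLIMITED" are treated as "not set here".
    """
    order = list(DEFAULT_ORDER)
    if over_part_qos:
        # Assumption: with OverPartQOS on the job's QOS, the job QOS is consulted first.
        order[0], order[1] = order[1], order[0]
    for name in order:
        value = sources.get(name)
        if value not in (None, "", "UNLIMITED"):
            return name, value
    return None, None
```

For example, effective_limit({"user_assoc": "2-00:00:00", "partition": "7-00:00:00"}) reports the user association's wall-time limit rather than the partition's, which is exactly the kind of thing that is easy to miss by eye.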

And if not, I'll just keep plugging away. :)


