[slurm-users] Job fails while running with Reason AssocMaxJobsLimit
Herc Silverstein
herc.silverstein at schrodinger.com
Thu Jun 1 04:49:31 UTC 2023
Hi,
We have a job that ran for 8 seconds and then failed, with the Reason
shown as AssocMaxJobsLimit. In our case MaxJobs is set to 5000 for each
user. My understanding was that if a user submitted more than 5000
jobs, Slurm would run only 5000 at a time. The other jobs would just wait.
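That expectation can be sketched as a toy model in Python. This is only an illustration of the behaviour assumed above, not Slurm's actual scheduling logic: the function name apply_max_jobs and the job IDs are made up, and real Slurm also weighs priority, resources, and other limits.

```python
def apply_max_jobs(job_ids, max_jobs):
    """Split submitted jobs into running and pending under a per-user cap.

    Toy model only (hypothetical helper, not part of Slurm): jobs are
    considered strictly in submission order and nothing else is checked.
    """
    running = list(job_ids[:max_jobs])
    # Jobs beyond the cap wait, with the limit recorded as the pending reason.
    pending = {job_id: "AssocMaxJobsLimit" for job_id in job_ids[max_jobs:]}
    return running, pending


# Hypothetical example: 5003 submissions against the MaxJobs=5000 from the post.
running, pending = apply_max_jobs(list(range(1, 5004)), max_jobs=5000)
print(len(running), len(pending))
```

Under this model a job only carries AssocMaxJobsLimit while it is waiting, which is why seeing it on a job that actually started is surprising.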
If that's correct, then why did this job run? And how can it have
Reason=AssocMaxJobsLimit? I assumed a job over the limit wouldn't be
allowed to run, and that by the time it did run the user would have
been under the MaxJobs limit.
Field      Job                                  Batch step
---------  -----------------------------------  -----------------------------------
JobID      55726852                             55726852.ba+
JobName    P41_TS_FE+                           batch
State      FAILED                               FAILED
ExitCode   127:0                                127:0
User       lwang
Partition  compute-1+
Timelimit  UNLIMITED
Start      2023-05-30T22:37:27                  2023-05-30T22:37:27
End        2023-05-30T22:37:35                  2023-05-30T22:37:35
Elapsed    00:00:08                             00:00:08
MaxRSS                                          956K
Submit     2023-05-30T21:44:21                  2023-05-30T22:37:27
NodeList   compute-16core-64gb-preemptible-474  compute-16core-64gb-preemptible-474
Reason     AssocMaxJobsLimit
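For reference, the time the job spent waiting can be worked out from the Submit, Start, and End values in the output above. This is a small sketch that assumes all three timestamps are in the same time zone; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the sacct output for job 55726852.
submit = datetime.fromisoformat("2023-05-30T21:44:21")
start = datetime.fromisoformat("2023-05-30T22:37:27")
end = datetime.fromisoformat("2023-05-30T22:37:35")

pending_time = start - submit  # time between submission and start
run_time = end - start         # should match the Elapsed column

print("pending:", pending_time)
print("ran for:", run_time)
```

This gives a wait of 0:53:06 before the 8-second run, so the job did spend time pending before it started.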
Herc