[slurm-users] Job fails while running with Reason AssocMaxJobsLimit

Herc Silverstein herc.silverstein at schrodinger.com
Thu Jun 1 04:49:31 UTC 2023


We have a job that ran for 8 seconds and then failed, with its Reason
showing as AssocMaxJobsLimit. We have MaxJobs set to 5000 for each user.
My understanding was that if a user submitted more than 5000 jobs, Slurm
would run at most 5000 of them at a time and the rest would wait in the
queue.

If that's correct, why did this job run at all? And how can it have
Reason=AssocMaxJobsLimit? I assumed a job over the limit would not be
allowed to start, and that once it did start it must have been under the
MaxJobs limit.
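A toy model of the behaviour described above, written in Python purely for illustration. The `Job` class and `schedule()` function are hypothetical and are not how slurmctld is implemented: a pending job starts only while its user has fewer than MaxJobs jobs running; otherwise it stays PENDING with Reason=AssocMaxJobsLimit.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    # Hypothetical, minimal job record for this illustration only.
    job_id: int
    user: str
    state: str = "PENDING"
    reason: str = "None"


def schedule(jobs: List[Job], max_jobs_per_user: int) -> List[Job]:
    """One pass of a toy per-user MaxJobs check (not Slurm's real scheduler)."""
    # Count how many jobs each user already has running.
    running = Counter(j.user for j in jobs if j.state == "RUNNING")
    for job in jobs:
        if job.state != "PENDING":
            continue
        if running[job.user] < max_jobs_per_user:
            # Under the limit: the job may start and its reason is cleared.
            job.state = "RUNNING"
            job.reason = "None"
            running[job.user] += 1
        else:
            # At the limit: the job stays pending and records why.
            job.reason = "AssocMaxJobsLimit"
    return jobs


if __name__ == "__main__":
    jobs = [Job(i, "lwang") for i in range(5002)]
    schedule(jobs, max_jobs_per_user=5000)
    print(sum(j.state == "RUNNING" for j in jobs))  # 5000
    print({j.reason for j in jobs if j.state == "PENDING"})  # {'AssocMaxJobsLimit'}
```

Under this model the AssocMaxJobsLimit reason is only ever attached to jobs that are still pending; when a running job finishes and a slot frees up, the next pending job starts and its reason is cleared.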

JobID        JobName    State      ExitCode User      Partition  Timelimit  Start               End                 Elapsed    MaxRSS     Submit              NodeList                            Reason
------------ ---------- ---------- -------- --------- ---------- ---------- ------------------- ------------------- ---------- ---------- ------------------- ----------------------------------- ----------
55726852     P41_TS_FE+ FAILED     127:0    lwang     compute-1+ UNLIMITED  2023-05-30T22:37:27 2023-05-30T22:37:35 00:00:08              2023-05-30T21:44:21 compute-16core-64gb-preemptible-474
55726852.ba+ batch      FAILED     127:0                                    2023-05-30T22:37:27 2023-05-30T22:37:35 00:00:08   956K       2023-05-30T22:37:27 compute-16core-64gb-preemptible-474
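To read a table like this programmatically, the sketch below parses the job's allocation row. It assumes pipe-delimited output of the kind `sacct -j 55726852 --parsable2 --format=JobID,JobName,State,ExitCode,User,Submit,Start,End,Elapsed` produces; the helper functions are hypothetical. It splits ExitCode into exit status and signal, converts Elapsed to seconds, and computes how long the job was queued between Submit and Start.

```python
from datetime import datetime

# The first row of the table above, re-typed as pipe-delimited text in the
# style of `sacct --parsable2` (assumption: the post used the default
# fixed-width output, so the "+" truncation markers are kept as shown).
SACCT_TEXT = """\
JobID|JobName|State|ExitCode|User|Submit|Start|End|Elapsed
55726852|P41_TS_FE+|FAILED|127:0|lwang|2023-05-30T21:44:21|2023-05-30T22:37:27|2023-05-30T22:37:35|00:00:08
"""


def parse_rows(text):
    """Turn pipe-delimited sacct output into a list of {field: value} dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split("|")
    return [dict(zip(header, line.split("|"))) for line in lines[1:]]


def exit_code(value):
    """sacct reports ExitCode as '<exit status>:<signal>'."""
    status, signal = value.split(":")
    return int(status), int(signal)


def elapsed_seconds(value):
    """Convert an Elapsed value such as '00:00:08' or '1-02:03:04' to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in value.split(":")]
    while len(parts) < 3:  # pad 'MM:SS' to 'HH:MM:SS'
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def queue_wait_seconds(row):
    """Seconds between Submit and Start, i.e. how long the job was pending."""
    submit = datetime.fromisoformat(row["Submit"])
    start = datetime.fromisoformat(row["Start"])
    return int((start - submit).total_seconds())


if __name__ == "__main__":
    row = parse_rows(SACCT_TEXT)[0]
    print(row["State"], exit_code(row["ExitCode"]))  # FAILED (127, 0)
    print(elapsed_seconds(row["Elapsed"]))  # 8
    print(queue_wait_seconds(row))  # 3186
```

For job 55726852 this gives an exit status of 127 with no signal, an 8-second run, and a wait of 3186 seconds (about 53 minutes) between submission and start.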

