Something odd is going on on our cluster. User has a lot of pending jobs in a job array (a few thousand).
squeue -u kmnx005 -r -t PD | head -5
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
3045324_875 core run_scp_ kmnx005 PD 0:00 1 (JobArrayTaskLimit)
3045324_876 core run_scp_ kmnx005 PD 0:00 1 (JobArrayTaskLimit)
3045324_877 core run_scp_ kmnx005 PD 0:00 1 (JobArrayTaskLimit)
3045324_878 core run_scp_ kmnx005 PD 0:00 1 (JobArrayTaskLimit)
None are getting scheduled. But when I ask SLURM what that job’s priority is, it produces no output:
$ sprio -j 3045324
JOBID PARTITION PRIORITY SITE AGE FAIRSHARE JOBSIZE PARTITION QOS TRES
Any clues what’s going on here?

--
Tim Cutts
Scientific Computing Platform Lead
AstraZeneca
Find out more about R&D IT Data, Analytics & AI and how we can support you by visiting our Service Catalogue: https://azcollaboration.sharepoint.com/sites/CMU993
________________________________
AstraZeneca UK Limited is a company incorporated in England and Wales with registered number 03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA.
This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at https://www.astrazeneca.com
Hi Tim,
On 10/7/24 11:13, Cutts, Tim via slurm-users wrote:
> Something odd is going on on our cluster. User has a lot of pending jobs in a job array (a few thousand).
>
> squeue -u kmnx005 -r -t PD | head -5
> JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
> 3045324_875 core run_scp_ kmnx005 PD 0:00 1 (JobArrayTaskLimit)
> 3045324_876 core run_scp_ kmnx005 PD 0:00 1 (JobArrayTaskLimit)
> 3045324_877 core run_scp_ kmnx005 PD 0:00 1 (JobArrayTaskLimit)
> 3045324_878 core run_scp_ kmnx005 PD 0:00 1 (JobArrayTaskLimit)
>
> None are getting scheduled. But when I ask SLURM what that job’s priority is, it produces no output:
>
> $ sprio -j 3045324
> JOBID PARTITION PRIORITY SITE AGE FAIRSHARE JOBSIZE PARTITION QOS TRES
>
> Any clues what’s going on here?
What array limits do you have in slurm.conf? For example:
$ scontrol show config | grep -i array
MaxArraySize = 1001
/Ole
I should be clear, the JobArrayTaskLimit isn’t the issue (the user’s submitted with %1, which is why we’re getting that). What I don’t understand is why the jobs remaining in the queue have no priority at all associated with them. It’s as though the scheduler has forgotten the job array exists altogether.
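For readers following the thread: a %N suffix on --array throttles how many tasks of the array may run concurrently, which is exactly what produces the JobArrayTaskLimit pending reason above. A minimal sketch of such a submission (the script name here is a placeholder, not the user's actual script):

```shell
# Hypothetical submission sketch: the "%1" throttle allows only one
# array task to run at a time, so every other task sits pending with
# reason (JobArrayTaskLimit) -- by design, not as an error.
sbatch --array=0-999%1 run_scp.sh
```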
Tim
From: Ole Holm Nielsen via slurm-users <slurm-users@lists.schedmd.com>
Date: Monday, 7 October 2024 at 10:35 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: Jobs not getting scheduled, no priority calculation, but still in queue?
On 10/7/24 12:28, Cutts, Tim wrote:
> I should be clear, the JobArrayTaskLimit isn’t the issue (the user’s submitted with %1, which is why we’re getting that). What I don’t understand is why the jobs remaining in the queue have no priority at all associated with them. It’s as though the scheduler has forgotten the job array exists altogether.
I see what you mean. The squeue command can print job priority and many other fields, as defined under the "-O" option. I set this variable to personalize my desired columns:
export SQUEUE_FORMAT2="JobID:8,Partition:15,QOS:7,Name:10,UserName:9,Account:11,State:8,PriorityLong:9,ReasonList:16,TimeUsed:12,SubmitTime:19,TimeLimit:10,tres-alloc:"
You can also use "scontrol show job <jobid>".
In https://slurm.schedmd.com/job_array.html you can see that sprio doesn't handle job arrays yet:
The following Slurm commands do not currently recognize job arrays and their use requires the use of Slurm job IDs, which are unique for each array element: sbcast, sprio, sreport, sshare and sstat. The sacct, sattach and strigger commands have been modified to permit specification of either job IDs or job array elements. The sview command has been modified to permit display of a job's ArrayJobId and ArrayTaskId fields. Both fields are displayed with a value of "N/A" if the job is not part of a job array.
Also, there are a couple of hints in this Wiki page: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_operations/#job-arrays
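Putting the advice above together, a couple of sketched workarounds for seeing per-task priorities when sprio stays silent on an array (job IDs taken from the thread; these obviously only run on a cluster where that job exists):

```shell
# squeue does understand job arrays, and its -O/--Format option can
# print a Priority column for each pending array task:
squeue -j 3045324 -r -t PD -O JobID,Priority,Reason | head -5

# Alternatively, inspect a single array element directly; scontrol
# reports the computed Priority= value for that task:
scontrol show job 3045324_875 | grep -o 'Priority=[0-9]*'
```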
/Ole