Something odd is happening on our cluster. A user has a few thousand pending tasks in a job array:

$ squeue -u kmnx005 -r -t PD | head -5
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
       3045324_875      core run_scp_  kmnx005 PD       0:00      1 (JobArrayTaskLimit)
       3045324_876      core run_scp_  kmnx005 PD       0:00      1 (JobArrayTaskLimit)
       3045324_877      core run_scp_  kmnx005 PD       0:00      1 (JobArrayTaskLimit)
       3045324_878      core run_scp_  kmnx005 PD       0:00      1 (JobArrayTaskLimit)

None of them are being scheduled. But when I ask SLURM for that job’s priority, sprio prints only the header row and no entries:

$ sprio -j 3045324
          JOBID PARTITION   PRIORITY       SITE        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS                 TRES

Any clues what’s going on here?

-- 
Tim Cutts
Scientific Computing Platform Lead
AstraZeneca

 
