Something odd is going on on our cluster. A user has a lot of pending jobs (a few thousand) in a job array:
$ squeue -u kmnx005 -r -t PD | head -5
      JOBID  PARTITION  NAME      USER     ST  TIME  NODES  NODELIST(REASON)
3045324_875  core       run_scp_  kmnx005  PD  0:00  1      (JobArrayTaskLimit)
3045324_876  core       run_scp_  kmnx005  PD  0:00  1      (JobArrayTaskLimit)
3045324_877  core       run_scp_  kmnx005  PD  0:00  1      (JobArrayTaskLimit)
3045324_878  core       run_scp_  kmnx005  PD  0:00  1      (JobArrayTaskLimit)
None of them are getting scheduled. But when I ask Slurm what that job's priority is, sprio prints only the header line and no job rows:
$ sprio -j 3045324
JOBID  PARTITION  PRIORITY  SITE  AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS  TRES
Any clues what's going on here?

--
Tim Cutts
Scientific Computing Platform Lead
AstraZeneca