Hi Tim,
On 10/7/24 11:13, Cutts, Tim via slurm-users wrote:
Something odd is going on on our cluster. A user has a lot of pending jobs in a job array (a few thousand).
$ squeue -u kmnx005 -r -t PD | head -5
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
3045324_875 core run_scp_ kmnx005 PD 0:00 1 (JobArrayTaskLimit)
3045324_876 core run_scp_ kmnx005 PD 0:00 1 (JobArrayTaskLimit)
3045324_877 core run_scp_ kmnx005 PD 0:00 1 (JobArrayTaskLimit)
3045324_878 core run_scp_ kmnx005 PD 0:00 1 (JobArrayTaskLimit)
None are getting scheduled. But when I ask SLURM what that job’s priority is, it produces no output:
$ sprio -j 3045324
JOBID PARTITION PRIORITY SITE AGE FAIRSHARE JOBSIZE PARTITION QOS TRES
Any clues what’s going on here?
What array limits do you have in slurm.conf? For example:
$ scontrol show config | grep -i array
MaxArraySize = 1001
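If you want to pick that value up in a script, here is a minimal sketch. The helper name max_array_size is my own invention (it is not a Slurm command), and it assumes the "Key = Value" layout that scontrol show config prints. Per the slurm.conf documentation, the largest usable array task index is one less than MaxArraySize.

```shell
# Hypothetical helper: read `scontrol show config`-style text on stdin
# and print the value of MaxArraySize (nothing if the key is absent).
max_array_size() {
    awk -F'=' '{
        key = $1; val = $2
        gsub(/[ \t]/, "", key)   # strip padding around the key
        gsub(/[ \t]/, "", val)   # strip padding around the value
        if (tolower(key) == "maxarraysize") print val
    }'
}

# On a machine with the Slurm client tools installed:
#   scontrol show config | max_array_size
```
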
/Ole