[slurm-users] how to find out why a job won't run?

Steven Dick kg4ydw at gmail.com
Fri Nov 23 07:55:00 MST 2018


I'm looking for a tool that will tell me why a specific job in the
queue is still waiting to run.  squeue doesn't give enough detail.  If
the job is held up on QOS, it's pretty obvious.  But if it's waiting on
resources, it's difficult to tell.

If a job is not running because of resources, how can I identify which
resource is not available?  In a few cases, I've looked at what the
job asked for and found a node that has those resources free, but I
still can't figure out why it isn't running.

Also, if there are preemptable jobs in the queue, why is the job
waiting on resources?  Is there a priority for running jobs that can
be compared to waiting jobs?


