[slurm-users] how to find out why a job won't run?

Daan van Rossum d.r.vanrossum at gmx.de
Mon Nov 26 01:19:54 MST 2018


I'm also interested in this.  Another example: when a job's requested walltime would run into a system maintenance reservation, all the user sees is "Reason=(ReqNodeNotAvail)".
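
For the maintenance-reservation case, here is a rough sketch of the check a user could do by hand. It assumes you have saved the text printed by "scontrol show job <jobid>" and "scontrol show reservation <name>", and that both contain the usual Key=Value fields (TimeLimit, StartTime, EndTime). The function names are made up for illustration. It only compares times and ignores whether the job's nodes actually belong to the reservation, so treat it as a hint, not a diagnosis.

```python
from datetime import datetime, timedelta


def parse_scontrol(text):
    """Turn whitespace-separated Key=Value tokens into a dict.

    Assumption: values contain no spaces, which holds for the fields used here.
    """
    return dict(tok.split("=", 1) for tok in text.split() if "=" in tok)


def parse_timelimit(value):
    """Parse a time limit written as [days-]HH:MM:SS into a timedelta."""
    days, _, rest = value.rpartition("-")
    hours, minutes, seconds = (int(part) for part in rest.split(":"))
    return timedelta(days=int(days or 0), hours=hours,
                     minutes=minutes, seconds=seconds)


def overlaps_reservation(job_text, resv_text, now):
    """Return True if a job started at `now` would still be running
    when the reservation begins (times only; nodes are not compared)."""
    job = parse_scontrol(job_text)
    resv = parse_scontrol(resv_text)
    fmt = "%Y-%m-%dT%H:%M:%S"
    resv_start = datetime.strptime(resv["StartTime"], fmt)
    resv_end = datetime.strptime(resv["EndTime"], fmt)
    would_end = now + parse_timelimit(job["TimeLimit"])
    return now < resv_end and would_end > resv_start
```

If this returns True for a pending job, shortening the requested walltime so that the job finishes before the reservation's StartTime is one thing to try.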

* on Friday, 2018-11-23 09:55 -0500, Steven Dick <kg4ydw at gmail.com> wrote:

> I'm looking for a tool that will tell me why a specific job in the
> queue is still waiting to run.  squeue doesn't give enough detail.  If
> the job is held up on QOS, it's pretty obvious.  But if it's
> resources, it's difficult to tell.
> 
> If a job is not running because of resources, how can I identify which
> resource is not available?  In a few cases, I've looked at what the
> job asked for and found a node that has those resources free, but
> still can't figure out why it isn't running.
> 
> Also, if there are preemptable jobs in the queue, why is the job
> waiting on resources?  Is there a priority for running jobs that can
> be compared to waiting jobs?