[slurm-users] how to find out why a job won't run?
Daan van Rossum
d.r.vanrossum at gmx.de
Mon Nov 26 01:19:54 MST 2018
I'm also interested in this. Another example: "Reason=(ReqNodeNotAvail)" is all that a user sees when the job's requested walltime would run into a system maintenance reservation.
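
To make the maintenance case concrete, here is a minimal sketch of the check a user could do by hand. It is not how Slurm decides anything internally, only interval arithmetic. The function names and the dates are made up for illustration; the real values would have to be read from the job's time limit and from the reservation (e.g. via `scontrol show job` and `scontrol show reservation`).

```python
from datetime import datetime, timedelta


def overlaps_reservation(earliest_start, time_limit, resv_start, resv_end):
    """True if a job starting at earliest_start and using its full
    time limit would still be running during the reservation window."""
    job_end = earliest_start + time_limit
    return earliest_start < resv_end and job_end > resv_start


def latest_fitting_limit(earliest_start, resv_start):
    """Longest time limit that still ends before the reservation
    begins; zero if the reservation has already started."""
    return max(resv_start - earliest_start, timedelta(0))


# Hypothetical values, for illustration only.
now = datetime(2018, 11, 26, 9, 0)
resv_start = datetime(2018, 11, 27, 8, 0)
resv_end = datetime(2018, 11, 27, 18, 0)

print(overlaps_reservation(now, timedelta(hours=48), resv_start, resv_end))  # True
print(overlaps_reservation(now, timedelta(hours=12), resv_start, resv_end))  # False
print(latest_fitting_limit(now, resv_start))  # 23:00:00
```

If the first check comes out true, a shorter requested walltime (at most the value from the second function) would let the job finish before the maintenance window, assuming nothing else is holding it back.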
* on Friday, 2018-11-23 09:55 -0500, Steven Dick <kg4ydw at gmail.com> wrote:
> I'm looking for a tool that will tell me why a specific job in the
> queue is still waiting to run. squeue doesn't give enough detail. If
> the job is held up on QOS, it's pretty obvious. But if it's
> resources, it's difficult to tell.
>
> If a job is not running because of resources, how can I identify which
> resource is not available? In a few cases, I've looked at what the
> job asked for and found a node that has those resources free, but
> still can't figure out why it isn't running.
>
> Also, if there are preemptable jobs in the queue, why is the job
> waiting on resources? Is there a priority for running jobs that can
> be compared to waiting jobs?