[slurm-users] ReqNodeNotAvail, but none of nodes in partition are listed.
Ryan Novosielski
novosirj at rutgers.edu
Mon May 7 15:11:41 MDT 2018
In my experience, Slurm may report that reason whenever any nodes on the system are unavailable, even if they have nothing to do with why the job isn’t running.
I assume you’ve checked for reservations?
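If it helps, here is a rough sketch of how one might check whether any of the nodes reported as unavailable actually belong to the job's partition, and dump reservations and down/drained nodes at the same time. It only shells out to standard commands (scontrol show reservation, sinfo -R, sinfo -h -p <partition> -o "%N", scontrol show hostnames). The function names are my own, and the partition name and node expression in the usage note are placeholders, not values from this thread.

```python
import subprocess


def run(cmd):
    """Run a command and return its stdout as text (raises if it fails)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def expand_hostlist(expr):
    """Expand a Slurm hostlist expression (e.g. 'node[01-03]') into a set of names."""
    return set(run(["scontrol", "show", "hostnames", expr]).split())


def partition_nodes(partition):
    """Return the set of node names that sinfo reports for one partition."""
    nodes = set()
    for line in run(["sinfo", "-h", "-p", partition, "-o", "%N"]).splitlines():
        if line.strip():
            nodes |= expand_hostlist(line.strip())
    return nodes


def blocking_nodes(unavailable, in_partition):
    """Return the unavailable nodes that are really in the partition, sorted."""
    return sorted(set(unavailable) & set(in_partition))


def report(partition, unavailable_expr):
    """Print reservations, down/drained nodes, and any overlap with the partition."""
    print(run(["scontrol", "show", "reservation"]))
    print(run(["sinfo", "-R"]))
    hits = blocking_nodes(expand_hostlist(unavailable_expr), partition_nodes(partition))
    print("Unavailable nodes in partition:", hits or "none")
```

On a machine with the Slurm client tools installed, something like report("general", "node[01-04]") would be the call, where "general" stands in for the partition name and "node[01-04]" for the unavailable node list copied from the job's reason field. If the overlap comes back empty, the ReqNodeNotAvail message is probably a red herring and reservations, QOS limits, or priority are the next things to look at.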
> On May 7, 2018, at 5:06 PM, Prentice Bisbal <pbisbal at pppl.gov> wrote:
>
> Dear Slurm Users,
>
> On my cluster, I have several partitions, each with their own QOS, time limits, etc.
>
> Several times today, I've received complaints from users that they submitted jobs to a partition with available nodes, but jobs are stuck in the PD state. I have spent the majority of my day investigating this, but haven't turned up anything meaningful. Both jobs show the "ReqNodeNotAvail" reason, but none of the nodes listed as not available are even in the partition these jobs are submitted to. Neither job has requested a specific node, either.
>
> I have checked slurmctld.log on the server, and have not been able to find any clues. Is there anywhere else I should look? Any ideas what could be causing this?
--
____
|| \\UTGERS, |---------------------------*O*---------------------------
||_// the State | Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark
`'