[slurm-users] Nodes stuck in drain state
Davide DelVento
davide.quantum at gmail.com
Thu May 25 13:39:03 UTC 2023
Can you ssh into the node and check how much memory is actually available?
Maybe there is a zombie process (or a healthy one with a memory leak bug)
that's hogging all the memory?
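A minimal Python sketch of that check, to run on the node itself. It assumes the Linux /proc/meminfo format (values in kB) and a RealMemory value copied by hand from slurm.conf, treating Slurm's megabytes as MiB. The helper names are hypothetical. On the node, `slurmd -C` also prints the RealMemory that slurmd itself detects, which is probably the closer comparison for a "Low RealMemory" drain.

```python
def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo keys to their first numeric field (kB)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_report(text, real_memory_mb):
    """Return (total_mib, available_mib, shortfall_mib) for the node.

    real_memory_mb is the RealMemory value from slurm.conf (assumed to be MiB).
    """
    info = parse_meminfo(text)
    total_mib = info["MemTotal"] // 1024
    # MemAvailable shows memory held by leaking or zombie processes as unavailable.
    available_mib = info.get("MemAvailable", 0) // 1024
    # A positive shortfall means the node has less memory than slurm.conf claims.
    shortfall_mib = max(0, real_memory_mb - total_mib)
    return total_mib, available_mib, shortfall_mib
```

For example, `memory_report(open("/proc/meminfo").read(), 16000)` would return the node's total and available memory in MiB plus any shortfall against a (hypothetical) configured RealMemory of 16000.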
On Thu, May 25, 2023 at 7:31 AM Roger Mason <rmason at mun.ca> wrote:
> Hello,
>
> Doug Meyer <dameyer99 at gmail.com> writes:
>
> > Could also review the node log in /var/log/slurm/. Often sinfo -lR will
> > tell you the cause, for example memory not matching the config.
> >
> REASON          USER        TIMESTAMP            STATE   NODELIST
> Low RealMemory  slurm(468)  2023-05-25T09:26:59  drain*  node012
> Not responding  slurm(468)  2023-05-25T09:30:31  down*   node[001-003,008]
>
> But, as I said in my response to Ole, the memory in slurm.conf and in
> the 'show node' output match.
>
> Many thanks for the help.
>
> Roger
>
>
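When several nodes are drained for different reasons, the `sinfo -lR` listing quoted above can be filtered programmatically. A minimal Python sketch, assuming the five-column layout shown (REASON, USER, TIMESTAMP, STATE, NODELIST), where only the reason may contain spaces. The function names are hypothetical, and real sinfo output may pad or truncate columns differently.

```python
def parse_sinfo_reasons(text):
    """Split `sinfo -lR`-style lines into dicts keyed by column name."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the header row.
        if not line or line.startswith("REASON"):
            continue
        # The reason may contain spaces, so split the last four fields off the right.
        parts = line.rsplit(None, 4)
        if len(parts) != 5:
            continue
        reason, user, timestamp, state, nodelist = parts
        rows.append({
            "reason": reason,
            "user": user,
            "timestamp": timestamp,
            # sinfo appends "*" to the state of a node that is not responding.
            "state": state.rstrip("*"),
            "not_responding": state.endswith("*"),
            "nodelist": nodelist,
        })
    return rows


def nodes_with_reason(rows, reason):
    """Return the NODELIST entries whose REASON matches exactly."""
    return [row["nodelist"] for row in rows if row["reason"] == reason]
```

Splitting from the right keeps multi-word reasons such as "Not responding" intact without relying on fixed column widths.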