[slurm-users] Nodes stuck in drain state

Davide DelVento davide.quantum at gmail.com
Thu May 25 13:39:03 UTC 2023


Can you ssh into the node and check the actual availability of memory?
Maybe there is a zombie process (or a healthy one with a memory leak bug)
that's hogging all the memory?
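
A minimal sketch of that check, assuming a Linux node with /proc/meminfo, awk, and the procps version of ps. The function names below are hypothetical helpers, not part of Slurm. Slurm sets "Low RealMemory" when the memory detected on the node is below the RealMemory value in slurm.conf, so the comparison here is detected memory against the configured value. MemTotal is only an approximation of what slurmd detects; running `slurmd -C` on the node prints the values slurmd itself reports.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting memory on a compute node; not part of Slurm.

# Total memory in MiB, read from a meminfo-style file (default: /proc/meminfo).
mem_total_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Print LOW if the detected memory is below the configured RealMemory, else OK.
# Both arguments are integers in MiB.
check_realmemory() {
    configured_mb=$1
    detected_mb=$2
    if [ "$detected_mb" -lt "$configured_mb" ]; then
        echo "LOW"
    else
        echo "OK"
    fi
}

# List the largest processes by resident set size (default: 10 output lines,
# including the header), to spot a process that is hogging memory.
top_mem_procs() {
    ps -eo pid,user,rss,stat,comm --sort=-rss | head -n "${1:-10}"
}
```

On the node, something like `check_realmemory 64000 "$(mem_total_mb)"` (where 64000 stands in for whatever RealMemory is set to in slurm.conf) followed by `top_mem_procs 5` would show whether the node falls short of the configured value and which processes hold the most memory.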

On Thu, May 25, 2023 at 7:31 AM Roger Mason <rmason at mun.ca> wrote:

> Hello,
>
> Doug Meyer <dameyer99 at gmail.com> writes:
>
> > Could also review the node log in /var/log/slurm/. Often sinfo -lR will
> > tell you the cause, for example mem not matching the config.
> >
> REASON               USER         TIMESTAMP           STATE  NODELIST
> Low RealMemory       slurm(468)   2023-05-25T09:26:59 drain* node012
> Not responding       slurm(468)   2023-05-25T09:30:31 down*  node[001-003,008]
>
> But, as I said in my response to Ole, the memory in slurm.conf and in
> the 'show node' output match.
>
> Many thanks for the help.
>
> Roger
>
>