[slurm-users] Nodes stuck in drain state

Brian Andrus toomuchit at gmail.com
Thu May 25 14:54:02 UTC 2023


That output of slurmd -C is your answer.

Slurmd only sees about 6GB of memory (RealMemory=6097), but your slurm.conf claims the node has about 10GB (RealMemory=10193).

I would run some memory tests, look at /proc/meminfo on the node, etc.
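
As a rough sketch of the meminfo check, something like the following reads the MemTotal line and converts it to megabytes so it can be set beside the RealMemory figures. The function names are made up for illustration, and I am assuming the usual /proc/meminfo layout where MemTotal is reported in kB. The result need not match what slurmd -C prints exactly, but it should be in the same ballpark.

```python
def mem_total_mb(meminfo_text: str) -> int:
    """Return MemTotal in MB, given the text of /proc/meminfo.

    Assumes the usual "MemTotal:  <number> kB" line is present.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # second field is the size in kB
            return kib // 1024          # convert kB to MB, rounding down
    raise ValueError("MemTotal line not found")


def read_mem_total_mb(path: str = "/proc/meminfo") -> int:
    """Convenience wrapper (hypothetical helper): read the file on the node itself."""
    with open(path) as handle:
        return mem_total_mb(handle.read())
```

Running read_mem_total_mb() on the node and comparing the number with RealMemory in slurm.conf would show whether the kernel itself sees less memory than you configured.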

Maybe even check that the type/size of memory in there is what you think 
it is.
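
To make the mismatch concrete, here is a small sketch that pulls RealMemory out of the two lines quoted below (the slurmd -C output and the slurm.conf entry) and compares them. The helper name is my own invention, not anything Slurm provides.

```python
import re


def real_memory_mb(node_line: str) -> int:
    """Extract the RealMemory value (in MB) from a NodeName=... line."""
    match = re.search(r"\bRealMemory=(\d+)", node_line)
    if match is None:
        raise ValueError("no RealMemory field found")
    return int(match.group(1))


# What slurmd -C reported on the node (from the quoted message).
detected = ("NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 "
            "CoresPerSocket=2 ThreadsPerCore=1 RealMemory=6097")

# The entry in slurm.conf (from the quoted message).
configured = ("NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 "
              "CoresPerSocket=2 ThreadsPerCore=1 RealMemory=10193  State=UNKNOWN")

shortfall = real_memory_mb(configured) - real_memory_mb(detected)
if shortfall > 0:
    print(f"configured RealMemory exceeds detected memory by {shortfall} MB")
```

For these two lines the difference works out to 4096 MB, which may be worth keeping in mind when you check the installed modules.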

Brian Andrus

On 5/25/2023 7:30 AM, Roger Mason wrote:
> Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> writes:
>
>> 1. Is slurmd running on the node?
> Yes.
>
>> 2. What's the output of "slurmd -C" on the node?
> NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 CoresPerSocket=2
> ThreadsPerCore=1 RealMemory=6097
>
>> 3. Define State=UP in slurm.conf instead of UNKNOWN
> Will do.
>
>> 4. Why have you configured TmpDisk=0?  It should be the size of the
>> /tmp filesystem.
> I have not configured TmpDisk.  This is the entry in slurm.conf for that
> node:
> NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 CoresPerSocket=2
> ThreadsPerCore=1 RealMemory=10193  State=UNKNOWN
>
> But I do notice that slurmd -C now says there is less memory than
> configured.
>
> Thanks again.
>
> Roger
>


