[slurm-users] Nodes stuck in drain state
Brian Andrus
toomuchit at gmail.com
Thu May 25 14:54:02 UTC 2023
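The output of slurmd -C is your answer.
slurmd only sees about 6GB of memory on the node (RealMemory=6097), but
your slurm.conf claims it has about 10GB (RealMemory=10193). A node that
registers with less memory than configured gets drained.
I would run some memtests, look at /proc/meminfo on the node, etc.
Maybe even check that the type/size of memory in there is what you think
it is.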
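
To make the mismatch concrete, here is a small Python sketch that compares
the two node lines quoted below. The helper names (parse_node_line,
memory_shortfall) are made up for illustration and are not part of Slurm;
the two input strings are copied from the quoted slurmd -C output and the
slurm.conf entry.

```python
def parse_node_line(line: str) -> dict:
    """Split a Slurm node definition into a dict of Key=Value pairs."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


def memory_shortfall(configured: dict, detected: dict) -> int:
    """Return configured minus detected RealMemory, in MB."""
    return int(configured["RealMemory"]) - int(detected["RealMemory"])


# What slurmd -C reported on the node (hypothetical helper, real data).
detected = parse_node_line(
    "NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 CoresPerSocket=2 "
    "ThreadsPerCore=1 RealMemory=6097"
)

# What slurm.conf claims for the same node.
configured = parse_node_line(
    "NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 CoresPerSocket=2 "
    "ThreadsPerCore=1 RealMemory=10193 State=UNKNOWN"
)

shortfall = memory_shortfall(configured, detected)
print(
    f"configured={configured['RealMemory']} MB, "
    f"detected={detected['RealMemory']} MB, shortfall={shortfall} MB"
)
```

The shortfall comes out to 4096 MB. That might be consistent with a single
4GB module no longer being detected, but that is only a guess until the
hardware has been checked.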
Brian Andrus
On 5/25/2023 7:30 AM, Roger Mason wrote:
> Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> writes:
>
>> 1. Is slurmd running on the node?
> Yes.
>
>> 2. What's the output of "slurmd -C" on the node?
> NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 CoresPerSocket=2
> ThreadsPerCore=1 RealMemory=6097
>
>> 3. Define State=UP in slurm.conf instead of UNKNOWN
> Will do.
>
>> 4. Why have you configured TmpDisk=0? It should be the size of the
>> /tmp filesystem.
> I have not configured TmpDisk. This is the entry in slurm.conf for that
> node:
> NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 CoresPerSocket=2
> ThreadsPerCore=1 RealMemory=10193 State=UNKNOWN
>
> But I do notice that slurmd -C now says there is less memory than
> configured.
>
> Thanks again.
>
> Roger
>