[slurm-users] Nodes stuck in drain state

Roger Mason rmason at mun.ca
Thu May 25 14:27:42 UTC 2023


Davide DelVento <davide.quantum at gmail.com> writes:

> Can you ssh into the node and check the actual availability of memory?
> Maybe there is a zombie process (or a healthy one with a memory leak
> bug) that's hogging all the memory?

This is what top shows:

last pid: 45688;  load averages:  0.00,  0.00,  0.00    up 0+03:56:52  11:58:13
26 processes:  1 running, 25 sleeping
CPU:  0.0% user,  0.0% nice,  0.1% system,  0.0% interrupt, 99.9% idle
Mem: 9452K Active, 69M Inact, 290M Wired, 287K Buf, 5524M Free
ARC: 125M Total, 37M MFU, 84M MRU, 168K Anon, 825K Header, 3476K Other
     36M Compressed, 89M Uncompressed, 2.46:1 Ratio
Swap: 10G Total, 10G Free
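
For anyone who wants to script this check rather than read it off the screen, here is a minimal sketch. It assumes the FreeBSD top layout shown above, with K/M/G size suffixes on the Mem line. The helper name parse_top_mem is hypothetical, and Python is used only for illustration.

```python
import re

# Multipliers to convert top's K/M/G suffixes to MiB (assumed binary units).
_UNIT_TO_MIB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def parse_top_mem(line):
    """Return {label: size_in_MiB} for a FreeBSD top 'Mem:' line.

    Hypothetical helper: it only understands "<number><K|M|G> <Label>"
    pairs, as they appear in the output pasted above.
    """
    fields = {}
    for size, unit, label in re.findall(r"(\d+)([KMG])\s+(\w+)", line):
        fields[label] = int(size) * _UNIT_TO_MIB[unit]
    return fields


mem = parse_top_mem(
    "Mem: 9452K Active, 69M Inact, 290M Wired, 287K Buf, 5524M Free"
)
print(f"Free: {mem['Free']:.0f} MiB")
```

Run on the Mem line above, this prints "Free: 5524 MiB".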

Thanks for the suggestion.


