[slurm-users] High log rate on messages like "Node nodeXX has low real_memory size"

Per Lönnborg perlon at passagen.se
Thu May 12 11:35:30 UTC 2022


Greetings,


Is there a way to lower the log rate of error messages in slurmctld for nodes with hardware errors?


For example, we see this for a node that has DIMM errors:



[2022-05-12T07:07:34.757] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:35.760] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:36.763] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:37.766] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:38.769] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:39.773] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:40.776] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:41.779] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:42.781] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:45.143] error: Node node37 has low real_memory size (257642 < 257660)


The log message is correct, since the node does have DIMM errors, but that's roughly one log entry per second. Such a high log rate doesn't seem right.
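To put a number on the rate, here is a minimal Python sketch that summarizes these messages per node. It is only an illustration: the regular expression and the `summarize` helper are my own assumptions based on the line format shown above, not anything shipped with Slurm.

```python
import re
from datetime import datetime

# Hypothetical pattern, derived only from the log lines quoted above.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] error: Node (?P<node>\S+) has low real_memory size "
    r"\((?P<found>\d+) < (?P<expected>\d+)\)$"
)


def summarize(lines):
    """Return {node: (count, span_seconds, shortfall)} for matching lines.

    count         number of matching entries for the node
    span_seconds  time between the first and last matching entry
    shortfall     expected minus reported memory, taken from the last entry
    """
    seen = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore unrelated log lines
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f")
        shortfall = int(m["expected"]) - int(m["found"])
        first, _, count, _ = seen.get(m["node"], (ts, ts, 0, shortfall))
        seen[m["node"]] = (first, ts, count + 1, shortfall)
    return {
        node: (count, (last - first).total_seconds(), shortfall)
        for node, (first, last, count, shortfall) in seen.items()
    }


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < slurmctld.log
    for node, (count, span, shortfall) in summarize(sys.stdin).items():
        print(f"{node}: {count} entries in {span:.1f} s, shortfall {shortfall}")
```

Run against the ten lines quoted above, it reports 10 entries for node37 over about 10.4 seconds, with a shortfall of 18.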


Thanks,
/ Per Lonnborg

