Hello,
Thanks again for your documentation. I deployed Slurm 24.05.2 last week, but this weekend slurmctld crashed, leaving only the following in the logs:
"Aug 25 15:33:02 slurmadmin slurmctld[79950]: free(): invalid next size (fast)"
Also, I regularly get the messages below in my logs, even though these two machines are VMs in the same subnet, and slurmadmin is the very machine that runs both slurmctld and slurmd, so it cannot lose its connection to itself. Meanwhile, my compute nodes are never disconnected.
/var/log/slurm/slurmctld.log:[2024-08-25T14:12:02.009] agent/is_node_resp: node:slurmadmin RPC:REQUEST_PING : Communication connection failure
/var/log/slurm/slurmctld.log:[2024-08-25T14:12:02.009] agent/is_node_resp: node:vmjupyter RPC:REQUEST_PING : Communication connection failure
/var/log/slurm/slurmctld.log:[2024-08-25T14:12:02.009] agent/is_node_resp: node:vmdev RPC:REQUEST_PING : Communication connection failure
Should I open a new topic for this?
Thank you in advance.