[slurm-users] Segfault in slurmctld 22.05

EPF (Esben Peter Friis) EPF at novozymes.com
Thu Jun 2 06:16:01 UTC 2022


Hi all

We installed Slurm 22.05 yesterday, and the slurmctld daemon crashes randomly every couple of hours. I can't get much information out of it: running slurmctld in the foreground (slurmctld -D -vvvvv) does not reveal anything before the crash.



Possibly unrelated errors reported by slurmctld:

slurmctld: error: slurm_unpack_received_msg: [[localhost]:43924] We need to forward this to other nodes use slurm_receive_msg_and_forward instead
slurmctld: error: auth_g_unpack: authentication plugin unknown(1297436231) not found
slurmctld: error: slurm_unpack_received_msg: [[localhost]:43924] auth_g_unpack: 0 has authentication error: No error
slurmctld: error: slurm_unpack_received_msg: [[localhost]:43924] Header lengths are longer than data received
slurmctld: error: slurm_receive_msg [127.0.0.1:43924]: Unspecified error
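
A note on the auth_g_unpack line: the plugin id 1297436231 looks odd as a number, but it reads as text when treated as raw bytes. The sketch below is plain Python, not Slurm code; decode_plugin_id is a hypothetical helper name. It decodes the value as a 32-bit big-endian integer, on the assumption that the id was unpacked in network byte order. The result may indicate that credential text was read where a plugin id was expected, for example because some client on localhost speaks a different protocol version. That interpretation is a guess; the log itself does not confirm it.

```python
def decode_plugin_id(plugin_id: int) -> str:
    """Interpret a 32-bit id as four big-endian bytes and return them as ASCII."""
    raw = plugin_id.to_bytes(4, byteorder="big")
    # Non-ASCII bytes are replaced instead of raising, since the input is suspect.
    return raw.decode("ascii", errors="replace")


print(decode_plugin_id(1297436231))  # prints MUNG
```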


dmesg shows:

[87848.891824] sched_agent[163673]: segfault at 12 ip 00007fabd0f1986b sp 00007fabca1843b0 error 4
[87848.891827] Code: 89 f9 48 85 d2 74 07 44 89 e9 48 0f af ca 0f b7 c0 0f af c1 39 c3 0f 4c d8 4c 89 f7 e8 9e 5c fe ff 48 85 c0 74 31 48 8b 50 08 <0f> b7 42 12 66 85 c0 75 09 0f b7 42 42 66 85 c0 74 dd 48 8b 4a 20
[92966.374524] bckfl[29148]: segfault at f0b73934 ip 00007f4d9a95a867 sp 00007f4d981ab6c0 error 4 in select_cons_tres.so[7f4d9a93b000+2d000]
[92966.374538] Code: 8b 52 30 4c 89 f9 48 85 d2 74 07 44 89 e9 48 0f af ca 0f b7 c0 0f af c1 39 c3 0f 4c d8 4c 89 f7 e8 9e 5c fe ff 48 85 c0 74 31 <48> 8b 50 08 0f b7 42 12 66 85 c0 75 09 0f b7 42 42 66 85 c0 74 dd
[103685.791492] sched_agent[131341]: segfault at 12 ip 00007f69d3df186b sp 00007f69d112e3b0 error 4 in select_cons_tres.so[7f69d3dd2000+2d000]
[103685.791505] Code: 89 f9 48 85 d2 74 07 44 89 e9 48 0f af ca 0f b7 c0 0f af c1 39 c3 0f 4c d8 4c 89 f7 e8 9e 5c fe ff 48 85 c0 74 31 48 8b 50 08 <0f> b7 42 12 66 85 c0 75 09 0f b7 42 42 66 85 c0 74 dd 48 8b 4a 20
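
To compare the crashes, it can help to turn each segfault line into an offset inside the shared object (instruction pointer minus the mapping base printed in brackets). The following is a minimal sketch that parses lines in the format shown above. The regular expression and the parse_segfault helper are illustrative assumptions, not part of Slurm or the kernel, and they only cover this x86-64 message layout.

```python
import re

# Matches "comm[pid]: segfault at ADDR ip IP sp SP error N", optionally followed
# by " in object[base+size]". All numeric fields except the pid are hexadecimal.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[\]]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line):
    """Return a dict describing one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = m.group("base")
    return {
        "thread": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        "error": int(m.group("error"), 16),
        "object": m.group("obj"),  # None when the kernel did not name a mapping
        # Offset of the faulting instruction within the mapped object.
        "offset": ip - int(base, 16) if base is not None else None,
    }


if __name__ == "__main__":
    samples = [
        "[92966.374524] bckfl[29148]: segfault at f0b73934 ip 00007f4d9a95a867 "
        "sp 00007f4d981ab6c0 error 4 in select_cons_tres.so[7f4d9a93b000+2d000]",
        "[103685.791492] sched_agent[131341]: segfault at 12 ip 00007f69d3df186b "
        "sp 00007f69d112e3b0 error 4 in select_cons_tres.so[7f69d3dd2000+2d000]",
    ]
    for line in samples:
        info = parse_segfault(line)
        print(info["thread"], info["object"], hex(info["offset"]))
```

For the two lines that name select_cons_tres.so, this gives offsets 0x1f867 and 0x1f86b, which are 4 bytes apart and so point at adjacent instructions in the same code. If debug symbols for the plugin are installed, an offset like this can be passed to a tool such as addr2line (with -e pointing at the plugin file) to look up the source location. On x86, error 4 denotes a user-mode read of an unmapped page, and a fault address of 0x12 would be consistent with reading a field through a NULL pointer.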






