[slurm-users] Is this a known error?

Sean McGrath smcgrat at tchpc.tcd.ie
Tue Dec 7 17:20:00 UTC 2021


Hi,

I'm seeing something similar.

slurmdbd version is 21.08.4

All the slurmd and slurmctld daemons are version 20.11.8.

This is what appears in slurmdbd.log:

[2021-12-07T17:16:50.001] error: unpack_header: protocol_version 8704 not supported
[2021-12-07T17:16:50.001] error: unpacking header
[2021-12-07T17:16:50.001] error: destroy_forward: no init
[2021-12-07T17:16:50.001] error: slurm_unpack_received_msg: Message receive failure
[2021-12-07T17:16:50.011] error: CONN:17 Failed to unpack SLURM_PERSIST_INIT message
[2021-12-07T17:17:09.001] error: unpack_header: protocol_version 8704 not supported
[2021-12-07T17:17:09.001] error: unpacking header
[2021-12-07T17:17:09.001] error: destroy_forward: no init
[2021-12-07T17:17:09.001] error: slurm_unpack_received_msg: Message receive failure
[2021-12-07T17:17:09.011] error: CONN:35 Failed to unpack SLURM_PERSIST_INIT message
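
The protocol_version value in these messages appears to be a Slurm release number packed as (major << 8) | minor. The Python sketch below decodes it and tallies rejected values from log lines. The release table is an assumption based on the SLURM_*_PROTOCOL_VERSION macros in src/common/slurm_protocol_common.h; check it against your own Slurm source tree. If the table is right, 8704 decodes to 19.05, and the 8448 in the earlier report decodes to 18.08.

```python
import re
from collections import Counter

# ASSUMED mapping from the protocol "major" byte to a Slurm release, based on
# the SLURM_*_PROTOCOL_VERSION macros in src/common/slurm_protocol_common.h.
# Verify against the source of your own Slurm version before relying on it.
ASSUMED_RELEASES = {
    32: "17.11",
    33: "18.08",
    34: "19.05",
    35: "20.02",
    36: "20.11",
    37: "21.08",
}

# Matches the unpack_header error line that carries the rejected version.
PATTERN = re.compile(r"protocol_version (\d+) not supported")


def decode(version: int) -> str:
    """Split a protocol_version into (major << 8 | minor) and name the release."""
    major, minor = version >> 8, version & 0xFF
    release = ASSUMED_RELEASES.get(major, "unknown")
    return f"{version} -> major {major}, minor {minor} (Slurm {release}?)"


def rejected_versions(lines):
    """Count how often each unsupported protocol_version appears in a log."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[2021-12-07T17:16:50.001] error: unpack_header: protocol_version 8704 not supported",
        "[2021-12-07T17:16:50.001] error: unpacking header",
        "[2021-12-07T17:17:09.001] error: unpack_header: protocol_version 8704 not supported",
    ]
    for version, count in rejected_versions(sample).items():
        print(count, decode(version))  # prints: 2 8704 -> major 34, minor 0 (Slurm 19.05?)
```

To run it against a real log, pass an open file handle to rejected_versions() instead of the sample list. The path to slurmdbd.log depends on your local configuration.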

I've looked through our clusters but don't see any that aren't 20.11.8.
 
Can anyone advise how to identify which clients are generating these
errors, please?
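
Because every cluster checked runs 20.11.8, a quick sanity check is whether those clients fall outside the window that a 21.08 slurmdbd accepts. The sketch below rests on two assumptions. The first is Slurm's upgrade policy for this era: daemons accept RPCs from the two previous major releases. The second is the hand-written release list. Check both against the upgrade notes for your version. The client list in the example is hypothetical.

```python
# ASSUMED release sequence (oldest to newest) around the versions in this thread.
RELEASES = ["17.11", "18.08", "19.05", "20.02", "20.11", "21.08"]

# ASSUMPTION: a daemon accepts peers from its own release and the two before it.
SUPPORTED_PRIOR = 2


def release_of(version: str) -> str:
    """Reduce a full version string such as '20.11.8' to its release '20.11'."""
    return ".".join(version.split(".")[:2])


def accepted_by(server: str, client: str) -> bool:
    """Return True if the client's release falls inside the server's assumed window."""
    s = RELEASES.index(release_of(server))
    c = RELEASES.index(release_of(client))
    return s - SUPPORTED_PRIOR <= c <= s


if __name__ == "__main__":
    dbd = "21.08.4"
    # Hypothetical client versions, for illustration only.
    for client in ["20.11.8", "20.02", "19.05"]:
        print(client, "accepted" if accepted_by(dbd, client) else "rejected")
```

Under these assumptions, 20.11.8 clients are accepted. The 8704 connections would then have to come from something older than the hosts already checked, perhaps a stale client binary or daemon on another host.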

Thanks

Sean


On Fri, Sep 17, 2021 at 03:20:42PM +0200, Andreas Davour wrote:

> On 2021-09-17 11:54, Bjørn-Helge Mevik wrote:
> >Andreas Davour <andreas.davour at conoa.se> writes:
> >
> >>[2021-09-17T08:53:49.166] error: unpack_header: protocol_version 8448 not supported
> >>[2021-09-17T08:53:49.166] error: unpacking header
> >>[2021-09-17T08:53:49.166] error: destroy_forward: no init
> >>[2021-09-17T08:53:49.166] error: slurm_receive_msg_and_forward: Message receive failure
> >>[2021-09-17T08:53:49.176] error: service_connection: slurm_receive_msg: Message receive failure
> >>
> >>Has anyone seen that before, or can anyone immediately see what I did wrong?
> >
> >Sounds a lot like you have a different version of Slurm installed on some
> >compute node(s).
> 
> That's the kind of impression I was hoping for.
> 
> Yeah, I thought so as well, but I cannot find any packages that
> differ, and as far as I know they have all been restarted.
> 
> I'll see if there is anything like a version mismatch somewhere.
> 
> /andreas

-- 
Sean McGrath M.Sc

Systems Administrator
Trinity Centre for High Performance and Research Computing
Trinity College Dublin

sean.mcgrath at tchpc.tcd.ie

https://www.tcd.ie/
https://www.tchpc.tcd.ie/

+353 (0) 1 896 3725



