[slurm-users] Upgrading SLURM from 18 to 20.11.9

Wadud Miah W.Miah at soton.ac.uk
Thu Sep 8 16:14:34 UTC 2022


The previous version was 18 and now I am trying to upgrade to 20, so I am well within 2 major versions.

Regards,
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Paul Edmon <pedmon at cfa.harvard.edu>
Sent: Thursday, September 8, 2022 4:44:36 PM
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Upgrading SLURM from 18 to 20.11.9

Typically Slurm only supports upgrading across at most two major releases at a time. If you are on 18.08 you can likely only go as far as 20.02. Once you have upgraded to 20.02 you can go on to 20.11 or 21.08.
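
The two-release rule can be sketched as a small hop planner. This is only an illustration under stated assumptions: the release list is written from memory of Slurm's release history, and `plan_upgrade`, `RELEASES` and `max_jump` are hypothetical names, not part of any Slurm tool.

```python
# Hypothetical helper: plan the intermediate hops for a Slurm upgrade.
# Assumption: these are the major releases in order, and each hop may
# cross at most two of them.
RELEASES = ["18.08", "19.05", "20.02", "20.11", "21.08", "22.05"]


def plan_upgrade(current, target, max_jump=2):
    """Return the releases to run in order, starting from `current`."""
    start, end = RELEASES.index(current), RELEASES.index(target)
    if end < start:
        raise ValueError("downgrades are not covered by this sketch")
    path = [current]
    while start < end:
        # Jump as far as allowed without overshooting the target.
        start = min(start + max_jump, end)
        path.append(RELEASES[start])
    return path


print(plan_upgrade("18.08", "20.11"))  # ['18.08', '20.02', '20.11']
```

Under these assumptions, going from 18.08 to 20.11.9 needs a stop at 20.02 on the way.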


-Paul Edmon-


On 9/8/22 11:38 AM, Wadud Miah wrote:
Hi Mick,

I have checked that the compute nodes and controllers all have the same version of SLURM (20.11.9). I am indeed trying to upgrade SlurmDB first, and am getting the following errors in slurmdbd.log:

[2022-09-08T15:45:11.115] slurmdbd version 20.11.9 started
[2022-09-08T15:45:23.001] error: unpack_header: protocol_version 8448 not supported
[2022-09-08T15:33:57.001] unpacking header
[2022-09-08T15:33:57.001] error: destroy_forward: no init
[2022-09-08T15:33:57.001] error: slurm_unpack_received_msg: Message receive failure
[2022-09-08T15:33:57.011] error: CONN:11 Failed to unpack SLURM_PERSIST_INIT message
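
To see which protocol versions are being rejected across a longer log, the error lines can be filtered with a short script. This is a minimal sketch: `rejected_versions` is a hypothetical helper, and the pattern only matches the exact wording of the `unpack_header` line quoted above.

```python
import re

# Matches the slurmdbd error line quoted above and captures the number.
PATTERN = re.compile(
    r"error: unpack_header: protocol_version (\d+) not supported"
)


def rejected_versions(log_text):
    """Return the distinct protocol_version values that were rejected."""
    return sorted({int(m.group(1)) for m in PATTERN.finditer(log_text)})


sample = """\
[2022-09-08T15:45:11.115] slurmdbd version 20.11.9 started
[2022-09-08T15:45:23.001] error: unpack_header: protocol_version 8448 not supported
[2022-09-08T15:33:57.011] error: CONN:11 Failed to unpack SLURM_PERSIST_INIT message
"""
print(rejected_versions(sample))  # [8448]
```

In practice the text would come from reading /var/log/slurm/slurmdbd.log rather than from an inline sample.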

Regards,
Wadud.

________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Timony, Mick <Michael_Timony at hms.harvard.edu>
Sent: 08 September 2022 16:24
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Upgrading SLURM from 18 to 20.11.9

This thread on the forums may help:

https://groups.google.com/g/slurm-users/c/YB55Ru9rvD4


It looks like you have something on your network with an older version of slurm installed. I'd check the Slurm version installed on your compute nodes and controllers.
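
The number in the error can hint at which release the older peer is running. The sketch below assumes that Slurm defines each protocol version as a release counter shifted left by 8 bits, with 33 for 18.08 up to 36 for 20.11. That mapping is recalled from the Slurm source headers, not taken from this thread, so check it against your own source tree. `KNOWN` and `describe` are hypothetical names.

```python
# Assumption: protocol versions are defined as (N << 8), with N = 33 for
# 18.08 through N = 36 for 20.11. Verify against your Slurm source tree.
KNOWN = {33: "18.08", 34: "19.05", 35: "20.02", 36: "20.11"}


def describe(protocol_version):
    """Map a protocol_version from the log to a release name, if known."""
    release_counter = protocol_version >> 8
    return KNOWN.get(release_counter, "unknown")


print(describe(8448))  # 18.08
```

If the mapping holds, 8448 points at a daemon or client still running 18.08 that tried to talk to the new slurmdbd.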

The recommended approach to upgrading is to upgrade the SlurmDB first, then the controllers, then the compute nodes. More info here:

https://slurm.schedmd.com/quickstart_admin.html#upgrade

Regards
--
Mick Timony
Senior DevOps Engineer
Harvard Medical School
--

________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Wadud Miah <W.Miah at soton.ac.uk>
Sent: Thursday, September 8, 2022 10:47 AM
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: [slurm-users] Upgrading SLURM from 18 to 20.11.9

Hi,

I am attempting to upgrade from SLURM 18 to 20.11.9 and when I attempt to start slurmdbd (version 20.11.9), I get the following error messages in /var/log/slurm/slurmdbd.log:

[2022-09-08T15:45:11.115] slurmdbd version 20.11.9 started
[2022-09-08T15:45:23.001] error: unpack_header: protocol_version 8448 not supported
[2022-09-08T15:33:57.001] unpacking header
[2022-09-08T15:33:57.001] error: destroy_forward: no init
[2022-09-08T15:33:57.001] error: slurm_unpack_received_msg: Message receive failure
[2022-09-08T15:33:57.011] error: CONN:11 Failed to unpack SLURM_PERSIST_INIT message

Any help will be greatly appreciated.

Regards,

----------
Wadud Miah
Research Computing Support
University of Southampton