[slurm-users] Upgrading SLURM from 18 to 20.11.9
Timony, Mick
Michael_Timony at hms.harvard.edu
Thu Sep 8 16:25:52 UTC 2022
Perhaps a node that you don't have access to is trying to connect to your slurmdbd. That's the scenario that occurred in the forum posting I linked to previously:
https://groups.google.com/g/slurm-users/c/YB55Ru9rvD4
You can try running netstat or a similar tool to see which IPs are connecting, or turn on debug logging for slurmdbd:
https://slurm.schedmd.com/slurmdbd.conf.html#OPT_DebugFlags
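As a rough illustration of the "see which IPs are connecting" step, here is a minimal Python sketch that counts remote peers per IP from text in the layout printed by `ss -tn`. The port 6819 is assumed to be the default DbdPort, so adjust it to match your slurmdbd.conf. The function name and the sample addresses are hypothetical and only there to make the sketch self-contained.

```python
from collections import Counter

SLURMDBD_PORT = 6819  # assumption: default DbdPort; change to match slurmdbd.conf


def peers_on_port(ss_output, port=SLURMDBD_PORT):
    """Count established connections per remote address on a local port.

    Expects lines shaped like `ss -tn` output:
    State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
    """
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not an established connection.
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        local, peer = fields[3], fields[4]
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        counts[peer.rsplit(":", 1)[0]] += 1
    return counts


# Hypothetical sample; on a real host, paste the output of `ss -tn` instead.
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB  0      0      10.0.0.1:6819       10.0.0.21:51234
ESTAB  0      0      10.0.0.1:6819       10.0.0.21:51240
ESTAB  0      0      10.0.0.1:22         10.0.0.99:40022
ESTAB  0      0      10.0.0.1:6819       10.0.0.37:38810
"""

for address, n in peers_on_port(SAMPLE).most_common():
    print(address, n)
```

Any address in that list that is not one of your known controllers or login nodes is a candidate for the host still running the older version.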
If everything else is running correctly, you could ignore the error.
--Mick
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Wadud Miah <W.Miah at soton.ac.uk>
Sent: Thursday, September 8, 2022 11:38 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Upgrading SLURM from 18 to 20.11.9
Hi Mick,
I have checked that the compute nodes and controllers all have the same version of SLURM (20.11.9). I am indeed trying to upgrade slurmdbd first, and I am getting these errors in slurmdbd.log:
[2022-09-08T15:45:11.115] slurmdbd version 20.11.9 started
[2022-09-08T15:45:23.001] error: unpack_header: protocol_version 8448 not supported
[2022-09-08T15:33:57.001] unpacking header
[2022-09-08T15:33:57.001] error: destroy_forward: no init
[2022-09-08T15:33:57.001] error: slurm_unpack_received_msg: Message receive failure
[2022-09-08T15:33:57.011] error: CONN:11 Failed to unpack SLURM_PERSIST_INIT message
Regards,
Wadud.
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Timony, Mick <Michael_Timony at hms.harvard.edu>
Sent: 08 September 2022 16:24
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Upgrading SLURM from 18 to 20.11.9
This thread on the forums may help:
https://groups.google.com/g/slurm-users/c/YB55Ru9rvD4
It looks like something on your network has an older version of Slurm installed and is trying to talk to the new slurmdbd. I'd check the Slurm version installed on your compute nodes and controllers.
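The number in "protocol_version 8448 not supported" can help identify how old the connecting client is. A minimal sketch, assuming Slurm packs the protocol version as (major << 8) | minor, and assuming the major-to-release mapping below (it is written from memory, so check it against the protocol version constants in the Slurm source before relying on it):

```python
def decode_protocol_version(value):
    """Split a packed protocol version into (high byte, low byte)."""
    return value >> 8, value & 0xFF


# Assumed mapping of the high byte to Slurm major releases; verify against
# the Slurm source for the versions you actually run.
ASSUMED_RELEASES = {
    32: "17.11",
    33: "18.08",
    34: "19.05",
    35: "20.02",
    36: "20.11",
}

major, minor = decode_protocol_version(8448)
print(major, minor, ASSUMED_RELEASES.get(major, "unknown"))
```

Under those assumptions 8448 decodes to major 33, which would correspond to an 18.08 client, consistent with something on the network not having been upgraded yet.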
The recommended approach is to upgrade slurmdbd first, then the controllers (slurmctld), then the compute nodes (slurmd). More info here:
https://slurm.schedmd.com/quickstart_admin.html#upgrade
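To make the upgrade guidance concrete, here is a small Python sketch of a path check. It assumes, based on my reading of the upgrade documentation linked above, that a direct upgrade is only supported from the two previous major releases. The release list, the limit, and the function names are illustrative assumptions, not anything taken from Slurm itself.

```python
# Major releases in order. This list is an assumption that keeps the sketch
# self-contained; extend it for the versions you care about.
RELEASES = ["17.11", "18.08", "19.05", "20.02", "20.11"]

# Assumption: upgrades are supported from the two previous major releases.
MAX_JUMP = 2

# Order in which daemons are upgraded at each step, as described above.
COMPONENT_ORDER = ["slurmdbd", "slurmctld", "slurmd"]


def direct_upgrade_ok(src, dst):
    """Return True if dst is at most MAX_JUMP major releases after src."""
    distance = RELEASES.index(dst) - RELEASES.index(src)
    return 0 < distance <= MAX_JUMP


def upgrade_steps(src, dst):
    """Return a path from src to dst, taking the largest allowed jump each time."""
    i, j = RELEASES.index(src), RELEASES.index(dst)
    path = [src]
    while i < j:
        i = min(i + MAX_JUMP, j)
        path.append(RELEASES[i])
    return path


print(direct_upgrade_ok("18.08", "20.11"))
print(upgrade_steps("18.08", "20.11"))
```

If that assumption holds for your versions, 18.08 to 20.11 would need an intermediate release, with slurmdbd, slurmctld and slurmd upgraded in that order at each step.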
Regards
--
Mick Timony
Senior DevOps Engineer
Harvard Medical School
--
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Wadud Miah <W.Miah at soton.ac.uk>
Sent: Thursday, September 8, 2022 10:47 AM
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: [slurm-users] Upgrading SLURM from 18 to 20.11.9
Hi,
I am attempting to upgrade from SLURM 18 to 20.11.9 and when I attempt to start slurmdbd (version 20.11.9), I get the following error messages in /var/log/slurm/slurmdbd.log:
[2022-09-08T15:45:11.115] slurmdbd version 20.11.9 started
[2022-09-08T15:45:23.001] error: unpack_header: protocol_version 8448 not supported
[2022-09-08T15:33:57.001] unpacking header
[2022-09-08T15:33:57.001] error: destroy_forward: no init
[2022-09-08T15:33:57.001] error: slurm_unpack_received_msg: Message receive failure
[2022-09-08T15:33:57.011] error: CONN:11 Failed to unpack SLURM_PERSIST_INIT message
Any help will be greatly appreciated.
Regards,
----------
Wadud Miah
Research Computing Support
University of Southampton