<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-GB" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Hello,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I’ve just finished building and installing Slurm 22.05.6 from source on a head node and a couple of workers. I installed the same RPMs on all the nodes, and the slurmdbd, slurmctld, and slurmd daemons have all come online and appear healthy
(test jobs can be submitted to partitions and run successfully on the nodes). But I’m seeing these errors at regular intervals in the Slurm logs:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">[2022-11-29T11:29:49.683] error: unpack_header: protocol_version 8960 not supported<o:p></o:p></p>
<p class="MsoNormal">[2022-11-29T11:29:49.683] error: unpacking header<o:p></o:p></p>
<p class="MsoNormal">[2022-11-29T11:29:49.683] error: destroy_forward: no init<o:p></o:p></p>
<p class="MsoNormal">[2022-11-29T11:29:49.684] error: slurm_receive_msg_and_forward: [[sdc-uk]:53026] failed: Message receive failure<o:p></o:p></p>
<p class="MsoNormal">[2022-11-29T11:29:49.694] error: service_connection: slurm_receive_msg: Message receive failure<o:p></o:p></p>
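<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">To see which peer is sending the rejected messages, the log excerpt above can be summarised with a small script. This is a minimal sketch, not part of Slurm: the function name and the regular expressions are illustrative and assume the log lines keep exactly the format shown in this message.<o:p></o:p></p>
<pre>
```python
import re
from collections import Counter

# Patterns are assumptions based only on the log lines quoted in this message.
VERSION_RE = re.compile(r"unpack_header: protocol_version (\d+) not supported")
PEER_RE = re.compile(r"slurm_receive_msg_and_forward: \[\[(.+?)\]:(\d+)\] failed")


def summarise(lines):
    """Count rejected protocol versions and the peers whose messages failed."""
    versions = Counter()
    peers = Counter()
    for line in lines:
        match = VERSION_RE.search(line)
        if match:
            versions[int(match.group(1))] += 1
        match = PEER_RE.search(line)
        if match:
            # group(1) is the peer host name, group(2) the source port
            peers[match.group(1)] += 1
    return versions, peers


sample = [
    "[2022-11-29T11:29:49.683] error: unpack_header: protocol_version 8960 not supported",
    "[2022-11-29T11:29:49.683] error: unpacking header",
    "[2022-11-29T11:29:49.683] error: destroy_forward: no init",
    "[2022-11-29T11:29:49.684] error: slurm_receive_msg_and_forward: [[sdc-uk]:53026] failed: Message receive failure",
]

versions, peers = summarise(sample)
print(dict(versions), dict(peers))
```
</pre>
<p class="MsoNormal">Run against the full log file (for example by passing an open file object as <i>lines</i>), this would show whether the failures always come from the same host.<o:p></o:p></p>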
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">My slurm.conf is based on the config of my previous (still existing) cluster, and I’ve already encountered one or two issues with plugins not working. I can’t find anything online listing the Slurm protocol_version numbers to check what is causing
this error, though I’m assuming it’s plugin related (slurmdbd maybe?). Turning up the debug level on the Slurm logs doesn’t help in finding the issue. Does anyone here know what protocol_version 8960 relates to?<o:p></o:p></p>
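<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">One way to narrow this down is to split the number into its two bytes. This is a minimal sketch under an assumption that should be checked against the source tree that was built: that Slurm defines each release's protocol version as a per-release index shifted into the high byte, i.e. (index &lt;&lt; 8) | 0, in the SLURM_*_PROTOCOL_VERSION defines. The helper name below is hypothetical.<o:p></o:p></p>
<pre>
```python
def split_protocol_version(version):
    """Split a 16-bit protocol version into (high byte, low byte).

    Assumption (not confirmed in this message): the high byte is a
    per-release index and the low byte is zero for released versions.
    """
    return divmod(version, 256)


high, low = split_protocol_version(8960)
print(high, low)
```
</pre>
<p class="MsoNormal">Comparing the resulting high byte with the index used for 22.05 in the source should show whether the sender is an older (or newer) Slurm installation than the one just built, for example a client or daemon from another cluster that can still reach this controller.<o:p></o:p></p>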
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Relevant slurm.conf lines are:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">MpiDefault=none<o:p></o:p></p>
<p class="MsoNormal">ProctrackType=proctrack/pgid<o:p></o:p></p>
<p class="MsoNormal">ReturnToService=2<o:p></o:p></p>
<p class="MsoNormal">SlurmUser=slurm<o:p></o:p></p>
<p class="MsoNormal">StateSaveLocation=/var/spool/slurm/slurmctld<o:p></o:p></p>
<p class="MsoNormal">SwitchType=switch/none<o:p></o:p></p>
<p class="MsoNormal">TaskPlugin=task/affinity,task/cgroup<o:p></o:p></p>
<p class="MsoNormal"># Job cleanup<o:p></o:p></p>
<p class="MsoNormal">Epilog=/etc/slurm/slurm.epilog.clean<o:p></o:p></p>
<p class="MsoNormal">UnkillableStepTimeout=120<o:p></o:p></p>
<p class="MsoNormal">UnkillableStepProgram=/root/unkillableJobStepScript.sh<o:p></o:p></p>
<p class="MsoNormal"># SCHEDULING<o:p></o:p></p>
<p class="MsoNormal">#FastSchedule=0<o:p></o:p></p>
<p class="MsoNormal">SchedulerType=sched/backfill<o:p></o:p></p>
<p class="MsoNormal">SchedulerParameters=nohold_on_prolog_fail<o:p></o:p></p>
<p class="MsoNormal">SelectType=select/cons_res<o:p></o:p></p>
<p class="MsoNormal">SelectTypeParameters=CR_Core_Memory<o:p></o:p></p>
<p class="MsoNormal">PriorityType=priority/multifactor<o:p></o:p></p>
<p class="MsoNormal">PriorityWeightPartition=1000<o:p></o:p></p>
<p class="MsoNormal">PreemptMode=SUSPEND,GANG<o:p></o:p></p>
<p class="MsoNormal">PreemptType=preempt/partition_prio<o:p></o:p></p>
<p class="MsoNormal"># LOGGING AND ACCOUNTING<o:p></o:p></p>
<p class="MsoNormal">AccountingStorageType=accounting_storage/slurmdbd<o:p></o:p></p>
<p class="MsoNormal">JobCompType=jobcomp/none<o:p></o:p></p>
<p class="MsoNormal">JobAcctGatherFrequency=40<o:p></o:p></p>
<p class="MsoNormal">JobAcctGatherType=jobacct_gather/linux<o:p></o:p></p>
<p class="MsoNormal">SlurmctldDebug=5<o:p></o:p></p>
<p class="MsoNormal">SlurmctldLogFile=/var/log/slurm/slurmctld.log<o:p></o:p></p>
<p class="MsoNormal">SlurmdDebug=5<o:p></o:p></p>
<p class="MsoNormal">SlurmdLogFile=/var/log/slurm/slurmd.log<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">Cheers,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"> Mark<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">-------------------------------<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">Mark Holliman<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">Senior Data Systems Specialist<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">Wide Field Astronomy Unit<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">Institute for Astronomy<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">University of Edinburgh<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">--------------------------------<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.</span><o:p></o:p></p>
</div>
</body>
</html>