[slurm-users] protocol_version 8960 not supported

Mark Holliman msh at roe.ac.uk
Tue Nov 29 11:58:27 UTC 2022


Hello,

I've just finished building and installing Slurm 22.05.6 from source on a head node and a couple of workers. I installed the same RPMs on all the nodes, and the slurmdbd, slurmctld, and slurmd daemons have all come online and appear healthy (test jobs can be submitted to partitions and run successfully on the nodes). But I'm seeing these errors at regular intervals in the Slurm logs:

[2022-11-29T11:29:49.683] error: unpack_header: protocol_version 8960 not supported
[2022-11-29T11:29:49.683] error: unpacking header
[2022-11-29T11:29:49.683] error: destroy_forward: no init
[2022-11-29T11:29:49.684] error: slurm_receive_msg_and_forward: [[sdc-uk]:53026] failed: Message receive failure
[2022-11-29T11:29:49.694] error: service_connection: slurm_receive_msg: Message receive failure
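
To see which peer is sending the rejected messages, the log lines above can be summarized with a short script. This is only a sketch: the regular expressions are assumptions fitted to the exact message formats in this excerpt, and `summarize` is a hypothetical helper, not part of Slurm.

```python
import re

# Sample lines copied from the log excerpt above.
LOG = """\
[2022-11-29T11:29:49.683] error: unpack_header: protocol_version 8960 not supported
[2022-11-29T11:29:49.684] error: slurm_receive_msg_and_forward: [[sdc-uk]:53026] failed: Message receive failure
"""

# Assumed patterns, based only on the two message formats shown above.
VERSION_RE = re.compile(r"protocol_version (\d+) not supported")
PEER_RE = re.compile(
    r"slurm_receive_msg_and_forward: \[\[(?P<host>[^\]]+)\]:(?P<port>\d+)\]"
)


def summarize(log_text):
    """Return the distinct rejected protocol versions and (host, port) peers."""
    versions = sorted({int(m.group(1)) for m in VERSION_RE.finditer(log_text)})
    peers = sorted(
        {(m.group("host"), int(m.group("port"))) for m in PEER_RE.finditer(log_text)}
    )
    return versions, peers


versions, peers = summarize(LOG)
print(versions, peers)
```

The source port changes with every connection, so the host name is the useful part when looking for the sender.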

My slurm.conf is based on my previous (still existing) cluster config, and I've already encountered one or two issues with plugins not working. I can't find anything online that lists the Slurm protocol_version numbers, so I can't check what is causing this error, though I'm assuming it's plugin related (slurmdbd maybe?). Turning up the debug level on the Slurm logs doesn't help in finding the issue. Does anyone here know what protocol_version 8960 relates to?
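
One way to interpret the number, offered as a sketch rather than a confirmed answer: assuming Slurm packs its protocol version as `(major << 8) | minor` (as the `SLURM_*_PROTOCOL_VERSION` macros in the source tree's `slurm_protocol_common.h` appear to do), 8960 can be split into its two bytes. The release table below is an assumption that should be checked against the headers of the installed source; `decode_protocol_version` and `ASSUMED_RELEASES` are hypothetical names.

```python
def decode_protocol_version(value):
    """Split a packed protocol version into (major, minor).

    Assumes the encoding is (major << 8) | minor.
    """
    return value >> 8, value & 0xFF


# Assumed mapping of the major byte to a Slurm release; verify against
# slurm_protocol_common.h in your own source tree before relying on it.
ASSUMED_RELEASES = {35: "20.02", 36: "20.11", 37: "21.08", 38: "22.05"}

major, minor = decode_protocol_version(8960)
release = ASSUMED_RELEASES.get(major, "unknown")
print(major, minor, release)
```

If that mapping holds, 8960 would point to a 20.02 client or daemon (for example, something on the older cluster) contacting the 22.05 daemons, rather than to a plugin.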

Relevant slurm.conf lines are:

MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=2
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm/slurmctld
SwitchType=switch/none
TaskPlugin=task/affinity,task/cgroup
# Job cleanup
Epilog=/etc/slurm/slurm.epilog.clean
UnkillableStepTimeout=120
UnkillableStepProgram=/root/unkillableJobStepScript.sh
# SCHEDULING
#FastSchedule=0
SchedulerType=sched/backfill
SchedulerParameters=nohold_on_prolog_fail
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
PriorityType=priority/multifactor
PriorityWeightPartition=1000
PreemptMode=SUSPEND,GANG
PreemptType=preempt/partition_prio
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/slurmdbd
JobCompType=jobcomp/none
JobAcctGatherFrequency=40
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=5
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=5
SlurmdLogFile=/var/log/slurm/slurmd.log


Cheers,
  Mark

-------------------------------
Mark Holliman
Senior Data Systems Specialist
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh
--------------------------------

