[slurm-users] Error "slurm_receive_msg_and_forward: Zero Bytes were transmitted or received"

Gestió Servidors sysadmin.caos at uab.cat
Tue Nov 30 11:23:58 UTC 2021


In the last few days, my nodes have been showing the error "slurm_receive_msg_and_forward: Zero Bytes were transmitted or received". After reviewing the whole configuration, I noticed that the problem is the time difference between the nodes and the server. If a node's clock is set badly (ahead of or behind the server), the slurmd daemon starts, but users can't run "squeue" or "sinfo". After running "date MMDDhhmm" (with the server's time) and then "hwclock --systohc" on each node, the slurmd daemons run perfectly on every node and users can submit jobs and query the queues.
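Since the failure tracks clock skew, a small check can flag a drifting node before slurmd starts failing, and can build the argument for `date`, which expects `MMDDhhmm[[CC]YY][.ss]` (month and day first, year after the time). This is a minimal sketch, not part of Slurm: the function names are hypothetical, and the 300-second limit is an assumption chosen to mirror MUNGE's default credential lifetime, assuming the cluster authenticates with `AuthType=auth/munge`. How you obtain the server's epoch time (for example `date +%s` run on the slurmctld host) is left open.

```python
from datetime import datetime

# Assumed tolerance: mirrors MUNGE's default credential TTL of 300 s; adjust per site.
MAX_SKEW_SECONDS = 300


def clock_skew_seconds(node_epoch: float, server_epoch: float) -> float:
    """Signed skew of the node relative to the server (positive means the node is ahead)."""
    return node_epoch - server_epoch


def skew_ok(node_epoch: float, server_epoch: float, limit: float = MAX_SKEW_SECONDS) -> bool:
    """True if the node's clock is within `limit` seconds of the server's clock."""
    return abs(clock_skew_seconds(node_epoch, server_epoch)) <= limit


def date_set_argument(when: datetime) -> str:
    """Format `when` as the MMDDhhmmCCYY.ss argument that `date` accepts to set the clock.

    Field order matters: month and day come first, the four-digit year follows hh:mm.
    """
    return when.strftime("%m%d%H%M%Y.%S")
```

For example, `date_set_argument(datetime(2021, 11, 30, 11, 23, 58))` returns `"113011232021.58"`. `date` interprets that string as local time, so pass a local datetime, or a UTC one together with `date -u`, and then run `hwclock --systohc` as described above.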

I know I could use "ntpd" or similar, but for some reason, when I configure my slurmctld server as an NTP server, it serves its date/time, yet when the nodes try to synchronize with it, the stratum shows a value of 16, so the nodes can't synchronize...
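In NTP (RFC 5905), stratum 16 means "unsynchronized": a server that is not itself synchronized to a reference advertises stratum 16, and clients will not follow it. A minimal SNTP query can confirm what the slurmctld host is actually advertising and give a rough clock offset. This is an illustrative diagnostic with hypothetical function names, not part of Slurm or ntpd, and it ignores network delay.

```python
import socket
import struct
import time
from typing import Tuple

NTP_TO_UNIX = 2208988800  # seconds between the NTP epoch (1900) and the Unix epoch (1970)


def parse_ntp_reply(data: bytes) -> Tuple[int, float]:
    """Return (stratum, server transmit time as Unix epoch) from an NTP reply packet."""
    if len(data) < 48:
        raise ValueError("short NTP packet")
    stratum = data[1]  # second byte of the header is the stratum
    secs, frac = struct.unpack("!II", data[40:48])  # transmit timestamp (32.32 fixed point)
    return stratum, secs - NTP_TO_UNIX + frac / 2**32


def query_ntp(host: str, timeout: float = 2.0) -> Tuple[int, float]:
    """Send one SNTP client request to `host` and return (stratum, rough offset in seconds)."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, VN=3, Mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (host, 123))
        data, _ = sock.recvfrom(512)
    stratum, server_time = parse_ntp_reply(data)
    # Rough offset: ignores network delay, good enough to spot skew of seconds or minutes.
    return stratum, server_time - time.time()


def describe_stratum(stratum: int) -> str:
    """Map the NTP stratum field to its RFC 5905 meaning."""
    if stratum == 0:
        return "unspecified or invalid"
    if stratum == 1:
        return "primary server"
    if stratum <= 15:
        return "secondary server (synchronized)"
    if stratum == 16:
        return "unsynchronized"
    return "reserved"
```

Running `describe_stratum(query_ntp("slurmctld-host")[0])` from a node (the hostname is a placeholder) should report `"unsynchronized"` in the situation described above. That would indicate the server itself has no synchronized source; serving time from the local clock usually has to be enabled explicitly, for example with chrony's `local stratum` directive.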

My question is: is there any configuration parameter that lets Slurm work correctly regardless of the server's time/date?
