[slurm-users] slurmd trapped in 1970

Heitor heitorpbittencourt at gmail.com
Fri Sep 24 18:51:42 UTC 2021


Hello,

I am seeing strange errors in slurmd.log on four different nodes. The
errors are similar on each node, and I don't understand them:

[2021-09-24T18:27:41.822] slurmd started on Fri, 24 Sep 2021 18:27:41 +0000
[2021-09-24T18:27:41.822] CPUs=36 Boards=1 Sockets=2 Cores=18 Threads=1 Memory=772485 TmpDisk=93353 Uptime=15975960 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
[2021-09-24T18:29:01.002] error: Munge decode failed: Invalid credential
[2021-09-24T18:29:01.002] ENCODED: Thu Jan 01 00:00:00 1970
[2021-09-24T18:29:01.002] DECODED: Thu Jan 01 00:00:00 1970
[2021-09-24T18:29:01.002] error: slurm_receive_msg_and_forward: REQUEST_NODE_REGISTRATION_STATUS has authentication error: Invalid authentication credential
[2021-09-24T18:29:01.002] error: slurm_receive_msg_and_forward: Protocol authentication error
[2021-09-24T18:29:01.012] error: service_connection: slurm_receive_msg: Protocol authentication error

These errors appear over and over again.
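
To put a number on "over and over again", the failures can be counted per minute. The following is a minimal sketch, assuming every line starts with the bracketed timestamp shown in the excerpt above; the function name `count_munge_failures` and the sample list are illustrative and not part of Slurm.

```python
import re
from collections import Counter

# Leading "[YYYY-MM-DDTHH:MM:SS.mmm] " stamp, as seen in the log excerpt.
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d{3}\] (.*)$")


def count_munge_failures(lines):
    """Count 'Munge decode failed' errors, grouped by minute."""
    per_minute = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2).startswith("error: Munge decode failed"):
            # Keep "YYYY-MM-DDTHH:MM" so repeats within a minute add up.
            per_minute[match.group(1)[:16]] += 1
    return per_minute


# Three lines copied from the excerpt above, used as sample input.
sample = [
    "[2021-09-24T18:27:41.822] slurmd started on Fri, 24 Sep 2021 18:27:41 +0000",
    "[2021-09-24T18:29:01.002] error: Munge decode failed: Invalid credential",
    "[2021-09-24T18:29:01.002] ENCODED: Thu Jan 01 00:00:00 1970",
]

for minute, count in sorted(count_munge_failures(sample).items()):
    print(minute, count)
```

To run it against a real log, pass an open file object instead of `sample`; the path to slurmd.log depends on the site configuration.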

We have chrony installed on all nodes and the clocks are synchronized.

I can run `munge -n | unmunge` successfully, and I can also run
`munge -n` on one node and unmunge the result on another node.

After I resumed one of those nodes and ran a dummy job on it, the
errors disappeared.

What do these errors mean? Why is Slurm trying to encode/decode
credentials from 1970?
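
One observation that may help narrow this down: the date in the ENCODED and DECODED lines is exactly the Unix epoch, which is what a time value of 0 looks like when formatted in UTC. The sketch below only demonstrates that formatting. The format string is an assumption chosen to match the log lines, and whether the time fields were actually left at zero when the decode failed is a guess that nothing in this message confirms.

```python
import time


def format_utc(epoch_seconds):
    """Format a Unix timestamp in UTC, in the style of the log lines."""
    # Assumed format, picked to match "Thu Jan 01 00:00:00 1970".
    return time.strftime("%a %b %d %H:%M:%S %Y", time.gmtime(epoch_seconds))


# A zero timestamp prints as the start of the Unix epoch.
print(format_utc(0))  # Thu Jan 01 00:00:00 1970
```

If that guess holds, the 1970 date would be a side effect of the failed decode rather than evidence of a clock problem, which would be consistent with chrony reporting the clocks as synchronized.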

Thank you,
Heitor