[slurm-users] ERROR: slurmctld: auth/munge: _print_cred: DECODED

Nousheen nousheenparvaiz at gmail.com
Thu Dec 1 13:15:56 UTC 2022


Hello Everyone,

I am using Slurm version 21.08.5 on CentOS 7.

slurmd starts successfully on all compute nodes, but when I start
slurmctld on the server node it logs the following errors:

(base) [nousheen@nousheen ~]$ systemctl status slurmctld.service -l
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-12-01 12:00:42 PKT; 4h 16min ago
 Main PID: 1631 (slurmctld)
    Tasks: 10
   Memory: 4.0M
   CGroup: /system.slice/slurmctld.service
           ├─1631 /usr/sbin/slurmctld -D -s
           └─1818 slurmctld: slurmscriptd

Dec 01 16:17:19 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:19 2022
Dec 01 16:17:19 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: error: Munge decode failed: Rewound credential
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: ENCODED: Fri Dec 02 16:16:55 2022
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:20 2022
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: error: Munge decode failed: Rewound credential
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: ENCODED: Fri Dec 02 16:16:56 2022
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:21 2022
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks
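
The gap between an ENCODED and a DECODED timestamp in the log above can be checked with a short script. The following is a minimal Python sketch, assuming the two timestamps are copied verbatim from one ENCODED/DECODED pair in the log and that both are printed in the same timezone (the log does not show one). The comparison against a 300-second TTL is only an illustrative threshold; the exact tolerance munge applies is an assumption here.

```python
from datetime import datetime

# Timestamps copied from one ENCODED/DECODED pair in the slurmctld log.
# The log shows no timezone, so both are parsed as naive times.
ENCODED = "Fri Dec 02 16:16:55 2022"
DECODED = "Thu Dec 01 16:17:20 2022"
LOG_FORMAT = "%a %b %d %H:%M:%S %Y"

# Illustrative threshold only (assumption): a 300-second credential TTL.
TTL_SECONDS = 300


def skew_seconds(encoded: str, decoded: str) -> int:
    """Return encode time minus decode time in seconds.

    A positive value means the encoding host's clock is ahead of the
    decoding host's clock.
    """
    enc = datetime.strptime(encoded, LOG_FORMAT)
    dec = datetime.strptime(decoded, LOG_FORMAT)
    return int((enc - dec).total_seconds())


skew = skew_seconds(ENCODED, DECODED)
print(f"encoder clock is ahead of decoder clock by {skew} s")
print(f"exceeds illustrative TTL of {TTL_SECONDS} s: {skew > TTL_SECONDS}")
```

A large positive value here would suggest that the host that created the credential has a clock well ahead of the controller's, which is the situation the "Rewound credential" message generally points to.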

When I run the following command on the compute nodes, I get this
output:

[gpu101@101 ~]$ munge -n | unmunge
STATUS:           Success (0)
ENCODE_HOST:      ??? (0.0.0.101)
ENCODE_TIME:      2022-12-02 16:33:38 +0500 (1669980818)
DECODE_TIME:      2022-12-02 16:33:38 +0500 (1669980818)
TTL:              300
CIPHER:           aes128 (4)
MAC:              sha1 (3)
ZIP:              none (0)
UID:              gpu101 (1000)
GID:              gpu101 (1000)
LENGTH:           0
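
The epoch value in parentheses after ENCODE_TIME does not depend on the timezone, so it can be converted to UTC and compared directly with the clock on another host. The following is a minimal Python sketch, assuming the relevant unmunge lines above are pasted in as text; the helper name `field_epoch` is hypothetical and not part of munge.

```python
import re
from datetime import datetime, timezone

# Lines copied from the unmunge output on the compute node.
UNMUNGE_OUTPUT = """\
STATUS:           Success (0)
ENCODE_HOST:      ??? (0.0.0.101)
ENCODE_TIME:      2022-12-02 16:33:38 +0500 (1669980818)
DECODE_TIME:      2022-12-02 16:33:38 +0500 (1669980818)
TTL:              300
"""


def field_epoch(output: str, name: str) -> int:
    """Extract the epoch seconds shown in parentheses for a time field."""
    match = re.search(rf"^{name}:\s+.*\((\d+)\)\s*$", output, re.MULTILINE)
    if match is None:
        raise ValueError(f"{name} not found in unmunge output")
    return int(match.group(1))


encode_epoch = field_epoch(UNMUNGE_OUTPUT, "ENCODE_TIME")
encode_utc = datetime.fromtimestamp(encode_epoch, tz=timezone.utc)
print(f"ENCODE_TIME in UTC: {encode_utc.isoformat()}")
```

Comparing this UTC value with the output of `date -u` taken on the controller at the same moment would show whether the two hosts agree on the current time.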

Is this error caused by the ENCODE_HOST name showing question marks and
munge not picking up the IP address correctly? How can I correct this?
All the nodes stop responding when I run a job. However, I have synced
the clocks on all nodes across the cluster.

I am new to Slurm. Kindly guide me in this matter.



Best Regards,
Nousheen Parvaiz
Ph.D. Scholar
