[slurm-users] ERROR: slurmctld: auth/munge: _print_cred: DECODED
Nousheen
nousheenparvaiz at gmail.com
Thu Dec 1 13:15:56 UTC 2022
Hello everyone,
I am using Slurm version 21.08.5 on CentOS 7.
slurmd starts successfully on all compute nodes, but when I start
slurmctld on the server node it reports the following errors:
(base) [nousheen@nousheen ~]$ systemctl status slurmctld.service -l
● slurmctld.service - Slurm controller daemon
Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2022-12-01 12:00:42 PKT; 4h 16min ago
Main PID: 1631 (slurmctld)
Tasks: 10
Memory: 4.0M
CGroup: /system.slice/slurmctld.service
├─1631 /usr/sbin/slurmctld -D -s
└─1818 slurmctld: slurmscriptd
Dec 01 16:17:19 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:19 2022
Dec 01 16:17:19 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: error: Munge decode failed: Rewound credential
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: ENCODED: Fri Dec 02 16:16:55 2022
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:20 2022
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: error: Munge decode failed: Rewound credential
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: ENCODED: Fri Dec 02 16:16:56 2022
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:21 2022
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks
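The ENCODED and DECODED timestamps in the log can be compared directly. The following is a minimal sketch, assuming GNU date (for the -d option) and assuming both timestamps are printed in the same time zone; the variable and function names are only illustrative. It takes one ENCODED/DECODED pair from the log and the TTL of 300 seconds reported by unmunge, and shows how far the encode time lies ahead of the decode time. As far as I understand, "Rewound credential" is what munge reports when a credential appears to have been created in the future relative to the decoding host's clock.

```shell
#!/usr/bin/env bash
# Timestamps copied from one ENCODED/DECODED pair in the slurmctld log.
encoded="Fri Dec 02 16:16:55 2022"   # when the credential was created
decoded="Thu Dec 01 16:17:19 2022"   # when slurmctld tried to decode it
ttl=300                              # TTL in seconds, as shown by unmunge

# Convert a timestamp to epoch seconds (GNU date assumed).
# TZ=UTC only keeps the subtraction free of DST effects; the real zone cancels out.
to_epoch() { TZ=UTC date -d "$1" +%s; }

# Positive skew means the encode time is ahead of the decode time.
skew=$(( $(to_epoch "$encoded") - $(to_epoch "$decoded") ))
echo "encode time is ${skew}s ahead of decode time (TTL ${ttl}s)"

if [ "$skew" -gt "$ttl" ]; then
    echo "skew is larger than the TTL"
fi
```

For this pair the difference is just under one day, far more than the TTL.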
When I run the following command on the compute nodes, I get this
output:
[gpu101@101 ~]$ munge -n | unmunge
STATUS: Success (0)
ENCODE_HOST: ??? (0.0.0.101)
ENCODE_TIME: 2022-12-02 16:33:38 +0500 (1669980818)
DECODE_TIME: 2022-12-02 16:33:38 +0500 (1669980818)
TTL: 300
CIPHER: aes128 (4)
MAC: sha1 (3)
ZIP: none (0)
UID: gpu101 (1000)
GID: gpu101 (1000)
LENGTH: 0
Is this error caused by the ENCODE_HOST name showing question marks and munge
not picking up the IP address correctly? How can I correct this? All the nodes
stop responding when I run a job, even though, as far as I can tell, all the
clocks are synced across the cluster.
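A quick way to double-check the clocks from the controller is sketched below. It assumes passwordless ssh to the compute nodes; clock_offset is a hypothetical helper name and node01 is a placeholder host name, neither of which comes from my setup. The helper prints the remote clock minus the local clock in seconds. A local munge -n | unmunge always encodes and decodes against the same clock, so decoding on a different host is the more telling test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (remote clock - local clock) in seconds.
# $1 is a host reachable over ssh (placeholder; substitute a real node name).
clock_offset() {
    local remote_now local_now
    remote_now=$(ssh "$1" date +%s) || return 1
    local_now=$(date +%s)
    echo $(( remote_now - local_now ))
}

# Usage, run on the controller (node01 is a placeholder):
#   clock_offset node01
#   munge -n | ssh node01 unmunge   # encode here, decode on the other host
```

An offset of more than a few seconds, or a failed remote unmunge, would point at the clocks or the munge key rather than at ENCODE_HOST.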
I am new to Slurm. Kindly guide me in this matter.
Best Regards,
Nousheen Parvaiz
Ph.D. Scholar