[slurm-users] [External] ERROR: slurmctld: auth/munge: _print_cred: DECODED

Michael Robbert mrobbert at mines.edu
Thu Dec 1 15:52:47 UTC 2022


I believe that the error you need to pay attention to for this issue is this line:
 
Dec 01 16:17:19 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks
 
 
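It looks like your compute nodes' clocks are a full day ahead of your controller node's clock: Dec. 2 instead of Dec. 1. You can see this in the log, where the credential is ENCODED on Fri Dec 02 but DECODED on Thu Dec 01. The clocks need to be in sync for munge to work.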
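 
To put a number on the skew, here is a minimal Python sketch that reads one ENCODED/DECODED pair from the log lines you posted and compares the difference to the TTL of 300 shown in your unmunge output. The names (LOG, CRED_RE, STAMP_FORMAT, credential_skew_seconds) are made up for illustration, and comparing the skew against the TTL is only my understanding of how munge decides a credential is "rewound", so treat it as a rough check rather than a description of munge's internals.

```python
import re
from datetime import datetime

# Two lines copied from the slurmctld journal output quoted in this thread.
LOG = """\
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: ENCODED: Fri Dec 02 16:16:55 2022
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:20 2022
"""

# Captures the timestamp printed after "_print_cred: ENCODED:" or "DECODED:".
CRED_RE = re.compile(r"_print_cred: (ENCODED|DECODED): (.+)$")
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Fri Dec 02 16:16:55 2022"


def credential_skew_seconds(log_text):
    """Return ENCODED minus DECODED, in seconds, for the first pair found."""
    stamps = {}
    for line in log_text.splitlines():
        match = CRED_RE.search(line.strip())
        if match:
            kind, stamp = match.groups()
            # Keep only the first timestamp seen for each kind.
            stamps.setdefault(kind, datetime.strptime(stamp, STAMP_FORMAT))
    if "ENCODED" not in stamps or "DECODED" not in stamps:
        raise ValueError("need one ENCODED and one DECODED line")
    return (stamps["ENCODED"] - stamps["DECODED"]).total_seconds()


skew = credential_skew_seconds(LOG)
ttl = 300  # TTL value from the unmunge output quoted in this thread
print(f"skew: {skew:.0f} s, TTL: {ttl} s, within TTL: {abs(skew) <= ttl}")
```

With the two lines above, the sender's clock comes out 86375 seconds (a day minus 25 seconds) ahead of the controller, far beyond the 300-second TTL.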
 
Mike Robbert
Cyberinfrastructure Specialist, Cyberinfrastructure and Advanced Research Computing
Information and Technology Solutions (ITS)
303-273-3786 | mrobbert at mines.edu  

Our values: Trust | Integrity | Respect | Responsibility


 
 
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Nousheen <nousheenparvaiz at gmail.com>
Date: Thursday, December 1, 2022 at 06:19
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: [External] [slurm-users] ERROR: slurmctld: auth/munge: _print_cred: DECODED

Hello Everyone,

 

I am using Slurm version 21.08.5 on CentOS 7.

 

slurmd starts successfully on all compute nodes, but when I start slurmctld on the server node it reports the following errors:

 

(base) [nousheen at nousheen ~]$ systemctl status slurmctld.service -l
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-12-01 12:00:42 PKT; 4h 16min ago
 Main PID: 1631 (slurmctld)
    Tasks: 10
   Memory: 4.0M
   CGroup: /system.slice/slurmctld.service
           ├─1631 /usr/sbin/slurmctld -D -s
           └─1818 slurmctld: slurmscriptd  

Dec 01 16:17:19 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:19 2022
Dec 01 16:17:19 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: error: Munge decode failed: Rewound credential
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: ENCODED: Fri Dec 02 16:16:55 2022
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:20 2022
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: error: Munge decode failed: Rewound credential
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: ENCODED: Fri Dec 02 16:16:56 2022
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:21 2022
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks

 

When I run the following command on the compute nodes, I get this output:

 

 [gpu101 at 101 ~]$ munge -n | unmunge

STATUS:           Success (0)
ENCODE_HOST:      ??? (0.0.0.101)
ENCODE_TIME:      2022-12-02 16:33:38 +0500 (1669980818)
DECODE_TIME:      2022-12-02 16:33:38 +0500 (1669980818)
TTL:              300
CIPHER:           aes128 (4)
MAC:              sha1 (3)
ZIP:              none (0)
UID:              gpu101 (1000)
GID:              gpu101 (1000)
LENGTH:           0
 

Is this error caused by the ENCODE_HOST name showing question marks and by munge not picking up the IP address correctly? How can I correct this? All the nodes stop responding when I run a job, even though I have all the clocks synced across the cluster.

 

I am new to slurm. Kindly guide me in this matter.

 

 



Best Regards,

Nousheen Parvaiz
Ph.D. Scholar 
 







