[slurm-users] Fwd: Fairshare: users not added
Alex Ninaber
alex.ninaber at pandorax.nl
Thu Jan 4 11:56:44 UTC 2024
Hi all,
We have a problem on slurm-23.02.4-1 with 10.6.16-MariaDB. MariaDB and
slurmctld run active/active, slurmdbd runs active/off, with a shared IP. The
spool directory is shared via Gluster. The database comes from a Slurm
installation dating back to around 2017 and has been upgraded several times.
The question is whether we should give up and start from scratch, or whether
there is an easy fix.
Problem: whenever we create a new user and add it with sacctmgr, the user
shows up properly in sacct/sacctmgr, but never shows up in the output of
sshare, even after running some jobs. Only after restarting Slurm a couple of
times does the user show up. The problem seems to have been present in the
previous version as well.
The only error we can see in the slurmdbd log:
[2023-12-21T09:43:30.586] error: slurm_persist_conn_open: Something
happened with the receiving/processing of the persistent connection init
message to 10.141.255.253:6817
: (null)
[2023-12-21T09:43:30.586] error: slurmdb_send_accounting_update_persist:
Unable to open connection to registered cluster cluster.
[2023-12-21T09:43:30.586] error: slurm_receive_msg: No response to
persist_init
[2023-12-21T09:43:30.586] error: update cluster: No error to cluster at
10.141.255.253(6817)
[2023-12-21T09:43:30.586] debug2: DBD_FINI: CLOSE:1 COMMIT:0
[2023-12-21T09:43:30.586] debug4: accounting_storage/as_mysql:
acct_storage_p_commit: got 0 commits
Relevant settings from slurm.conf:
AccountingStorageType=accounting_storage/slurmdbd
# jobaccounting
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldTimeout=60
SlurmdTimeout=60
TCPTimeout=60
MessageTimeout=60
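For anyone wanting to reproduce the connectivity part of this, here is a minimal sketch (not something from our setup) that pulls the host:port pairs out of log lines like the ones above and tries a plain TCP connect to them. The function names extract_endpoints and can_connect are made up for illustration, and the assumption is that it is run on the host where slurmdbd is active. A successful TCP connect only shows the port is reachable; it does not prove that the persistent connection handshake itself works.

```python
import re
import socket

# Matches both forms seen in the log: "10.141.255.253:6817" and "10.141.255.253(6817)".
ENDPOINT_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})[:(](\d{1,5})\)?")


def extract_endpoints(log_text):
    """Return a sorted list of unique (host, port) pairs found in the log text."""
    return sorted({(host, int(port)) for host, port in ENDPOINT_RE.findall(log_text)})


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Feeding the slurmdbd log text to extract_endpoints and calling can_connect on each result would show whether the address slurmdbd has registered for the cluster is reachable from that host. If it is not, that could be consistent with updates to new users not reaching slurmctld until a restart, but that is only a guess.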
Best regards,
Alex