[slurm-users] Fwd: Fairshare: users not added

Alex Ninaber alex.ninaber at pandorax.nl
Thu Jan 4 11:56:44 UTC 2024


Hi all,

We have a problem on slurm-23.02.4-1 with 10.6.16-MariaDB. MariaDB and
slurmctld run active/active, slurmdbd runs active/off behind a shared IP,
and the spool directory is shared via Gluster. The database comes from a
Slurm installation dating from around 2017 and has been upgraded several
times. The question is whether we should give up and start from scratch,
or whether there is an easy fix.

Problem: whenever we create a new user and add it with sacctmgr, the user
shows up properly in sacct/sacctmgr, but never shows up in sshare output,
even after running some jobs. Only after restarting Slurm a couple of times
does the user appear. The problem also seems to have been present in the
previous version.

The only error we can see in the slurmdbd log:

[2023-12-21T09:43:30.586] error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to 10.141.255.253:6817 : (null)
[2023-12-21T09:43:30.586] error: slurmdb_send_accounting_update_persist: Unable to open connection to registered cluster cluster.
[2023-12-21T09:43:30.586] error: slurm_receive_msg: No response to persist_init
[2023-12-21T09:43:30.586] error: update cluster: No error to cluster at 10.141.255.253(6817)
[2023-12-21T09:43:30.586] debug2: DBD_FINI: CLOSE:1 COMMIT:0
[2023-12-21T09:43:30.586] debug4: accounting_storage/as_mysql: acct_storage_p_commit: got 0 commits


AccountingStorageType=accounting_storage/slurmdbd

# jobaccounting
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux

SlurmctldTimeout=60
SlurmdTimeout=60
TCPTimeout=60
MessageTimeout=60
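
To check whether the failed updates in the slurmdbd log always target the same controller address, a short script along these lines could tally them. This is only a sketch: the function name and the regular expression are illustrative (not part of Slurm), and it assumes the "update cluster" error line looks like the one in the excerpt above, which may differ between Slurm versions.

```python
import re
from collections import Counter

# Assumption: the failure line has the form seen in the excerpt above,
# "error: update cluster: <reason> to cluster at <host>(<port>)".
UPDATE_FAILURE = re.compile(
    r"error: update cluster: .* to cluster at\s+(\S+)\((\d+)\)"
)


def failed_update_targets(log_text):
    """Count failed accounting updates per 'host:port' target."""
    # Rejoin wrapped continuation lines: every real log entry starts
    # with a '[' timestamp, so any other line break is treated as a wrap.
    joined = re.sub(r"\n(?!\[)", " ", log_text)
    return Counter(
        f"{host}:{port}" for host, port in UPDATE_FAILURE.findall(joined)
    )


if __name__ == "__main__":
    sample = (
        "[2023-12-21T09:43:30.586] error: update cluster: No error to cluster at\n"
        "10.141.255.253(6817)\n"
        "[2023-12-21T09:43:30.586] debug2: DBD_FINI: CLOSE:1 COMMIT:0\n"
    )
    for target, count in failed_update_targets(sample).items():
        print(target, count)
```

In practice the sample string would be replaced by the contents of the actual slurmdbd log file; the resulting addresses can then be compared with the address the shared IP is expected to resolve to.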



Best regards,

Alex