Thanks, the cause has been found: the REST API token had expired.
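
For anyone hitting the same symptom: assuming the REST API token is a JWT (the thread does not say so explicitly, but it is what Slurm's auth/jwt setup uses), its expiry can be read from the `exp` claim without contacting any daemon. The following is only a minimal sketch. The function names `jwt_expiry` and `is_expired` are made up for this example, and the signature is deliberately not verified.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[int]:
    """Return the 'exp' claim (Unix seconds) of a JWT, or None if absent.

    The signature is NOT verified; this only inspects the payload.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT (header.payload.signature)")
    payload_b64 = parts[1]
    # base64url payloads omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """True if the token carries an 'exp' claim that is already in the past."""
    exp = jwt_expiry(token)
    if exp is None:
        return False  # no expiry claim present
    current = time.time() if now is None else now
    return current >= exp
```

If `is_expired` returns True, requesting a fresh token (for example with `scontrol token`, where JWT authentication is configured) would be the obvious next step.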
<nico.derl(a)tutanota.com> wrote on Fri, Apr 12, 2024 at 22:56:
> If slurmdbd isn't using 6819 because you selected a different port, make
> sure that port is set consistently as DbdPort in slurmdbd.conf and as
> AccountingStoragePort in slurm.conf.
> It must be getting the 6819 from somewhere.
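
The consistency check suggested above can be sketched as a small script. This is an illustration only: the helper names `read_option` and `ports_match` are hypothetical, the parser handles only simple `Key=Value` lines with `#` comments, and it assumes both settings fall back to 6819 when unset.

```python
import re
from typing import Optional

DEFAULT_DBD_PORT = 6819  # assumed default for both settings when unset


def read_option(conf_text: str, key: str) -> Optional[str]:
    """Return the last value assigned to `key` in Key=Value style text.

    Keys are matched case-insensitively; text after '#' is ignored.
    """
    pattern = re.compile(r"(?<!\S)" + re.escape(key) + r"\s*=\s*(\S+)", re.IGNORECASE)
    value = None
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = pattern.search(line)
        if match:
            value = match.group(1)
    return value


def ports_match(slurm_conf: str, slurmdbd_conf: str) -> bool:
    """Compare AccountingStoragePort (slurm.conf) with DbdPort (slurmdbd.conf)."""
    ctl = int(read_option(slurm_conf, "AccountingStoragePort") or DEFAULT_DBD_PORT)
    dbd = int(read_option(slurmdbd_conf, "DbdPort") or DEFAULT_DBD_PORT)
    return ctl == dbd
```

In practice the two arguments would be the contents of the files, e.g. `ports_match(open("/etc/slurm/slurm.conf").read(), open("/etc/slurm/slurmdbd.conf").read())`; the second path is an assumption based on the slurm.conf location shown in the original report.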
>
>
> Apr 12, 2024, 16:05, from dspam.liu(a)gmail.com:
>
> slurmctld and slurmrestd are on the same machine, with no firewall. The
> secondary slurmdbd is in background mode, and slurmdbd does not listen on
> port 6819.
>
> OS: ubuntu 20.04
> SLURM: 23.11.0
>
> <nico.derl(a)tutanota.com> wrote on Fri, Apr 12, 2024 at 20:18:
>
> Hey,
> Are slurmctld and slurmrestd on separate machines? Can you manually reach
> them? Could there be a firewall/closed port in the way?
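
A manual reachability test of the kind asked about here can be done with a plain TCP connect. The sketch below is a generic check, not anything Slurm-specific; `can_connect` is a name invented for this example. It reports only whether something accepts connections on the port, not whether that something is slurmdbd.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

With the address from the log output in this thread, the call would be `can_connect("192.168.87.113", 6819)`.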
>
>
> Apr 12, 2024, 11:36, from slurm-users(a)lists.schedmd.com:
>
> Hi, Slurm is configured with primary and secondary servers. Requests to the
> slurmrestd API produce the error below. What could be the reason?
>
> # scontrol ping
> Slurmctld(primary) at node003 is UP
> Slurmctld(backup) at node113 is UP
>
>
> # systemctl status slurmrestd.service
> ● slurmrestd.service - Slurm REST daemon
> Loaded: loaded (/lib/systemd/system/slurmrestd.service; enabled;
> vendor preset: enabled)
> Active: active (running) since Fri 2024-04-12 17:07:08 CST; 21min ago
> Main PID: 705425 (slurmrestd)
> Tasks: 21 (limit: 629145)
> Memory: 20.3M
> CGroup: /system.slice/slurmrestd.service
> └─705425 /usr/sbin/slurmrestd -f /etc/slurm/slurm.conf
> unix:/var/spool/slurm/slurmrestd.socket 0.0.0.0:6820 -vvv
>
> Apr 12 17:08:46 node003 slurmrestd[705425]: debug2: _slurm_connect: failed
> to connect to 192.168.87.113:6819: Connection refused
> Apr 12 17:08:46 node003 slurmrestd[705425]: debug2: Error connecting slurm
> stream socket at 192.168.87.113:6819: Connection refused
> Apr 12 17:08:46 node003 slurmrestd[705425]: slurmrestd: error:
> slurm_persist_conn_open_without_init: failed to open persistent connection
> to host:node113:6819: Connection refused
> Apr 12 17:08:46 node003 slurmrestd[705425]: slurmrestd: error: Sending
> PersistInit msg: Connection refused
> Apr 12 17:08:46 node003 slurmrestd[705425]: slurmrestd: error:
> slurm_rest_auth_p_get_db_conn: unable to connect to slurmdbd: Connection
> refused
> Apr 12 17:08:46 node003 slurmrestd[705425]: slurmrestd: error:
> init_connection[v0.0.39]:[[2.0.1.191]:50652] rc[7000]=Unable to connect to
> database -> openapi_get_db_conn() failed to open slurmdb connecti>
> Apr 12 17:08:46 node003 slurmrestd[705425]: error:
> slurm_persist_conn_open_without_init: failed to open persistent connection
> to host:node113:6819: Connection refused
> Apr 12 17:08:46 node003 slurmrestd[705425]: error: Sending PersistInit
> msg: Connection refused
> Apr 12 17:08:46 node003 slurmrestd[705425]: error:
> slurm_rest_auth_p_get_db_conn: unable to connect to slurmdbd: Connection
> refused
> Apr 12 17:08:46 node003 slurmrestd[705425]: error:
> init_connection[v0.0.39]:[[2.0.1.191]:50652] rc[7000]=Unable to connect to
> database -> openapi_get_db_conn() failed to open slurmdb connection
>