Hi, Slurm is configured with a primary and a secondary (backup) controller. Requests to the slurmrestd API fail with the errors below. What could be the reason?
# scontrol ping
Slurmctld(primary) at node003 is UP
Slurmctld(backup) at node113 is UP
# systemctl status slurmrestd.service
● slurmrestd.service - Slurm REST daemon
     Loaded: loaded (/lib/systemd/system/slurmrestd.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2024-04-12 17:07:08 CST; 21min ago
   Main PID: 705425 (slurmrestd)
      Tasks: 21 (limit: 629145)
     Memory: 20.3M
     CGroup: /system.slice/slurmrestd.service
             └─705425 /usr/sbin/slurmrestd -f /etc/slurm/slurm.conf unix:/var/spool/slurm/slurmrestd.socket 0.0.0.0:6820 -vvv
Apr 12 17:08:46 node003 slurmrestd[705425]: debug2: _slurm_connect: failed to connect to 192.168.87.113:6819: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: debug2: Error connecting slurm stream socket at 192.168.87.113:6819: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: slurmrestd: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:node113:6819: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: slurmrestd: error: Sending PersistInit msg: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: slurmrestd: error: slurm_rest_auth_p_get_db_conn: unable to connect to slurmdbd: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: slurmrestd: error: init_connection[v0.0.39]:[[2.0.1.191]:50652] rc[7000]=Unable to connect to database -> openapi_get_db_conn() failed to open slurmdb connecti>
Apr 12 17:08:46 node003 slurmrestd[705425]: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:node113:6819: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: error: Sending PersistInit msg: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: error: slurm_rest_auth_p_get_db_conn: unable to connect to slurmdbd: Connection refused
Apr 12 17:08:46 node003 slurmrestd[705425]: error: init_connection[v0.0.39]:[[2.0.1.191]:50652] rc[7000]=Unable to connect to database -> openapi_get_db_conn() failed to open slurmdb connection
Hey, are slurmctld and slurmrestd on separate machines? Can you reach them manually? Could a firewall or a closed port be in the way?
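One way to do the manual reachability check is a plain TCP connect from the machine running slurmrestd. The following is only a minimal sketch in Python: the helper name tcp_reachable is made up for illustration and is not part of Slurm, and the host and port you pass in should be whatever appears in your own log (here the failing target was 192.168.87.113:6819).

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Try a TCP connect to host:port and return (ok, reason).

    Hypothetical helper for illustration; it only tests whether a TCP
    connection can be opened, not whether the Slurm daemon behind the
    port is healthy.
    """
    try:
        # create_connection resolves the name and opens the socket
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except ConnectionRefusedError:
        # The host answered, but nothing accepted the connection on this port
        return False, "connection refused"
    except socket.timeout:
        # No answer at all within the timeout (often a filtered port)
        return False, "timed out"
    except OSError as exc:
        # Name resolution failures, unreachable network, and similar
        return False, "error: %s" % exc
```

Calling tcp_reachable("192.168.87.113", 6819) from node003 would show whether the refusal can be reproduced outside slurmrestd. A "connection refused" result usually means the host is up but no process is listening on that port (or a firewall actively rejects it), while a timeout more often points at a firewall silently dropping packets.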
(In reply to the message from slurm-users@lists.schedmd.com, 12 Apr 2024, 11:36.)