[slurm-users] [EXT] slurmctld error

ibotsis at isc.tuc.gr ibotsis at isc.tuc.gr
Mon Apr 5 19:00:01 UTC 2021


Hi Sean,

 

10.0.0.100 is the host running both slurmdbd and slurmctld; its hostname is se01. The firewall is inactive.

 

nc -nz 10.0.0.100 6819 || echo Connection not working

 

gives me back: Connection not working

 

jb

 

 

From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Sean Crosby
Sent: Monday, April 5, 2021 2:52 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] [EXT] slurmctld error

 

The error shows


slurmctld: debug2: Error connecting slurm stream socket at 10.0.0.100:6819: Connection refused

slurmctld: error: slurm_persist_conn_open_without_init: failed to open persistent connection to se01:6819: Connection refused

 

Is 10.0.0.100 the IP address of the host running slurmdbd?

If so, check the iptables firewall running on that host, and make sure the ctld server can access port 6819 on the dbd host.

You can check this by running the following from the ctld host (requires the package nmap-ncat installed)

nc -nz 10.0.0.100 6819 || echo Connection not working

This tries to connect to port 6819 on host 10.0.0.100. It prints nothing if the connection succeeds, and prints "Connection not working" otherwise.

I would also test this on the DBD server itself
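
A rough sketch of the same check without nc, using bash's built-in /dev/tcp redirection. The function name check_port and the 3-second timeout are illustrative assumptions, not part of Slurm or of the advice above; the host and port are the ones from this thread.

```shell
#!/usr/bin/env bash
# check_port HOST PORT (hypothetical helper, not a Slurm tool):
# returns 0 if a TCP connection to HOST:PORT can be opened, non-zero otherwise.
check_port() {
    local host="$1" port="$2"
    # /dev/tcp is a bash feature; timeout (coreutils) guards against
    # ports that silently drop packets instead of refusing.
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Host and port taken from the thread; override via environment if needed.
DBD_HOST="${DBD_HOST:-10.0.0.100}"
DBD_PORT="${DBD_PORT:-6819}"

if check_port "$DBD_HOST" "$DBD_PORT"; then
    echo "Port ${DBD_PORT} on ${DBD_HOST} is reachable"
else
    echo "Connection not working"
fi
```

"Connection refused" usually means nothing is listening on that port at that address. On the DBD host, ss -ltn lists the listening TCP sockets, so the absence of an entry for :6819 there would point at slurmdbd rather than at the network.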

 

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia

 

 

On Mon, 5 Apr 2021 at 21:00, Ioannis Botsis <ibotsis at isc.tuc.gr> wrote:


UoM notice: External email. Be cautious of links, attachments, or impersonation attempts

 


Hi Sean,

 

Thank you for your prompt response. I made the changes you suggested, but slurmctld still refuses to run. Please find attached the new output of slurmctld -Dvvvv.

 

jb

 

 

 

From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Sean Crosby
Sent: Monday, April 5, 2021 11:46 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] [EXT] slurmctld error

 

Hi Jb,

 

You have set AccountingStoragePort to 3306 in slurm.conf, which is the port used by MySQL on the DBD host.

 

AccountingStoragePort is the port for the Slurmdbd service, and not for MySQL.

 

Change AccountingStoragePort to 6819 and it should fix your issues.

 

I also think you should comment out the lines 

 

AccountingStorageUser=slurm
AccountingStoragePass=/run/munge/munge.socket.2

 

You shouldn't need those lines
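
A minimal sketch of how those slurm.conf changes could be applied with sed. The function name fix_accounting_conf is hypothetical, and the config path in the usage comment is an assumption (a common location for the Ubuntu packages); check where slurm.conf actually lives on your system. A .bak copy of the file is kept.

```shell
#!/usr/bin/env bash
# fix_accounting_conf FILE (hypothetical helper):
#   - sets AccountingStoragePort to 6819, the slurmdbd port used in this thread
#   - comments out AccountingStorageUser and AccountingStoragePass
# The original file is saved as FILE.bak.
fix_accounting_conf() {
    local conf="$1"
    sed -E -i.bak \
        -e 's/^[[:space:]]*AccountingStoragePort=.*/AccountingStoragePort=6819/' \
        -e 's/^[[:space:]]*(AccountingStorage(User|Pass)=.*)/#\1/' \
        "$conf"
}

# Example usage (path is an assumption; adjust to your installation):
#   fix_accounting_conf /etc/slurm-llnl/slurm.conf
#   systemctl restart slurmctld
```

Lines that are already commented out are left alone, since the patterns only match uncommented settings.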

 

Sean

 

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia

 

 

On Mon, 5 Apr 2021 at 18:03, Ioannis Botsis <ibotsis at isc.tuc.gr> wrote:



Hello everyone,

 

I installed Slurm 19.05.5 from the Ubuntu repo for the first time, on a cluster with 44 identical nodes, but I have a problem with slurmctld.service.

 

When I try to start slurmctld I get the following message:

 

fatal: You are running with a database but for some reason we have no TRES from it.  This should only happen if the database is down and you don't have any state files

 

* Ubuntu 20.04.2 runs on the server and on the nodes, all at exactly the same version.
* munge 0.5.13, installed from the Ubuntu repo, runs on the server and on the nodes.
* mysql  Ver 8.0.23-0ubuntu0.20.04.1 for Linux on x86_64 ((Ubuntu)), installed from the Ubuntu repo, runs on the server.

 

slurm.conf is the same on all nodes and on server.

 

slurmd.service is active and running on all nodes without problem.

 

mysql.service is active and running on server.

slurmdbd.service is active and running on server (slurm_acct_db created).

 

Find attached slurm.conf, slurmdbd.conf and the detailed output of the slurmctld -Dvvvv command.

 

Any hint?

 

Thanks in advance

 

jb

 

 

 
