[slurm-users] [EXT] slurmctld error

Sean Crosby scrosby at unimelb.edu.au
Mon Apr 5 11:52:06 UTC 2021


The log shows the following error:

slurmctld: debug2: Error connecting slurm stream socket at 10.0.0.100:6819: Connection refused
slurmctld: error: slurm_persist_conn_open_without_init: failed to open persistent connection to se01:6819: Connection refused

Is 10.0.0.100 the IP address of the host running slurmdbd?

If so, check the iptables firewall on that host and make sure the ctld
server can reach port 6819 on the dbd host.

You can check this by running the following from the ctld host (requires
the nmap-ncat package):

nc -nz 10.0.0.100 6819 || echo Connection not working

This tries to connect to port 6819 on the host 10.0.0.100. It prints
nothing if the connection works, and prints "Connection not working"
otherwise.
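
If nmap-ncat is not installed, roughly the same check can be sketched with
bash's /dev/tcp redirection. This is only an illustration: the function name
port_open and the 3-second timeout are my own choices, and it assumes that
bash was built with /dev/tcp support and that timeout (coreutils) is
available.

```shell
# Succeeds if a TCP connection to HOST:PORT can be opened within 3 seconds.
# port_open is a hypothetical helper name, not part of Slurm or nmap-ncat.
# The connection attempt runs in a child bash so that timeout can wrap it.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage, from the ctld host:
#   port_open 10.0.0.100 6819 || echo "Connection not working"
```

As with the nc command, "Connection refused" usually means nothing is
listening on that port, while a timeout more often points at a firewall
dropping the packets.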

I would also run the same test on the DBD server itself.
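
For the test on the DBD server, it can also help to confirm that something
is actually listening on the port that slurm.conf points at. The following
is a minimal sketch, not a definitive check: the helper names conf_value and
port_listed are hypothetical, the slurm.conf path in the usage comment is an
assumption about the Ubuntu package layout, and the parser only handles
simple one-setting-per-line entries.

```shell
# Print the value of KEY from a "Key=Value" config file such as slurm.conf.
# Keys are matched case-insensitively and comment lines are skipped.
conf_value() {
    awk -F= -v key="$1" '
        /^[ \t]*#/ { next }
        {
            k = $1
            gsub(/[ \t]/, "", k)
            if (tolower(k) == tolower(key)) {
                v = $0
                sub(/^[^=]*=/, "", v)           # drop "Key="
                sub(/[ \t]*#.*$/, "", v)        # drop a trailing comment
                gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim surrounding blanks
                val = v                         # last occurrence wins
            }
        }
        END { if (val != "") print val }
    ' "$2"
}

# Read `ss -ltn` output on stdin; succeed if any socket listens on PORT.
port_listed() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Usage on the DBD server (the path is an assumption; adjust as needed):
#   port=$(conf_value AccountingStoragePort /etc/slurm-llnl/slurm.conf)
#   ss -ltn | port_listed "${port:-6819}" \
#       || echo "Nothing is listening on port ${port:-6819}"
```

If nothing is listening on 6819 even though slurmdbd.service reports active,
the slurmdbd log is the next place to look.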


--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia



On Mon, 5 Apr 2021 at 21:00, Ioannis Botsis <ibotsis at isc.tuc.gr> wrote:

>
> Hi Sean,
>
>
>
> Thank you for your prompt response. I made the changes you suggested, but
> slurmctld still refuses to run. Please find attached the new slurmctld
> -Dvvvv output.
>
>
>
> jb
>
>
>
>
>
>
>
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf Of
> *Sean Crosby
> *Sent:* Monday, April 5, 2021 11:46 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] [EXT] slurmctld error
>
>
>
> Hi Jb,
>
>
>
> You have set AccountingStoragePort to 3306 in slurm.conf, which is the
> MySQL port running on the DBD host.
>
>
>
> AccountingStoragePort is the port for the Slurmdbd service, and not for
> MySQL.
>
>
>
> Change AccountingStoragePort to 6819 and it should fix your issues.
>
>
>
> I also think you should comment out the lines
>
>
>
> AccountingStorageUser=slurm
> AccountingStoragePass=/run/munge/munge.socket.2
>
>
>
> You shouldn't need those lines
>
>
>
> Sean
>
>
>
>
>
>
>
>
> On Mon, 5 Apr 2021 at 18:03, Ioannis Botsis <ibotsis at isc.tuc.gr> wrote:
>
>
> Hello everyone,
>
>
>
> I installed Slurm 19.05.5 from the Ubuntu repo for the first time, on a
> cluster with 44 identical nodes, but I have a problem with slurmctld.service.
>
>
>
> When I try to start slurmctld, I get the following message:
>
>
>
> fatal: You are running with a database but for some reason we have no TRES
> from it.  This should only happen if the database is down and you don't
> have any state files
>
>
>
>    - Ubuntu 20.04.2 runs on the server and on all nodes, in exactly the
>    same version.
>    - munge 0.5.13, installed from the Ubuntu repo, runs on the server and
>    nodes.
>    - mysql Ver 8.0.23-0ubuntu0.20.04.1 for Linux on x86_64 ((Ubuntu)),
>    installed from the Ubuntu repo, runs on the server.
>
>
>
> slurm.conf is the same on all nodes and on server.
>
>
>
> slurmd.service is active and running on all nodes without problem.
>
>
>
> mysql.service is active and running on server.
>
> slurmdbd.service is active and running on server (slurm_acct_db created).
>
>
>
> Please find attached slurm.conf, slurmdbd.conf, and the detailed output of
> the slurmctld -Dvvvv command.
>
>
>
> Any hint?
>
>
>
> Thanks in advance
>
>
>
> jb
>
>
>
>
>
>
>
>