[slurm-users] [EXT] slurmctld error

Sean Crosby scrosby at unimelb.edu.au
Mon Apr 5 21:49:27 UTC 2021


What's the output of

ss -lntp | grep $(pidof slurmdbd)

on your dbd host?
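
If it helps to read the result, here is a minimal sketch that reports whether anything is listening on a given port. The helper name has_listener is made up for illustration, and 6819 is assumed to be the port slurmdbd should be on, as discussed earlier in the thread. Note that ss -p only shows the owning process when run as root or as the user who owns that process.

```shell
#!/bin/sh
# Hypothetical helper: reads `ss -lnt`-style output on stdin and succeeds
# if any LISTEN line has a local address ending in :PORT.
has_listener() {
    port="$1"
    awk -v p=":$port" '
        $1 == "LISTEN" {
            # Column 4 is "Local Address:Port", e.g. 0.0.0.0:6819 or [::]:6819
            n = length($4) - length(p) + 1
            if (n > 0 && substr($4, n) == p) found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Example, run on the dbd host:
# ss -lnt | has_listener 6819 && echo "listening on 6819" || echo "nothing on 6819"
```

If nothing is listening on 6819, the DbdPort setting in slurmdbd.conf and the slurmdbd log would be the next places I'd look.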

Sean

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia



On Tue, 6 Apr 2021 at 05:00, <ibotsis at isc.tuc.gr> wrote:

> * UoM notice: External email. Be cautious of links, attachments, or
> impersonation attempts *
> ------------------------------
>
> Hi Sean,
>
>
>
> 10.0.0.100 is both the dbd and ctld host, named se01. The firewall is inactive.
>
>
>
> nc -nz 10.0.0.100 6819 || echo Connection not working
>
>
>
> returns: Connection not working
>
>
>
> jb
>
>
>
>
>
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf Of
> *Sean Crosby
> *Sent:* Monday, April 5, 2021 2:52 PM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] [EXT] slurmctld error
>
>
>
> The error shows
>
>
> slurmctld: debug2: Error connecting slurm stream socket at 10.0.0.100:6819:
> Connection refused
>
> slurmctld: error: slurm_persist_conn_open_without_init: failed to open
> persistent connection to se01:6819: Connection refused
>
>
>
> Is 10.0.0.100 the IP address of the host running slurmdbd?
>
> If so, check the iptables firewall running on that host, and make sure the
> ctld server can access port 6819 on the dbd host.
>
> You can check this by running the following from the ctld host (requires
> the nmap-ncat package to be installed):
>
> nc -nz 10.0.0.100 6819 || echo Connection not working
>
> This tries to connect to port 6819 on host 10.0.0.100. It outputs nothing
> if the connection works, and prints "Connection not working" otherwise.
>
> I would also test this on the DBD server itself
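
A slightly more explicit version of the same check, as a sketch only. The function name check_port is hypothetical, and the 3-second timeout is an arbitrary choice. The timeout matters because "Connection refused" usually means the host answered but nothing accepted the connection on that port (or a firewall rule actively rejected it), whereas a silently dropped packet would otherwise leave nc waiting.

```shell
#!/bin/sh
# Hypothetical wrapper around the nc check from this thread.
# -n skips DNS (so pass an IP address), -z only tests the connection,
# -w 3 gives up after 3 seconds.
check_port() {
    host="$1"; port="$2"
    if nc -n -z -w 3 "$host" "$port" 2>/dev/null; then
        echo "$host:$port reachable"
    else
        echo "$host:$port not reachable"
        return 1
    fi
}

# Example, run from the ctld host and again on the dbd host itself:
# check_port 10.0.0.100 6819
```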
>
>
>
>
>
> On Mon, 5 Apr 2021 at 21:00, Ioannis Botsis <ibotsis at isc.tuc.gr> wrote:
>
>
> Hi Sean,
>
>
>
> Thank you for your prompt response. I made the changes you suggested, but
> slurmctld still refuses to run. Please find attached the new slurmctld -Dvvvv
> output.
>
>
>
> jb
>
>
>
>
>
>
>
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf Of
> *Sean Crosby
> *Sent:* Monday, April 5, 2021 11:46 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] [EXT] slurmctld error
>
>
>
> Hi Jb,
>
>
>
> You have set AccountingStoragePort to 3306 in slurm.conf, which is the
> MySQL port running on the DBD host.
>
>
>
> AccountingStoragePort is the port for the Slurmdbd service, and not for
> MySQL.
>
>
>
> Change AccountingStoragePort to 6819 and it should fix your issues.
>
>
>
> I also think you should comment out the lines
>
>
>
> AccountingStorageUser=slurm
> AccountingStoragePass=/run/munge/munge.socket.2
>
>
>
> You shouldn't need those lines
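
To double-check what slurm.conf actually contains after the edit, something like the following could be used. This is only a sketch: conf_value and check_acct_port are made-up helper names, the match is case-sensitive (Slurm itself does not care about the case of keys), and the path in the example is an assumption about where the Ubuntu package puts the file.

```shell
#!/bin/sh
# Print the value of KEY from a "Key=Value" style config file, ignoring
# commented-out lines and trailing comments. Prints nothing if KEY is absent.
conf_value() {
    key="$1"; file="$2"
    sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*\([^#[:space:]]*\).*/\1/p" "$file" | tail -n 1
}

# Complain if AccountingStoragePort is set to the MySQL port (3306).
check_acct_port() {
    port=$(conf_value AccountingStoragePort "$1")
    case "$port" in
        3306) echo "AccountingStoragePort=3306 is the MySQL port, not slurmdbd's"
              return 1 ;;
        "")   echo "AccountingStoragePort is not set" ;;
        *)    echo "AccountingStoragePort=$port" ;;
    esac
}

# Example (path is an assumption, adjust to your install):
# check_acct_port /etc/slurm-llnl/slurm.conf
```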
>
>
>
> Sean
>
>
>
>
>
>
>
>
> On Mon, 5 Apr 2021 at 18:03, Ioannis Botsis <ibotsis at isc.tuc.gr> wrote:
>
>
> Hello everyone,
>
>
>
> I installed Slurm 19.05.5 from the Ubuntu repo for the first time, on a
> cluster with 44 identical nodes, but I have a problem with slurmctld.service.
>
>
>
> When I try to start slurmctld I get the following message:
>
>
>
> fatal: You are running with a database but for some reason we have no TRES
> from it.  This should only happen if the database is down and you don't
> have any state files
>
>
>
>    - Ubuntu 20.04.2 runs on the server and nodes in the exact same
>    version.
>    - munge 0.5.13 installed from Ubuntu repo running on server and nodes.
>    - mysql  Ver 8.0.23-0ubuntu0.20.04.1 for Linux on x86_64 ((Ubuntu))
>    installed from ubuntu repo running on server.
>
>
>
> slurm.conf is the same on all nodes and on server.
>
>
>
> slurmd.service is active and running on all nodes without problem.
>
>
>
> mysql.service is active and running on server.
>
> slurmdbd.service is active and running on server (slurm_acct_db created).
>
>
>
> Please find attached slurm.conf, slurmdbd.conf, and the detailed output of
> the slurmctld -Dvvvv command.
>
>
>
> Any hint?
>
>
>
> Thanks in advance
>
>
>
> jb
>
>
>
>
>
>
>
>