[slurm-users] [EXT] slurmctld error

Sean Crosby scrosby at unimelb.edu.au
Tue Apr 6 04:31:12 UTC 2021


Interesting. It looks like slurmdbd is not listening on port 6819.

What does

ss -lntp | grep 6819

show? Is something else using that port?
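
Note that a plain grep for 6819 can also match a PID or a port such as 16819. A minimal sketch of a stricter filter, assuming the usual ss column layout (State in column 1, Local Address:Port in column 4); listeners_on_port is a hypothetical helper name, not part of ss or Slurm:

```shell
# Hypothetical helper: reads `ss -lntp` output on stdin and prints only the
# LISTEN rows whose local address ends in exactly the given port.
listeners_on_port() {
    port="$1"
    # Assumes column 1 is the socket state and column 4 is Local Address:Port.
    awk -v p="$port" '$1 == "LISTEN" && $4 ~ (":" p "$")'
}

# Usage (run as root so the owning process name is shown):
#   ss -lntp | listeners_on_port 6819
```

No output means nothing is listening on that port; a row naming a process other than slurmdbd means the port is already taken.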

You can also stop the slurmdbd service and run it in the foreground with verbose logging using

slurmdbd -D -vvv
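
It may also be worth confirming which port slurmdbd is configured to listen on, since DbdPort in slurmdbd.conf overrides the default of 6819. A minimal sketch, assuming a POSIX shell; configured_dbd_port is a hypothetical helper name, and the path in the usage line is an assumption about where your package installs the file:

```shell
# Hypothetical helper: print the DbdPort set in a slurmdbd.conf file,
# falling back to 6819 (slurmdbd's default) when the option is absent.
configured_dbd_port() {
    conf="$1"
    # Take the last uncommented DbdPort=<number> line, ignoring case and
    # tolerating optional whitespace around the equals sign.
    port=$(sed -n 's/^[[:space:]]*[Dd][Bb][Dd][Pp][Oo][Rr][Tt][[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf" | tail -n 1)
    echo "${port:-6819}"
}

# Usage (path is an assumption, adjust it for your installation):
#   configured_dbd_port /etc/slurm-llnl/slurmdbd.conf
```

If this prints something other than 6819, AccountingStoragePort in slurm.conf would need to match it.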

Sean

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia



On Tue, 6 Apr 2021 at 14:02, <ibotsis at isc.tuc.gr> wrote:

> * UoM notice: External email. Be cautious of links, attachments, or
> impersonation attempts *
> ------------------------------
>
> Hi Sean
>
>
>
> ss -lntp | grep $(pidof slurmdbd)     returns nothing.
>
>
>
> systemctl status slurmdbd.service
>
>
>
> ● slurmdbd.service - Slurm DBD accounting daemon
>
>      Loaded: loaded (/lib/systemd/system/slurmdbd.service; enabled; vendor
> preset: enabled)
>
>      Active: active (running) since Mon 2021-04-05 13:52:35 EEST; 16h ago
>
>        Docs: man:slurmdbd(8)
>
>     Process: 1453365 ExecStart=/usr/sbin/slurmdbd $SLURMDBD_OPTIONS
> (code=exited, status=0/SUCCESS)
>
>    Main PID: 1453375 (slurmdbd)
>
>       Tasks: 1
>
>      Memory: 5.0M
>
>      CGroup: /system.slice/slurmdbd.service
>
>              └─1453375 /usr/sbin/slurmdbd
>
>
>
> Apr 05 13:52:35 se01.grid.tuc.gr systemd[1]: Starting Slurm DBD
> accounting daemon...
>
> Apr 05 13:52:35 se01.grid.tuc.gr systemd[1]: slurmdbd.service: Can't open
> PID file /run/slurmdbd.pid (yet?) after start: Operation not permitted
>
> Apr 05 13:52:35 se01.grid.tuc.gr systemd[1]: Started Slurm DBD accounting
> daemon.
>
>
>
> The file /run/slurmdbd.pid exists and contains the same value as pidof slurmdbd.
>
>
>
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf Of
> *Sean Crosby
> *Sent:* Tuesday, April 6, 2021 12:49 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] [EXT] slurmctld error
>
>
>
> What's the output of
>
>
>
> ss -lntp | grep $(pidof slurmdbd)
>
>
>
> on your dbd host?
>
>
>
> Sean
>
>
>
>
>
>
>
>
> On Tue, 6 Apr 2021 at 05:00, <ibotsis at isc.tuc.gr> wrote:
>
>
> Hi Sean,
>
>
>
> 10.0.0.100 is the dbd and ctld host, named se01. The firewall is inactive.
>
>
>
> nc -nz 10.0.0.100 6819 || echo Connection not working
>
>
>
> gives me back: Connection not working
>
>
>
> jb
>
>
>
>
>
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf Of
> *Sean Crosby
> *Sent:* Monday, April 5, 2021 2:52 PM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] [EXT] slurmctld error
>
>
>
> The error shows
>
>
> slurmctld: debug2: Error connecting slurm stream socket at 10.0.0.100:6819:
> Connection refused
>
> slurmctld: error: slurm_persist_conn_open_without_init: failed to open
> persistent connection to se01:6819: Connection refused
>
>
>
> Is 10.0.0.100 the IP address of the host running slurmdbd?
>
> If so, check the iptables firewall running on that host, and make sure the
> ctld server can access port 6819 on the dbd host.
>
> You can check this by running the following from the ctld host (requires
> the package nmap-ncat installed)
>
> nc -nz 10.0.0.100 6819 || echo Connection not working
>
> This tries to connect to port 6819 on the host 10.0.0.100. It prints
> nothing if the connection works, and prints Connection not working
> otherwise.
>
> I would also test this on the DBD server itself
>
>
>
>
>
>
> On Mon, 5 Apr 2021 at 21:00, Ioannis Botsis <ibotsis at isc.tuc.gr> wrote:
>
>
> Hi Sean,
>
>
>
> Thank you for your prompt response. I made the changes you suggested, but
> slurmctld still refuses to run. Find attached the new slurmctld -Dvvvv output.
>
>
>
> jb
>
>
>
>
>
>
>
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf Of
> *Sean Crosby
> *Sent:* Monday, April 5, 2021 11:46 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] [EXT] slurmctld error
>
>
>
> Hi Jb,
>
>
>
> You have set AccountingStoragePort to 3306 in slurm.conf, which is the
> port that MySQL uses on the DBD host.
>
>
>
> AccountingStoragePort is the port for the slurmdbd service, not for
> MySQL.
>
>
>
> Change AccountingStoragePort to 6819 and it should fix your issues.
>
>
>
> I also think you should comment out the lines
>
>
>
> AccountingStorageUser=slurm
> AccountingStoragePass=/run/munge/munge.socket.2
>
>
>
> You shouldn't need those lines
>
>
>
> Sean
>
>
>
>
>
>
>
>
> On Mon, 5 Apr 2021 at 18:03, Ioannis Botsis <ibotsis at isc.tuc.gr> wrote:
>
>
> Hello everyone,
>
>
>
> I installed Slurm 19.05.5 from the Ubuntu repo for the first time, on a
> cluster with 44 identical nodes, but I have a problem with slurmctld.service.
>
>
>
> When I try to start slurmctld I get the following message:
>
>
>
> fatal: You are running with a database but for some reason we have no TRES
> from it.  This should only happen if the database is down and you don't
> have any state files
>
>
>
>    - Ubuntu 20.04.2 runs on the server and nodes in the exact same
>    version.
>    - munge 0.5.13 installed from Ubuntu repo running on server and nodes.
>    - mysql  Ver 8.0.23-0ubuntu0.20.04.1 for Linux on x86_64 ((Ubuntu))
>    installed from ubuntu repo running on server.
>
>
>
> slurm.conf is the same on all nodes and on server.
>
>
>
> slurmd.service is active and running on all nodes without problem.
>
>
>
> mysql.service is active and running on server.
>
> slurmdbd.service is active and running on server (slurm_acct_db created).
>
>
>
> Find attached slurm.conf, slurmdbd.conf, and the detailed output of the
> slurmctld -Dvvvv command.
>
>
>
> Any hint?
>
>
>
> Thanks in advance
>
>
>
> jb
>
>
>
>
>
>
>
>