[slurm-users] [EXT] Slurmd problem on client

Sean Crosby scrosby at unimelb.edu.au
Mon Aug 24 11:43:57 UTC 2020


Make sure slurmd on the client is stopped, and then run it in verbose mode
in the foreground

e.g.

/usr/local/slurm/latest/sbin/slurmd -D -vvvvv

Then post the output
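
As a rough sketch of the steps above: this assumes slurmd is managed by
systemd under the unit name slurmd (the slurmd.service mentioned in the
original message). The log path /tmp/slurmd-debug.log and the names
SLURMD_BIN, SLURMD_LOG and debug_slurmd are only placeholders for
illustration, not anything Slurm defines.

```shell
#!/bin/sh
# Path taken from the example above; adjust to your installation.
SLURMD_BIN=${SLURMD_BIN:-/usr/local/slurm/latest/sbin/slurmd}
# Hypothetical location for the captured output.
SLURMD_LOG=${SLURMD_LOG:-/tmp/slurmd-debug.log}

debug_slurmd() {
    # Stop the service-managed daemon so only one slurmd is running.
    systemctl stop slurmd
    # -D keeps slurmd in the foreground; each extra -v raises verbosity.
    # tee shows the output on the terminal and saves a copy to post.
    "$SLURMD_BIN" -D -vvvvv 2>&1 | tee "$SLURMD_LOG"
}
```

Source the file as root on the client and call debug_slurmd; stop it with
Ctrl-C once the error has appeared, then post the contents of the log file.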
--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia



On Mon, 24 Aug 2020 at 21:11, Lars Kloo <larsa at kth.se> wrote:

> * UoM notice: External email. Be cautious of links, attachments, or
> impersonation attempts *
> ------------------------------
>
> Thanks Sean,
>
>
>
> Yes, the regular slurm commands work from the client.
>
>
>
> The firewalld daemon has been stopped and disabled, and iptables is set to
> let everything through, on both the master and the client. I should have
> mentioned that in the list of prerequisites in my initial e-mail.
>
>
>
> Best regards,
>
> Lars
>
>
>
>
>
> *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] *On Behalf Of* Sean
> Crosby
> *Sent:* 24 August 2020 12:45
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] [EXT] Slurmd problem on client
>
>
>
> Hi Lars,
>
>
>
> Do the regular slurm commands work from the client?
>
>
>
> e.g.
>
>
>
> squeue
>
> scontrol show part
>
>
>
> If they don't, it would be a sign of communication problems.
>
>
>
> Is there a software firewall running on the master/client?
>
>
>
> Sean
>
>
>
> --
> Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
> Research Computing Services | Business Services
> The University of Melbourne, Victoria 3010 Australia
>
>
>
>
>
> On Mon, 24 Aug 2020 at 20:02, Lars Kloo <larsa at kth.se> wrote:
>
> Hello,
>
>
>
> I have a slurmd problem on a client that I cannot figure out how to
> solve. I would be grateful for any suggestions on how to move forward.
>
>
>
> The master computer of a small local compute cluster is getting
> quite old, so I am currently in the process of replacing it. I
> also use one compute node for the basic master-client set-up of all
> programs, including slurm. Some basic data: CentOS 7.7, slurm 20.02.4.
>
>
>
> Setting up slurmctld on the master node is (seemingly)
> straightforward. Getting slurmd to work on the client appears more
> complicated. I get the following error message (journalctl -xe) when
> starting slurmd on the client:
>
> Aug 24 11:01:34 cpu3.calc.cluster slurmd[9002]: error: _fetch_child: failed to fetch remote configs
>
>
>
> No useful error messages appear in the output of ‘systemctl -l status
> slurmd.service’ on the client, in slurmd.log on the client, or in
> slurmctld.log on the master.
>
>
>
> In this context, the following should be noted:
>
> - root and a test user exist on the master and the client, with the
>   same uid and gid on both machines
>
> - ping works in both directions (master <-> client)
>
> - passphrase-free ssh login works in both directions, both for root
>   and for the test user
>
> - munged is running with the same key on both machines
>
> - the same slurm.conf is read on the master and on the client
>
> - named (bind) has been set up on the master, and nslookup and dig
>   work properly on the client
>
> - the ‘forward’ zone file of named on the master (DNS) contains the
>   recommended SRV record directing slurmctld requests to port 6817 on
>   the master (syntax seems ok, i.e. no error messages)
>
>
>
> I have also tried to start slurmd in configless mode (slurm.conf edited
> on the master) with the suggested environment variable set (for slurmd on
> the client). In that case slurmd starts without error messages, but
> slurmctld on the master cannot communicate with slurmd on the client.
>
>
>
> Has anyone encountered a similar problem, and if so, how did you solve
> it? Or do you have any suggestions on where to start looking?
>
>
>
> Many thanks for input, and best regards,
>
> Lars
>
>
>
> //////////////////////////////~~~_/)~~~\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
>
>                             Lars Kloo, Prof.
>
>
>
>     Tillämpad fysikalisk kemi        Applied Physical Chemistry
>
>     Institutionen för kemi           Dept. of Chemistry
>
>     Kungliga Tekniska högskolan      Royal Inst. of Technology (KTH)
>
>     100 44  STOCKHOLM                SE-100 44 Stockholm
>
>                                      SWEDEN
>
>
>
>     Tel: 08-790 9343                 Tel: +46-8-790 9343
>
>     Fax: 08-790 9349                 Fax: +46-8-790 9349
>
>     E-post: lakloo at kth.se            E-mail: lakloo at kth.se
>
>
>
>              WWW: http://www.kth.se/che/divisions/tfk
>
> \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\~~~_/)~~~//////////////////////////////
>
>
>
>