[slurm-users] [EXT] Slurmd problem on client
Brian Andrus
toomuchit at gmail.com
Mon Aug 24 14:23:50 UTC 2020
IIRC, that is because it is trying to use the 'configless' feature of
Slurm 20.02, where slurmd uses DNS SRV records to find the config.
This will happen if slurm.conf does not exist on the node in the place
slurmd looks for it (/etc/slurm.conf, or the sysconfdir your build uses).
Check that you have that file and that it is the same as the one on the master.
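
A rough sketch of that check, in case it helps. The helper name same_conf is made up for illustration, and the actual slurm.conf path depends on how Slurm was built, so treat the paths in the usage line as assumptions.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if both files exist and their
# contents are byte-for-byte identical.
#   return 0 = identical, 1 = different, 2 = one of the files is missing
same_conf() {
    local_conf=$1
    master_copy=$2
    [ -f "$local_conf" ] || { echo "missing: $local_conf" >&2; return 2; }
    [ -f "$master_copy" ] || { echo "missing: $master_copy" >&2; return 2; }
    cmp -s "$local_conf" "$master_copy"
}
```

On the node, after copying the master's file over (for example with scp to /tmp/slurm.conf.master, a placeholder path), something like `same_conf /etc/slurm.conf /tmp/slurm.conf.master && echo identical` would confirm the two match.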
Brian Andrus
On 8/24/2020 7:03 AM, Lars Kloo wrote:
>
> Dear Sean,
>
> '/usr/local/sbin/slurmd -D -vvvv' gave the following error (same as
> when running from systemctl):
>
> slurmd: error: _fetch_child: failed to fetch remote configs
>
> I have debug level 5 for both slurmctld and slurmd in slurm.conf, so
> there may be little more to extract in the form of messages.
>
> I am starting to think that the error is in the set-up files of named,
> alternatively in the network interface scripts. They should work, but
> slurmd seems to require more.
>
> -/etc/resolv.conf looks correct with both internal and external
> nameservers and domains on the master, and only the internal on the client
>
> -However, tracking the master named log file while starting slurmd on
> the client, it looks like slurmd is not offered the internal domain.
> When those attempts are exhausted, slurmd is directed to the external
> nameserver/domain (which will not give the information necessary). The
> difference is that the external domain is explicitly given in the
> named log file, whereas the internal domain is not.
>
> -Disabling IPv6 in /etc/named.conf removes the error messages in the
> named log file, but the above slurmd error persists.
>
> Possibly, my approach to solving the DNS/SRV problem may be too primitive.
>
> Best regards,
>
> Lars
>
> *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] *On behalf of
> *Sean Crosby
> *Sent:* 24 August 2020 13:44
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] [EXT] Slurmd problem on client
>
> Make sure slurmd on the client is stopped, and then run it in verbose
> mode in the foreground
>
> e.g.
>
> /usr/local/slurm/latest/sbin/slurmd -D -vvvvv
>
> Then post the output
>
> --
> Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
> Research Computing Services | Business Services
> The University of Melbourne, Victoria 3010 Australia
>
> On Mon, 24 Aug 2020 at 21:11, Lars Kloo <larsa at kth.se
> <mailto:larsa at kth.se>> wrote:
>
> *UoM notice: *External email. Be cautious of links, attachments,
> or impersonation attempts
>
> ------------------------------------------------------------------------
>
> Thanks Sean,
>
> Yes, the regular slurm commands work from the client.
>
> The firewalld daemon has been stopped/disabled, and iptables is
> set to let everything through, on both the master and the client.
> I should have mentioned that in the list of prerequisites in my
> initial e-mail.
>
> Best regards,
>
> Lars
>
> *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com
> <mailto:slurm-users-bounces at lists.schedmd.com>] *On behalf of *Sean Crosby
> *Sent:* 24 August 2020 12:45
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com
> <mailto:slurm-users at lists.schedmd.com>>
> *Subject:* Re: [slurm-users] [EXT] Slurmd problem on client
>
> Hi Lars,
>
> Do the regular slurm commands work from the client?
>
> e.g.
>
> squeue
>
> scontrol show part
>
> If they don't, it would be a sign of communication problems.
>
> Is there a software firewall running on the master/client?
>
> Sean
>
> --
> Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
> Research Computing Services | Business Services
> The University of Melbourne, Victoria 3010 Australia
>
> On Mon, 24 Aug 2020 at 20:02, Lars Kloo <larsa at kth.se
> <mailto:larsa at kth.se>> wrote:
>
>
> Hello,
>
> I have a client slurmd problem, that I cannot really figure
> out how to solve. I would be grateful for any suggestions on
> how to move forward.
>
> The master computer on a small local calculational cluster is
> getting quite old, and therefore I am currently in the process
> of exchanging it. I also use one calculational node for the
> basic master-client set-up of all programs, including slurm.
> Some basic data: CentOS 7.7, slurm 20.02.4.
>
> Setting up slurmctld on the master node is (seemingly)
> straightforward. Getting slurmd to work on the client appears
> more complicated. I get the following error message
> (journalctl -xe) when starting slurmd on the client:
>
> Aug 24 11:01:34 cpu3.calc.cluster slurmd[9002]: error:
> _fetch_child: failed to fetch remote configs
>
> No useful error messages are obtained from 'systemctl -l
> status slurmd.service' on the client, slurmd.log on the
> client, or slurmctld.log on the master.
>
> In this context, the following should be noted:
>
> -root and test user exist on the master and client; same uid
> and gid on both machines
>
> -ping works in both directions (master <-> client)
>
> -passphrase-free ssh login works in both directions for both
> root and for a test user
>
> -munged is running and with the same key on both machines
>
> -the same slurm.conf is read from the master and from the client
>
> -named (bind) has been set up on the master, and nslookup and
> dig work properly on the client
>
> -the ‘forward’ zone file of named on the master (DNS) contains
> the recommended SRV record directing slurmctld requests to
> port 6817 on the master (syntax seems ok, i.e. no error messages)
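
For reference, the Slurm configless documentation gives the SRV record in the form `_slurmctld._tcp 3600 IN SRV 10 0 6817 <controller-host>`, and `dig +short -t SRV _slurmctld._tcp.<domain>` run on the client should return a matching "priority weight port target" line. Below is a small sketch for turning such an answer line into host:port so it can be compared with what slurmctld actually listens on. The helper name srv_to_hostport is made up, and the domain calc.cluster in the usage line is only inferred from the hostname in the log above.

```shell
#!/bin/sh
# Hypothetical helper: convert one answer line from `dig +short -t SRV`
# ("priority weight port target.") into "target:port".
# Returns 1 if the line does not have exactly four fields.
srv_to_hostport() {
    # Split the single argument into fields; inside a function this only
    # changes the function's own positional parameters.
    set -- $1
    [ $# -eq 4 ] || return 1
    # $3 is the port, $4 the target; strip the trailing root dot.
    printf '%s:%s\n' "${4%.}" "$3"
}
```

Usage would be along the lines of `dig +short -t SRV _slurmctld._tcp.calc.cluster | while read -r line; do srv_to_hostport "$line"; done`. No output at all would suggest the client is not seeing the SRV record in the internal zone.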
>
> I have also tried to start slurmd in a config-less mode
> (slurm.conf edited on the master) with the suggested
> environment variable set (slurmd on the client). Then, slurmd
> starts without error messages, but slurmctld on the master
> cannot communicate with slurmd on the client.
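
One way to take DNS out of the picture while testing configless mode is slurmd's --conf-server option, which names the controller directly. This is a sketch only: it is not necessarily the same mechanism as the environment variable mentioned above, the helper name conf_server_cmd is made up, and the host name passed in is a placeholder for the real controller.

```shell
#!/bin/sh
# Prerequisite on the master, in slurm.conf (slurmctld must be restarted):
#   SlurmctldParameters=enable_configless

# Hypothetical helper: print (not run) a foreground slurmd command that
# names the config server explicitly. The port defaults to 6817.
conf_server_cmd() {
    host=$1
    port=${2:-6817}
    printf 'slurmd --conf-server %s:%s -D -vvvv\n' "$host" "$port"
}
```

For example, `conf_server_cmd master` prints the command to run on the client with slurmd stopped. If that starts cleanly while the DNS route still fails, the SRV lookup is the likelier culprit.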
>
> Has anyone encountered a similar problem --- and how did you
> solve it? Or, do you have any suggestions where to start looking?
>
> Many thanks for input, and best regards,
>
> Lars
>
> //////////////////////////////~~~_/)~~~\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
>
> Lars Kloo, Prof.
>
> Tillämpad fysikalisk kemi Applied Physical Chemistry
>
> Institutionen för kemi Dept. of Chemistry
>
> Kungliga Tekniska högskolan Royal Inst. of Technology
> (KTH)
>
> 100 44 STOCKHOLM SE-100 44 Stockholm
>
> SWEDEN
>
> Tel: 08-790 9343 Tel: +46-8-790 9343
>
> Fax: 08-790 9349 Fax: +46-8-790 9349
>
> E-post: lakloo at kth.se <mailto:lakloo at kth.se>
> E-mail: lakloo at kth.se <mailto:lakloo at kth.se>
>
> WWW: http://www.kth.se/che/divisions/tfk
>
> \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\~~~_/)~~~//////////////////////////////
>