[slurm-users] [EXT] Slurmd problem on client

Brian Andrus toomuchit at gmail.com
Mon Aug 24 14:23:50 UTC 2020


IIRC, that is because slurmd is attempting the 'configless' feature 
introduced in Slurm 20.02, where it uses DNS SRV records to find the config.
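
You can reproduce that lookup by hand to see what the node's resolver
returns. A minimal sketch, assuming the record uses the _slurmctld._tcp
name from the Slurm configless documentation. The helper names
slurm_srv_name and check_slurm_srv are made up for illustration, and the
domain you pass in is whatever search domain the node actually uses:

```shell
#!/bin/sh
# Build the SRV record name that a configless slurmd is expected to query.
# The domain argument is an assumption: take it from the search/domain
# line in the node's /etc/resolv.conf.
slurm_srv_name() {
    printf '_slurmctld._tcp.%s\n' "$1"
}

# Ask the resolver for that record. Each answer line has the form
# "<priority> <weight> <port> <target>"; no output means no record was found.
check_slurm_srv() {
    dig +short -t SRV "$(slurm_srv_name "$1")"
}
```

For example, 'check_slurm_srv calc.cluster' (domain guessed from the
hostname cpu3.calc.cluster in the log line quoted below). An answer
naming port 6817 and the master's hostname would suggest the record
resolves from the node; an empty answer would suggest slurmd has no
record to follow.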

This will happen if slurmd cannot find slurm.conf on the node, e.g. 
/etc/slurm.conf or whichever path your build was configured to use.

Check that the file exists on the node and that it is identical to the one on the master.
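
One way to run that check, as a rough sketch: compare digests of the two
files. The function names conf_digest and same_conf are made up for
illustration, and it assumes you have first copied the master's file to
the node under a temporary name:

```shell
#!/bin/sh
# Print the SHA-256 digest of a config file, or fail if it is missing.
conf_digest() {
    [ -f "$1" ] || { echo "missing: $1" >&2; return 1; }
    sha256sum "$1" | cut -d' ' -f1
}

# Succeed only if both files exist and are byte-identical.
same_conf() {
    a=$(conf_digest "$1") || return 1
    b=$(conf_digest "$2") || return 1
    [ "$a" = "$b" ]
}
```

For example, after 'scp master:/etc/slurm.conf /tmp/slurm.conf.master'
(the hostname "master" and both paths are placeholders), run
'same_conf /etc/slurm.conf /tmp/slurm.conf.master && echo identical'.
A "missing" message for the node's own copy would be consistent with
slurmd falling back to the DNS lookup.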

Brian Andrus

On 8/24/2020 7:03 AM, Lars Kloo wrote:
>
> Dear Sean,
>
> '/usr/local/sbin/slurmd -D -vvvv' gave the following error (the same as 
> when running from systemctl):
>
> slurmd: error: _fetch_child: failed to fetch remote configs
>
> I have debug level 5 for both slurmctld and slurmd in slurm.conf, so 
> there may be little more to extract in the form of messages.
>
> I am starting to think that the error is in the set-up files of named, 
> or alternatively in the network interface scripts. They should work, but 
> slurmd seems to require more.
>
> - /etc/resolv.conf looks correct, with both internal and external 
> nameservers and domains on the master, and only the internal ones on the 
> client.
>
> - However, tracking the named log file on the master while starting 
> slurmd on the client, it looks like slurmd is not offered the internal 
> domain. When those attempts are exhausted, slurmd is directed to the 
> external nameserver/domain (which will not give the necessary 
> information). The difference is that the external domain is explicitly 
> given in the named log file, whereas the internal domain is not.
>
> - Disabling IPv6 in /etc/named.conf removes the error messages in the 
> named log file, but the above slurmd error persists.
>
> My approach to solving the DNS/SRV problem may be too primitive.
>
> Best regards,
>
> Lars
>
> *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] *On behalf of* Sean Crosby
> *Sent:* 24 August 2020 13:44
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] [EXT] Slurmd problem on client
>
> Make sure slurmd on the client is stopped, and then run it in verbose 
> mode in the foreground
>
> e.g.
>
> /usr/local/slurm/latest/sbin/slurmd -D -vvvvv
>
> Then post the output
>
> --
> Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
> Research Computing Services | Business Services
> The University of Melbourne, Victoria 3010 Australia
>
> On Mon, 24 Aug 2020 at 21:11, Lars Kloo <larsa at kth.se> wrote:
>
>     *UoM notice: *External email. Be cautious of links, attachments,
>     or impersonation attempts
>
>     ------------------------------------------------------------------------
>
>     Thanks Sean,
>
>     Yes, the regular slurm commands work from the client.
>
>     The firewalld daemon has been stopped/disabled, and iptables is
>     set to let everything through, on both the master and the client.
>     I should have mentioned that in the list of prerequisites in my
>     initial e-mail.
>
>     Best regards,
>
>     Lars
>
>     *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] *On behalf of* Sean Crosby
>     *Sent:* 24 August 2020 12:45
>     *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
>     *Subject:* Re: [slurm-users] [EXT] Slurmd problem on client
>
>     Hi Lars,
>
>     Do the regular slurm commands work from the client?
>
>     e.g.
>
>     squeue
>
>     scontrol show part
>
>     If they don't, it would be a sign of communication problems.
>
>     Is there a software firewall running on the master/client?
>
>     Sean
>
>     On Mon, 24 Aug 2020 at 20:02, Lars Kloo <larsa at kth.se> wrote:
>
>         Hello,
>
>         I have a client slurmd problem that I cannot figure
>         out how to solve. I would be grateful for any suggestions on
>         how to move forward.
>
>         The master computer on a small local compute cluster is
>         getting quite old, so I am currently in the process
>         of replacing it. I also use one compute node for the
>         basic master-client set-up of all programs, including slurm.
>         Some basic data: CentOS 7.7, slurm 20.02.4.
>
>         Setting up slurmctld on the master node is (seemingly)
>         straightforward. Getting slurmd to work on the client appears
>         more complicated. I get the following error message
>         (journalctl -xe) when starting slurmd on the client:
>
>         Aug 24 11:01:34 cpu3.calc.cluster slurmd[9002]: error:
>         _fetch_child: failed to fetch remote configs
>
>         No useful error messages are obtained from 'systemctl -l
>         status slurmd.service' on the client, slurmd.log on the
>         client, or slurmctld.log on the master.
>
>         In this context, the following should be noted:
>
>         - root and a test user exist on the master and the client,
>         with the same uid and gid on both machines
>
>         - ping works in both directions (master <-> client)
>
>         - passphrase-free ssh login works in both directions for both
>         root and the test user
>
>         - munged is running with the same key on both machines
>
>         - the same slurm.conf is read on the master and on the client
>
>         - named (bind) has been set up on the master, and nslookup and
>         dig work properly on the client
>
>         - the ‘forward’ zone file of named on the master (DNS) contains
>         the recommended SRV record directing slurmctld requests to
>         port 6817 on the master (syntax seems ok, i.e. no error messages)
>
>         I have also tried to start slurmd in configless mode
>         (slurm.conf edited on the master) with the suggested
>         environment variable set (slurmd on the client). Then, slurmd
>         starts without error messages, but slurmctld on the master
>         cannot communicate with slurmd on the client.
>
>         Has anyone encountered a similar problem, and if so, how did you
>         solve it? Or do you have any suggestions on where to start looking?
>
>         Many thanks for input, and best regards,
>
>         Lars
>
>         //////////////////////////////~~~_/)~~~\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
>
>         Lars Kloo, Prof.
>
>             Tillämpad fysikalisk kemi        Applied Physical Chemistry
>
>             Institutionen för kemi           Dept. of Chemistry
>
>             Kungliga Tekniska högskolan      Royal Inst. of Technology (KTH)
>
>         100 44  STOCKHOLM                SE-100 44 Stockholm
>
>         SWEDEN
>
>         Tel: 08-790 9343                 Tel: +46-8-790 9343
>
>         Fax: 08-790 9349                 Fax: +46-8-790 9349
>
>         E-post: lakloo at kth.se            E-mail: lakloo at kth.se
>
>         WWW: http://www.kth.se/che/divisions/tfk
>
>         \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\~~~_/)~~~//////////////////////////////
>