[slurm-users] Slurmd problem on client

Lars Kloo larsa at kth.se
Mon Aug 24 10:00:04 UTC 2020


Hello,

I have a slurmd problem on a client node that I cannot figure out how to
solve. I would be grateful for any suggestions on how to move forward.

The master computer of a small local compute cluster is getting quite
old, so I am currently in the process of replacing it. I also
use one compute node for the basic master-client set-up of all
programs, including Slurm. Some basic data: CentOS 7.7, Slurm 20.02.4.

Setting up slurmctld on the master node is (seemingly) straightforward.
Getting slurmd to work on the client appears more complicated. I get the
following error message (journalctl -xe) when starting slurmd on the client:

Aug 24 11:01:34 cpu3.calc.cluster slurmd[9002]: error: _fetch_child: failed to fetch remote configs

No useful error messages are obtained from 'systemctl -l status
slurmd.service' on the client, from slurmd.log on the client, or from
slurmctld.log on the master.

In this context, the following should be noted:

- root and a test user exist on the master and the client, with the same
  uid and gid on both machines
- ping works in both directions (master <-> client)
- passphrase-free ssh login works in both directions for both root and the
  test user
- munged is running with the same key on both machines
- the same slurm.conf is read from the master and from the client
- named (BIND) has been set up on the master, and nslookup and dig work
  properly on the client
- the 'forward' zone file of named on the master (DNS) contains the
  recommended SRV record directing slurmctld requests to port 6817 on the
  master (the syntax seems OK, i.e. no error messages)
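
The last two checks in the list (SRV lookup from the client and reachability of port 6817) could be scripted roughly as follows. This is only a minimal sketch: the domain `calc.cluster` is inferred from the client name in the log line, the record name `_slurmctld._tcp` follows the Slurm configless documentation, and the helper names `parse_srv_line` and `port_open` are hypothetical.

```python
import socket
import subprocess
from typing import NamedTuple, Optional


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_line(line: str) -> Optional[SrvRecord]:
    """Parse one line of `dig +short SRV` output: 'priority weight port target'."""
    fields = line.split()
    if len(fields) != 4:
        return None
    try:
        priority, weight, port = (int(f) for f in fields[:3])
    except ValueError:
        return None
    # dig prints fully qualified names with a trailing dot; strip it.
    return SrvRecord(priority, weight, port, fields[3].rstrip("."))


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed DNS domain; replace with the cluster's own domain.
    domain = "calc.cluster"
    answer = subprocess.run(
        ["dig", "+short", "SRV", f"_slurmctld._tcp.{domain}"],
        capture_output=True, text=True, check=False,
    ).stdout
    records = [r for r in map(parse_srv_line, answer.splitlines()) if r]
    if not records:
        print("no SRV record returned for _slurmctld._tcp")
    for rec in sorted(records):  # lowest priority value first
        status = "reachable" if port_open(rec.target, rec.port) else "NOT reachable"
        print(f"{rec.target}:{rec.port} (priority {rec.priority}) is {status}")
```

Run on the client, this separates a DNS problem (no record returned) from a network or firewall problem (record returned, but the port is not reachable).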

 

I have also tried to start slurmd in configless mode (slurm.conf edited
on the master) with the suggested environment variable set (for slurmd on
the client). In that case slurmd starts without error messages, but
slurmctld on the master cannot communicate with slurmd on the client.
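
One thing that may be worth verifying in the configless attempt is whether the slurm.conf on the master really enables it. The sketch below assumes the edit in question is `SlurmctldParameters=enable_configless`, as described in the Slurm configless documentation, and that the file lives at `/etc/slurm/slurm.conf`. Both are assumptions about this installation, and the helper names are hypothetical.

```python
import sys
from typing import List


def slurmctld_parameters(conf_text: str) -> List[str]:
    """Return the comma-separated values of SlurmctldParameters in slurm.conf text."""
    values: List[str] = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition("=")
        # slurm.conf keywords are case-insensitive.
        if sep and key.strip().lower() == "slurmctldparameters":
            values.extend(v.strip() for v in value.split(",") if v.strip())
    return values


def configless_enabled(conf_text: str) -> bool:
    """True if enable_configless is among the SlurmctldParameters values."""
    return "enable_configless" in (v.lower() for v in slurmctld_parameters(conf_text))


if __name__ == "__main__":
    # Assumed default location; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/slurm/slurm.conf"
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    print(f"{path}: configless enabled = {configless_enabled(text)}")
```

If the option is present and slurmctld has been restarted since the edit, the client can also be pointed at the controller explicitly with `slurmd --conf-server <master>:6817`, which takes the DNS SRV lookup out of the picture while debugging.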

 

Has anyone encountered a similar problem, and if so, how did you solve it?
Or do you have any suggestions on where to start looking?

Many thanks for any input, and best regards,

Lars

--
Lars Kloo, Prof.
Applied Physical Chemistry
Dept. of Chemistry
Royal Inst. of Technology (KTH)
SE-100 44 Stockholm, SWEDEN
Tel: +46-8-790 9343
Fax: +46-8-790 9349
E-mail: lakloo at kth.se
WWW: http://www.kth.se/che/divisions/tfk
