[slurm-users] network/communication failure
Hturner at eng.ua.edu
Mon May 21 09:46:58 MDT 2018
Got it! It was the firewall...
Thanks to all for all the suggestions.
Chemical and Biological Engineering
University of Alabama
3448 SEC, Box 870203
Tuscaloosa, AL 35487
(205) 348-1733 (phone)
(205) 561-7450 (cell)
(205) 348-7558 (fax)
hturner at eng.ua.edu
From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of Andy Riebs
Sent: Monday, May 21, 2018 10:22 AM
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] network/communication failure
Do you have a firewall running?
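
One quick way to test for a firewall problem like this is to check whether the Slurm daemon ports are reachable over TCP between the machines. The sketch below is only an illustration, not something posted in the thread. The helper names port_open and check_cluster are hypothetical. The ports are Slurm's documented defaults (SlurmctldPort 6817, SlurmdPort 6818); the attached slurm.conf is not shown here, so the values actually in use are an assumption.

```python
import socket

# Slurm's default ports; change these if slurm.conf sets
# SlurmctldPort / SlurmdPort to something else (assumed defaults here).
SLURMCTLD_PORT = 6817
SLURMD_PORT = 6818


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_cluster(controller, compute_nodes, timeout=3.0):
    """Map each (host, port) target to True/False reachability."""
    targets = [(controller, SLURMCTLD_PORT)]
    targets += [(node, SLURMD_PORT) for node in compute_nodes]
    return {target: port_open(target[0], target[1], timeout) for target in targets}


# Example call, using the hostnames from the original post:
#   for (host, port), ok in check_cluster(
#           "triumph01", ["triumph02", "triumph03"]).items():
#       print(host, port, "reachable" if ok else "blocked or not listening")
```

Run it on the master to test the slurmd ports on the hosts, and on a host to test the slurmctld port on the master. A False result does not by itself distinguish a firewall from a daemon that is not running, so confirm that slurmctld and slurmd are up before blaming the firewall.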
On 05/21/2018 11:05 AM, Turner, Heath wrote:
> If anyone has advice, I would really appreciate...
> I am running (just installed) slurm-17.11.6, with a master + 2 hosts. It works locally on the master (controller + execution). However, I cannot establish communication from the master [triumph01] to the 2 hosts [triumph02,triumph03]. Here is some more info:
> 1. munge is running, and munge verification tests all pass.
> 2. system clocks are in sync on master/hosts.
> 3. identical slurm.conf files are on master/hosts.
> 4. configuration of resources (memory/cpus/etc) is correct and has been confirmed on all machines (all hardware is identical).
> 5. I have attached:
> a) slurm.conf
> b) log file from master slurmctld
> c) log file from host slurmd
> Any ideas about what to try next?
> Heath Turner
> Graduate Coordinator
andy.riebs at hpe.com
High Performance Computing Software Engineering
+1 404 648 9024
My opinions are not necessarily those of HPE
May the source be with you!