[slurm-users] Node can't run simple job when STATUS is up and STATE is idle

Dean Schulze dean.w.schulze at gmail.com
Tue Jan 21 16:05:21 UTC 2020


Thank you, thank you, thank you.  It was the firewall on CentOS 7.  Once I
disabled that it worked.

For anyone else who runs into this issue, here is how to disable the
firewall on CentOS 7:

https://linuxize.com/post/how-to-stop-and-disable-firewalld-on-centos-7/
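
In case the link goes stale, the steps boil down to roughly the following.
This is only a sketch, assuming a stock CentOS 7 node with firewalld and
systemd, run as root. The DRY_RUN variable and the function names are my own,
for illustration. Opening the slurmd port may be enough instead of turning
the firewall off entirely; 6818 is the default slurmd port mentioned below,
so check SlurmdPort in slurm.conf if yours differs.

```shell
#!/usr/bin/env bash
# Sketch only: run as root on the CentOS 7 compute node.
# DRY_RUN=1 prints the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# What I did: stop firewalld now and keep it from starting at boot.
disable_firewalld() {
    run systemctl stop firewalld
    run systemctl disable firewalld
}

# Less drastic alternative: leave firewalld on and open the slurmd port.
# 6818 is the default SlurmdPort; pass another port if slurm.conf differs.
open_slurmd_port() {
    local port="${1:-6818}"
    run firewall-cmd --permanent --add-port="${port}/tcp"
    run firewall-cmd --reload
}
```

Source the file, then call disable_firewalld or open_slurmd_port. Setting
DRY_RUN=1 first shows what would be run without changing anything. Opening
one port may not cover every case (srun traffic can use other ports), so
treat the second function as a starting point.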



On Tue, Jan 21, 2020 at 7:24 AM Brian Johanson <bjohanso at psc.edu> wrote:

>
> On 1/21/2020 12:32 AM, Chris Samuel wrote:
> > On 20/1/20 3:00 pm, Dean Schulze wrote:
> >
> >> There's either a problem with the source code I cloned from github,
> >> or there is a problem when the controller runs on Ubuntu 19 and the
> >> node runs on CentOS 7.7. I'm downgrading to a stable 19.05 build to
> >> see if that solves the problem.
> >
> > I've run the master branch on a Cray XC without issues, and I concur
> > with what the others have said and suggest it's worth checking the
> > slurmd and slurmctld logs to find out why communication is not right
> > between them.
> >
> and if the logs do not have enough information, run the daemon in the
> foreground with increased verbosity
>
> slurmd -D -v -v -v
>
> As another said, check that the connections are available with telnet:
> server->client 'telnet node1 6818' (6818 is the default slurmd port), and
> the same from compute->server.
>
> Are these new host builds?  Is there a firewall enabled?  Kinda sounds
> like a firewall on the client that allows outbound (initial connection
> to the slurmctld) but not new inbound (slurmctld ping) connections.
>
> -b
>
>
>
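
The connectivity check Brian describes can also be done without telnet
installed. Below is a minimal sketch, assuming bash (for its /dev/tcp
pseudo-device) and the coreutils timeout command are available. The function
name check_port and the host name "controller" are placeholders of mine;
node1 and port 6818 come from the telnet example above, and 6817 is the
default slurmctld port unless SlurmctldPort is set in slurm.conf.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP port accepts connections, without telnet.
# Relies on bash's /dev/tcp pseudo-device and coreutils' timeout.

check_port() {
    local host="$1" port="$2" secs="${3:-3}"
    # Open (and immediately drop) a TCP connection in a child bash;
    # timeout guards against firewalls that silently drop packets.
    if timeout "$secs" bash -c 'exec 3<>/dev/tcp/"$1"/"$2"' _ "$host" "$port" 2>/dev/null; then
        echo "${host}:${port} reachable"
    else
        echo "${host}:${port} NOT reachable"
        return 1
    fi
}

# Example usage (host names are placeholders):
#   check_port node1 6818        # run on the controller, tests slurmd
#   check_port controller 6817   # run on the compute node, tests slurmctld
```

If one direction reports reachable and the other does not, a host firewall
on the unreachable side is a likely suspect, which is what it turned out to
be here.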