[slurm-users] Can't find an address

Lachlan Musicman datakid at gmail.com
Wed Oct 24 16:58:59 MDT 2018


On Wed, 24 Oct 2018 at 22:56, Zohar Roe MLM <RZohar8 at iai.co.il> wrote:

> Hello,
>
> I have a node that for some reason changes state to "Down" every few
> minutes.
>
> When I set it to "resume" with scontrol, it's OK until it goes Down again.
>
> In the slurm server log I can see error:
>
> "agent/is_node_resp: node:myName1 RPC:REQUEST_PING : Can't find an
> address, check slurm.conf"
>
>
>
> Now, the error message seems fairly straightforward, but I can't find the
> problem.
>
> * The node is up and answers ping from the slurm server.
>
> * The slurm daemon on the node is up and running.
>
> * There isn't any error on the node itself.
>
> * There are more nodes, configured the same way (except for the IP
> address), that are OK.
>
> * Running "scontrol update state=resume nodename=myNode" fixes the problem
> for a short time.
>
> * Restarting the slurm daemon on the node also fixes this for a short time.
>
>
>
> Any idea what more I can check to resolve this?
>

Here's a quick off-the-top-of-my-head checklist:

Check that the node is in /etc/hosts
Check the slurmd logs on the node
Make sure there is enough disk space
Make sure that its date and time are synchronized with the other nodes
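
If it helps, two of those checks (name resolution and disk space) can be scripted. This is only a rough sketch, not anything Slurm ships: the function names resolve_node, free_fraction and report are made up for illustration, the default path "/" is an assumption, and it assumes the "Can't find an address" message comes down to the slurm server not being able to resolve the node name. It does not look at the slurmd logs or at clock sync.

```python
import shutil
import socket


def resolve_node(hostname):
    """Return the sorted, de-duplicated addresses the system resolver
    (which normally consults /etc/hosts) gives for hostname, or [] if none."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


def free_fraction(path):
    """Return the fraction of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def report(node, path="/"):
    """Collect human-readable results for one node name and one path.

    The default path "/" is an assumption; point it at wherever the
    slurm logs and spool directories live on your system.
    """
    lines = []
    addrs = resolve_node(node)
    if addrs:
        lines.append(f"{node} resolves to {', '.join(addrs)}")
    else:
        lines.append(f"{node} does not resolve; check /etc/hosts or DNS")
    lines.append(f"{path}: {free_fraction(path):.0%} free")
    return lines
```

Running something like print("\n".join(report("myName1"))) on the slurm server, and again on the node itself, should show whether both machines agree on the address. If the name does not resolve on the server, that would be consistent with the error in the log.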

cheers
L.

------
'...postwork futures are dismissed with the claim that "it is not in our
nature to be idle", thereby demonstrating at once an essentialist view of
labor and an impoverished imagination of the possibilities of nonwork.'

Kathi Weeks, *The Problem with Work: Feminism, Marxism, Antiwork Politics
and Postwork Imaginaries*
<https://www.dukeupress.edu/The-Problem-with-Work/>