[slurm-users] Nodes will not change state from "DOWN*"/"DOWN"
Pär Lundö
par.lundo at foi.se
Fri Jul 5 11:15:15 UTC 2019
Hi,
I managed to isolate the problem: the UID of the slurm user was not
the same on all machines in the network.
A (simple) job now runs without problems.
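
For anyone hitting the same symptom, a minimal sketch of how such a mismatch could be checked is shown below. It assumes passwordless ssh to each machine and that the account is named "slurm"; the helper names (remote_uid, group_hosts_by_uid, uids_consistent, check_hosts) are made up for illustration and are not part of Slurm.

```python
import subprocess
from collections import defaultdict


def remote_uid(host, user="slurm"):
    """Return the numeric UID of `user` on `host` by running `id -u` over ssh."""
    # Assumption: ssh to `host` works without a password prompt.
    result = subprocess.run(
        ["ssh", host, "id", "-u", user],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def group_hosts_by_uid(uids):
    """Turn {host: uid} into {uid: [hosts...]} so outliers are easy to spot."""
    groups = defaultdict(list)
    for host, uid in uids.items():
        groups[uid].append(host)
    return {uid: sorted(hosts) for uid, hosts in groups.items()}


def uids_consistent(uids):
    """True if every host reports the same UID (or no hosts were given)."""
    return len(set(uids.values())) <= 1


def check_hosts(hosts, user="slurm"):
    """Query each host and return (consistent, groups_by_uid)."""
    uids = {host: remote_uid(host, user) for host in hosts}
    return uids_consistent(uids), group_hosts_by_uid(uids)
```

Calling check_hosts(["controller", "lxclient10"]), where "controller" is a placeholder for the real controller hostname, returns a flag plus the hosts grouped by UID. If the flag is False, the groups show which machines disagree.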
Regards,
P
On 2019-07-05 11:39, Pär Lundö wrote:
> Hi,
>
> I am running Slurm 19.05 on Ubuntu 18.04 (controller and server) and
> 18.10 (nodes).
>
> My problem is that I cannot get the nodes to change their state to UP or
> IDLE from "DOWN*" (the "*" indicating that communication is lost).
>
> I can ping both the node's hostname and its IP address. I have added
> the IP address of the node (with only one node running) in the
> "NodeAddr" field of the "slurm.conf" file as follows:
> "NodeName=lxclient10 NodeAddr=192.168.1.10 "... as suggested by the
> configurator tool.
>
> Running "scontrol show node" the stated "REASON" is "Node unexpectedly
> rebooted".
>
> However, after running "scontrol update NodeName=lxclient10 State=RESUME"
> the state changes to IDLE. Happy with that, I submit a job. The job is
> queued, but it is shown as "PD" (pending) with the reason "Nodes
> required for job are DOWN, DRAINED or reserved for jobs in higher
> priority partitions", and the node is shown as "IDLE*+COMPLETING" (as
> reported by the "scontrol show node" command).
>
> After a while, when I run "squeue" to check what is happening, the
> job's state is "CG" ("Completing").
>
> At the same time, "scontrol show node" shows that the CPULoad is
> small or 0 and that no CPUs are allocated ("CPUAlloc=0").
>
> My network is a gigabit network and no firewalls are active. The node
> can ping the server and the server can ping the node (by both IP and
> hostname).
>
> Any thoughts on why this is happening?
>
> Best regards,
>
> P
>
>
--
Regards, Pär
________________________________
Pär Lundö
Researcher
Division for Command and Control Systems
FOI
Totalförsvarets forskningsinstitut
164 90 Stockholm
Visiting address:
Olau Magnus väg 33, Linköping
Tel: +46 13 37 86 01
Mob: +46 734 447 815
Switchboard: +46 13 37 80 00
par.lundo at foi.se
www.foi.se