[slurm-users] trying to diagnose a connectivity issue between the slurmctld process and the slurmd nodes

Chris Samuel chris at csamuel.org
Fri Nov 27 20:02:53 UTC 2020


On 26/11/20 9:21 am, Steve Bland wrote:

> Sinfo always returns nodes not responding

One thing - do the nodes return to this state when you resume them with 
"scontrol update node=srvgridslurm[01-03] state=resume" ?

If they do, what do your slurmctld logs give as the reason for this?

You can bump up the log level on your slurmctld with (for instance) 
"scontrol setdebug debug" for more info (we run ours at debug all the 
time anyway).
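
Putting the steps above together, a rough sketch might look like the 
following. The function names are made-up helpers, not part of Slurm, and 
it assumes slurmctld writes to a file (SlurmctldLogFile is set) rather 
than to syslog, and that you run it somewhere scontrol and sinfo work 
with enough privilege to update nodes.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not Slurm commands.

# Print the value of one key from "scontrol show config" style text on
# stdin (lines of the form "Key   = value").
config_value() {
    awk -F' *= *' -v key="$1" '$1 == key { print $2 }'
}

# Resume the given node list, then list the reason Slurm has recorded
# for any node that is still down, drained or failing.
resume_and_report() {
    scontrol update nodename="$1" state=resume
    sinfo -R
}

# Raise the controller's log level, then show the end of its log file.
# Assumes SlurmctldLogFile points at a readable file on this host.
raise_debug_and_tail() {
    scontrol setdebug debug
    logfile=$(scontrol show config | config_value SlurmctldLogFile)
    tail -n 50 "$logfile"
}
```

For example, run resume_and_report 'srvgridslurm[01-03]' first, and if 
the nodes drop back to not responding, run raise_debug_and_tail and look 
for lines mentioning those nodes. "scontrol setdebug info" puts the level 
back afterwards if you don't want to stay at debug.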

All the best,
Chris
-- 
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA


