[slurm-users] trying to diagnose a connectivity issue between the slurmctld process and the slurmd nodes
Chris Samuel
chris at csamuel.org
Fri Nov 27 20:02:53 UTC 2020
On 26/11/20 9:21 am, Steve Bland wrote:
> Sinfo always returns nodes not responding
One thing - do the nodes return to this state when you resume them with
"scontrol update node=srvgridslurm[01-03] state=resume" ?
If they do, what do your slurmctld logs give as the reason for this?
You can bump up the log level on your slurmctld with (for instance)
"scontrol setdebug debug" for more info (we run ours at debug all the
time anyway).
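As a rough sketch, the steps above could be strung together like this.
The function name check_nodes and the WAIT variable are made up for
illustration, the 60 second pause is only a guess at how long slurmctld
needs to notice the nodes again, and the update uses the NodeName=
spelling from the scontrol man page. Treat it as a starting point under
those assumptions, not a tested procedure for your site.

```shell
# Sketch only: check_nodes and WAIT are hypothetical names, not part of Slurm.
check_nodes() {
    nodes="$1"

    # Raise slurmctld logging so the reason for any state change is recorded.
    scontrol setdebug debug

    # Ask slurmctld to return the nodes to service.
    scontrol update NodeName="$nodes" State=resume

    # Give slurmctld some time to contact the nodes again (assumed interval).
    sleep "${WAIT:-60}"

    # Node-oriented listing: node name, state and the recorded reason.
    sinfo -N -n "$nodes" -o "%N %T %E"

    # Show where slurmctld writes its log, so you know which file to read.
    scontrol show config | grep -i SlurmctldLogFile
}
```

You would call it as check_nodes 'srvgridslurm[01-03]' on the host
running slurmctld, then read the log file it reports. If you don't want
to stay at debug afterwards, "scontrol setdebug info" drops the level
back down.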
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA