[slurm-users] [EXTERNAL] Re: trying to diagnose a connectivity issue between the slurmctld process and the slurmd nodes

Steve Bland sbland at rossvideo.com
Mon Nov 30 13:11:53 UTC 2020


Thanks Chris

When I did that, they all came back.

I also found that ReturnToService was set to 0 in slurm.conf, so I have changed that for now. I may set it back to 0 later to see whether any nodes are lost, but I assume that would show up in the log.

Interestingly, I had this in slurm.conf and thought it would make the initial state UP for all nodes:

PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
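
A minimal sketch of how these two settings relate, based on the slurm.conf man page rather than on anything confirmed in this thread. State=UP on a PartitionName line sets the state of the partition itself (whether it accepts and runs jobs), not the state of the nodes in it. Whether a DOWN node comes back without manual intervention is governed by ReturnToService. The value 1 below is an assumption for illustration; the message above does not say which value was chosen.

```
# Illustrative value only (assumption, not taken from the thread).
# 0: a DOWN node stays DOWN until an administrator changes its state
# 1: a node set DOWN for being non-responsive returns to service
#    once it registers with a valid configuration
# 2: a DOWN node returns to service once it registers with a valid
#    configuration, whatever the reason it was set DOWN
ReturnToService=1

# State=UP here applies to the partition, not to its nodes.
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
```

With ReturnToService=0, nodes marked DOWN need an explicit "scontrol update ... state=resume", which would be consistent with the nodes only coming back after the resume command.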


Steve Bland
Technical Product Manager
Third Party Products
Ross Video | Production Technology Experts
T: +1 (613) 228-0688 ext.4219
http://www.rossvideo.com/
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Chris Samuel <chris at csamuel.org>
Sent: 27 November 2020 15:02
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: [EXTERNAL] Re: [slurm-users] trying to diagnose a connectivity issue between the slurmctld process and the slurmd nodes

On 26/11/20 9:21 am, Steve Bland wrote:

> Sinfo always returns nodes not responding

One thing - do the nodes return to this state when you resume them with
"scontrol update node=srvgridslurm[01-03] state=resume" ?

If they do, then what do your slurmctld logs give as the reason for this?

You can bump up the log level on your slurmctld with (for instance)
"scontrol setdebug debug" for more info (we run ours at debug all the
time anyway).

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA