[slurm-users] Nodes show by sinfo in partitions

Verzelloni Fabio fverzell at cscs.ch
Fri May 17 09:17:24 UTC 2019


Hello,
I have a question about the cloud feature, or any other feature that could solve an issue I have with my clusters. To keep it simple, let's say I have a set of 10 nodes. When needed, I move one or more nodes from cluster A to cluster B, so in each cluster's slurm.conf I define all the nodes that could possibly be available:

Cluster A
NodeName=clusterA-[001-010]

Cluster B
NodeName=clusterB-[001-010]

In normal operation I have 5 nodes in 'cluster A' and 5 in 'cluster B', but when needed I reboot a node from 'cluster B' into 'cluster A', which leaves 4 nodes in 'cluster B' and 6 in 'cluster A'.
The "issue" is that, since I specified all possible nodes in slurm.conf, when I run sinfo I see:

Cluster A
Normal up 1-00:00:00 5 up clusterA-[001-005]
Normal up 1-00:00:00 5 down* clusterA-[006-010]
 
Cluster B
Normal up 1-00:00:00 5 up clusterB-[006-010]
Normal up 1-00:00:00 5 down* clusterB-[001-005]

And in both slurmctld.log files I see messages like:

error: Unable to resolve "clusterA-006": Unknown host

or 

error: Unable to resolve "clusterB-001": Unknown host
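
To make the symptom concrete, here is a minimal Python sketch that lists which configured node names currently fail to resolve on the controller host. The helper name `unresolvable` and its `resolve` parameter are made up for illustration. It uses the system resolver through `socket.gethostbyname`, which is an assumption: slurmctld may resolve names differently, for example when NodeAddr is set.

```python
import socket


def unresolvable(node_names, resolve=socket.gethostbyname):
    """Return the node names that the given resolver cannot look up.

    `resolve` is injectable so the check can be exercised without DNS;
    by default the system resolver is used (an assumption, see above).
    """
    missing = []
    for name in node_names:
        try:
            resolve(name)
        except OSError:
            # socket.gaierror (unknown host) is a subclass of OSError.
            missing.append(name)
    return missing


# Node names taken from the slurm.conf range clusterA-[001-010].
cluster_a = ["clusterA-%03d" % i for i in range(1, 11)]
print(unresolvable(cluster_a))
```

On a controller where only some of the names are registered, the printed list should correspond to the names reported in the "Unable to resolve" errors.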

Since I have a lot of partitions and a lot of nodes, the sinfo output is much harder to read because of the DOWN nodes that are not actually present in the system. Is there a way/feature/option to keep sinfo from displaying nodes that are NOT present and not reachable by slurmctld, i.e. the ones that trigger 'error: Unable to resolve "clusterA-006": Unknown host'?
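
To illustrate the filtering being asked for, here is a minimal Python sketch that post-processes sinfo output outside of Slurm and drops rows whose state carries the `*` suffix, which sinfo uses to mark nodes that are not responding. The function name `hide_non_responding` is hypothetical, and the sketch assumes the column layout shown in this message (partition, availability, time limit, node count, state, node list). It is a workaround sketch only, not a native Slurm option.

```python
def hide_non_responding(sinfo_text):
    """Return sinfo output without rows whose STATE field ends in '*'.

    Assumes six whitespace-separated columns per row:
    PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
    """
    kept = []
    for line in sinfo_text.splitlines():
        fields = line.split()
        # The fifth column is the node state; a trailing '*' marks
        # nodes that are not responding to the controller.
        if len(fields) >= 6 and fields[4].endswith("*"):
            continue
        kept.append(line)
    return "\n".join(kept)


# Sample rows in the simplified form used in this message.
sample = (
    "Normal up 1-00:00:00 5 up clusterA-[001-005]\n"
    "Normal up 1-00:00:00 5 down* clusterA-[006-010]"
)
print(hide_non_responding(sample))
```

In practice the input would be the captured text of a real sinfo call rather than the hard-coded sample; rows with other states (including plain `down` without the suffix) are left untouched.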

Basically, I'd like to keep all the possible nodes in both slurm.conf files, but sinfo should show:

Cluster A
Normal up 1-00:00:00 5 up clusterA-[001-005]

Cluster B
Normal up 1-00:00:00 5 up clusterB-[006-010]

And if I move a node, once the node is actually reachable:

Cluster A
Normal up 1-00:00:00 6 up clusterA-[001-006]

Cluster B
Normal up 1-00:00:00 4 up clusterB-[007-010]

Thanks
Fabio

--
- Fabio Verzelloni - CSCS - Swiss National Supercomputing Centre
via Trevano 131 - 6900 Lugano, Switzerland
Tel: +41 (0)91 610 82 04
 


