[slurm-users] Nodes shown by sinfo in partitions
Verzelloni Fabio
fverzell at cscs.ch
Fri May 17 09:17:24 UTC 2019
Hello,
I have a question related to the cloud feature or a feature that can solve an issue that I have with my cluster,to make it simple let say that I have a set of nodes ( let say 10 nodes ), if needed I move node/s from cluster A to cluster B and in my slurm.conf I define all the possible number of available nodes:
Cluster A
NodeName=clusterA-[001-010]
Cluster B
NodeName=clusterB-[001-010]
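In full, the relevant part of cluster A's slurm.conf looks roughly like this (cluster B is the mirror image with clusterB-[001-010]; the partition name and limits below are just illustrative, matching the sinfo output further down):

# slurm.conf on cluster A: all ten possible nodes are declared
NodeName=clusterA-[001-010] State=UNKNOWN
PartitionName=normal Nodes=clusterA-[001-010] Default=YES MaxTime=1-00:00:00 State=UP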
In normal operation I have 5 nodes in cluster A and 5 in cluster B, but when needed I reboot a node from cluster B into cluster A, so that I end up with 4 nodes in cluster B and 6 in cluster A.
The "issue" is that, since I specified all possible nodes in slurm.conf, when I run sinfo what I see is:
Cluster A
Normal up 1-00:00:00 5 up clusterA-[001-005]
Normal up 1-00:00:00 5 down* clusterA-[006-010]
Cluster B
Normal up 1-00:00:00 5 up clusterB-[006-010]
Normal up 1-00:00:00 5 down* clusterB-[001-005]
And in both slurmctld.log files I get messages like:
error: Unable to resolve "clusterA-006": Unknown host
or
error: Unable to resolve "clusterB-001": Unknown host
Since I have a lot of partitions and a lot of nodes, the sinfo output is much harder to read because of the DOWN nodes that are not actually present in the system. Is there a way/feature/option so that sinfo does not display nodes that are NOT present and not reachable by slurmctld, i.e. the ones producing the "error: Unable to resolve "clusterA-006": Unknown host" messages?
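(A per-invocation workaround might be to filter the output on the client side, if I'm reading the sinfo options correctly, e.g.:

sinfo --responding                      # only nodes whose slurmd responds
sinfo --states=idle,allocated,mixed     # only nodes in the listed states

but I'd prefer something that hides the not-present nodes for every user by default.)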
Basically I'd like to keep all the possible nodes in both slurm.conf files, but sinfo should show:
Cluster A
Normal up 1-00:00:00 5 up clusterA-[001-005]
Cluster B
Normal up 1-00:00:00 5 up clusterB-[006-010]
And if I move a node, once the node is actually reachable:
Cluster A
Normal up 1-00:00:00 6 up clusterA-[001-006]
Cluster B
Normal up 1-00:00:00 4 up clusterB-[007-010]
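That is why I mentioned the cloud feature at the top: if I understand the node states correctly, nodes defined with State=CLOUD or State=FUTURE are hidden from user commands until they actually come into service, so what I had in mind is an untested sketch like this for cluster A, with the five nodes that normally live on cluster B declared as FUTURE:

NodeName=clusterA-[001-005] State=UNKNOWN
NodeName=clusterA-[006-010] State=FUTURE
PartitionName=normal Nodes=clusterA-[001-010] Default=YES MaxTime=1-00:00:00 State=UP

and then, once a node has really been moved and is resolvable, bring it into service with scontrol (I assume something along the lines of scontrol update NodeName=clusterA-006 State=RESUME). Is that the intended way to do this, or is there a better option?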
Thanks
Fabio
--
- Fabio Verzelloni - CSCS - Swiss National Supercomputing Centre
via Trevano 131 - 6900 Lugano, Switzerland
Tel: +41 (0)91 610 82 04