[slurm-users] Nodes show by sinfo in partitions

Martijn Kruiten martijn.kruiten at surfsara.nl
Fri May 17 12:04:05 UTC 2019


Hi Fabio,

My guess is that you can (partly) solve this by setting an appropriate
node state in slurm.conf. Either CLOUD or FUTURE might be what you're
looking for. See `man slurm.conf`.
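
As a rough sketch only, and untested: assuming the node names from your
mail, and with all other node and partition parameters left out, the
node definitions on cluster A might look something like this (cluster B
would mirror it, with clusterB-[001-005] as the FUTURE range):

```
# Hypothetical excerpt from cluster A's slurm.conf; adapt to your setup.
# Nodes that are normally present in cluster A:
NodeName=clusterA-[001-005] State=UNKNOWN
# Nodes that may be moved in later. Per the man page, FUTURE nodes need
# not exist when the daemons start and are not contacted until they are
# made available:
NodeName=clusterA-[006-010] State=FUTURE
```

As I read the man page, FUTURE nodes are not shown by the Slurm
commands until they are made available, which can be done by updating
the node state with scontrol rather than restarting slurmctld. Please
check the State description for your Slurm version before relying on
this.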

Kind regards,

Martijn Kruiten

On Fri, 2019-05-17 at 09:17 +0000, Verzelloni  Fabio wrote:
> Hello,
> I have a question about the cloud feature, or any other feature that
> could solve an issue I have with my cluster. To keep it simple, let's
> say I have a set of 10 nodes and, when needed, I move one or more
> nodes from cluster A to cluster B. In each slurm.conf I define every
> node that could possibly be available:
> 
> Cluster A
> NodeName=clusterA-[001-010]
> 
> Cluster B
> NodeName=clusterB-[001-010]
> 
> In normal operation I have 5 nodes in 'cluster A' and 5 in 'cluster
> B', but when needed I reboot a node of 'cluster B' into 'cluster A',
> which results in 4 nodes in 'cluster B' and 6 in 'cluster A'.
> The "issue" is that, since I specified all possible nodes in
> slurm.conf, when I run sinfo what I see is:
> 
> Cluster A
> Normal up 1-00:00:00 5 up clusterA-[01-05]
> Normal up 1-00:00:00 5 down* clusterA-[06-10]
>  
> Cluster B
> Normal up 1-00:00:00 5 up clusterB-[06-10]
> Normal up 1-00:00:00 5 down* clusterB-[01-05]
> 
> And in both slurmctld.log files I have messages such as:
> 
> error: Unable to resolve "clusterA-006": Unknown host
> 
> or 
> 
> error: Unable to resolve "clusterB-001": Unknown host
> 
> Since I have a lot of partitions and a lot of nodes, the sinfo output
> is much harder to read because of the DOWN nodes that are not
> actually present in the system. Is there a way/feature/option to keep
> sinfo from displaying nodes that are NOT present and not reachable by
> slurmctld (those that trigger 'error: Unable to resolve
> "clusterA-006": Unknown host')?
> 
> Basically, I'd like to keep all the possible nodes in both slurm.conf
> files, but sinfo should show:
> 
> Cluster A
> Normal up 1-00:00:00 5 up clusterA-[01-05]
> 
> Cluster B
> Normal up 1-00:00:00 5 up clusterB-[06-10]
> 
> And if I move a node, once the node is actually reachable:
> 
> Cluster A
> Normal up 1-00:00:00 6 up clusterA-[01-06]
> 
> Cluster B
> Normal up 1-00:00:00 4 up clusterB-[07-10]
> 
> Thanks
> Fabio
> 
> --
> - Fabio Verzelloni - CSCS - Swiss National Supercomputing Centre
> via Trevano 131 - 6900 Lugano, Switzerland
> Tel: +41 (0)91 610 82 04
>  
> 
-- 
| System Programmer | SURFsara | Science Park 140 | 1098 XG Amsterdam |
| T +31 6 20043417  | martijn.kruiten at surfsara.nl | www.surfsara.nl |