I had missed cloud_reg_addrs. We're running an older version of Slurm, and although I'd found https://slurm.schedmd.com/archive/slurm-23.02.7/power_save.html, I hadn't gone through all of the options in https://slurm.schedmd.com/archive/slurm-23.02.7/slurm.conf.html

Thank you for your help,

Martin

On Fri, 19 Jul 2024, at 16:21, Brian Andrus via slurm-users wrote:

Martin,

In a nutshell, when slurmd starts on the node, it reports the node's address to slurmctld. That is the "registration" event the docs mention.
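
If you want slurmctld to pick the address up from that registration automatically, I believe the relevant knob is the cloud_reg_addrs option of SlurmctldParameters in slurm.conf. A minimal sketch, assuming your Slurm version supports the option and that you have no other SlurmctldParameters values to merge it with (please check the slurm.conf page for your release):

```
# slurm.conf fragment (illustrative; combine with any existing
# SlurmctldParameters values as a comma-separated list)
SlurmctldParameters=cloud_reg_addrs
```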

Brian Andrus

On 7/19/2024 5:44 AM, Martin Lee via slurm-users wrote:

I've read the following in the slurm power saving docs:
https://slurm.schedmd.com/power_save.html

cloud_dns

By default, Slurm expects that the network addresses for cloud nodes won't be known until creation of the node and that Slurm will be notified of the node's address upon registration. Since Slurm communications rely on the node configuration found in the slurm.conf, Slurm will tell the client command, after waiting for all nodes to boot, each node's IP address. However, in environments where the nodes are in DNS, this step can be avoided by configuring this option.


I am creating the nodes on demand and don't know the IP ahead of the instance start, so cloud_dns is not set.

I'm confused specifically by "Slurm will be notified of the node's address upon registration." Who/what is expected to do this? If it is expected to be performed by the ResumeProgram, does it need to be done before slurmd starts on the node? Is it OK if the node does it after slurmd has started with something like:

scontrol update nodename=$(hostname -s) nodeaddr=$(hostname -I) nodehostname=$(hostname)
scontrol reconfigure
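
To be concrete about what I mean, here is a slightly more defensive sketch of the same idea. It only accounts for the fact that hostname -I can print several space-separated addresses; first_addr and node_update_cmd are hypothetical helper names of my own, and the second one just prints the scontrol command rather than running it:

```shell
#!/bin/sh
# Hypothetical helper: print the first address from `hostname -I`-style
# output, which may contain several space-separated addresses.
first_addr() {
    printf '%s\n' "$1" | awk '{ print $1 }'
}

# Hypothetical helper: build (but do not run) the scontrol update command.
#   $1 = short node name, $2 = node address, $3 = node hostname
node_update_cmd() {
    printf 'scontrol update nodename=%s nodeaddr=%s nodehostname=%s\n' \
        "$1" "$2" "$3"
}
```

On the node I would then call something like node_update_cmd "$(hostname -s)" "$(first_addr "$(hostname -I)")" "$(hostname)" and run the command it prints. Whether the first address is the right one to register is an assumption that depends on the instance's network setup.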

Thank you,

Martin

