[slurm-users] SlurmcltdHost confusion
Michael Gutteridge
michael.gutteridge at gmail.com
Thu Dec 14 00:53:51 UTC 2023
Apologies in advance, since I don't have a complete answer. I'm not sure why
that doesn't work, but my understanding is that a failover setup needs a
separate "SlurmctldHost" line for each of the controllers, e.g.:
SlurmctldHost=host1
SlurmctldHost=host2
...
The comma-separated list format seems to be meant for some other scenario
that I don't completely understand. We use one line per controller for our
HA arrangement, and it seems to be working OK.
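
As far as I can tell from the slurm.conf man page, the first SlurmctldHost
line names the primary controller and each later line names a backup, and
any of them can carry an address in parentheses after the hostname. A
minimal sketch of that layout, where the host names and addresses are
placeholders and not taken from a real cluster:

```
# Primary controller (first entry); name and address are hypothetical
SlurmctldHost=host1(192.0.2.10)
# Backup controller; expected to take over if the primary stops responding
SlurmctldHost=host2(192.0.2.11)
```

After the daemons are restarted with the new file, "scontrol ping" should
report the state of each configured controller, which is a quick way to
check that both entries were parsed.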
- Michael
On Wed, Dec 13, 2023 at 12:18 PM Jackson, Gary L. <Gary.Jackson at jhuapl.edu>
wrote:
> The SlurmctldHost value is set like the following in my slurm.conf:
>
> SlurmctldHost=host0,host1
>
> That seems to be legal according to the documentation. However, I get
> error messages like the following:
>
> $ srun id
> srun: error: get_addr_info: getaddrinfo() failed: Name or service not known
> srun: error: slurm_set_addr: Unable to resolve "host0,host1"
> srun: error: Unable to establish control machine address
> srun: error: Unable to allocate resources: Address already in use
>
> If I try to put IP addresses in parentheses per the documentation, I get
> different errors:
>
> $ srun id
> srun: error: Bad value "host0(12.34.56.78),host1" for SlurmctldHost
> srun: error: No SlurmctldHost defined.
> srun: fatal: Unable to process configuration file
>
> If I put a single hostname, or a hostname with an address in parentheses
> as the value for SlurmctldHost, it works fine but I have no failover.
>
> I’m running 23.02.6:
>
> $ sinfo --version
> slurm 23.02.6
>
> What’s going on?
>
> --
> Gary