[slurm-users] dual slurmctld and slurmdbd
toomuchit at gmail.com
Thu Jul 4 00:54:52 UTC 2019
You're welcome :)
If you aren't happy with the takeover time, you may want to look at
SlurmctldTimeout in slurm.conf. From the man page:
The interval, in seconds, that the backup controller waits for the
primary controller to respond before assuming control. The default value
is 120 seconds. May not exceed 65533.
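The roughly two minutes you saw is consistent with that 120-second default. As a minimal sketch of a shorter takeover window for a primary/backup pair, something like the following might work. The host names ctl1 and ctl2, the path /shared/slurm/state, and the value 30 are placeholders I made up, not values from this thread. The SlurmctldHost lines assume Slurm 18.08 or later; older releases use ControlMachine and BackupController instead.

```
# slurm.conf (kept identical on both controllers and on the compute nodes)
# First SlurmctldHost listed is the primary, the second is the backup.
SlurmctldHost=ctl1
SlurmctldHost=ctl2
# Must live on the filesystem shared by both controllers.
StateSaveLocation=/shared/slurm/state
# Seconds the backup waits for the primary before assuming control (default 120).
SlurmctldTimeout=30
```

A very low value may let the backup take over during a brief network hiccup, so it is probably worth trying on the dev setup first. "scontrol ping" should show which controllers are responding, and "scontrol takeover" asks the backup to take control without waiting for the timeout.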
On 7/3/2019 2:45 PM, Tina Fora wrote:
> Thanks Brian Andrus and Chris Samuel.
> I was able to get it to work on our dev setup as primary/backup. Already
> had the shared state directory. If I take primary down it takes about two
> minutes for slurm commands to work again as the backup takes over. When I
> bring the primary back up it is a bit faster.
>> On 2/7/19 1:48 pm, Tina Fora wrote:
>>> We run mysql on a dedicated machine with slurmctld and slurmdbd running
>>> on another machine. Now I want to add another machine running slurmctld
>>> and slurmdbd, and this machine will be on CentOS 7. The existing one is
>>> CentOS 6. Is this possible? Can I run two separate slurmctld and slurmdbd
>>> pointing to the same slurm config and database?
>> Are you trying to set up an HA system (where one controller runs both
>> and a second waits in the wings in case the first fails and will take
>> over)?
>> Or do you want them to run separate clusters?
>> If you want the second, and are happy to have the same users and QOS's
>> on both, then you can run one slurmctld per system and point them at the
>> same slurmdbd (having created a cluster for each there first).
>> If you want HA then it's a lot more complicated as you'll need a (fast)
>> shared filesystem between them both (we use GPFS for this) as both
>> slurmctld's need to see the same state directory all the time.
>> We also run slurmdbd in failover mode talking to the same MySQL/MariaDB
>> instance (but with a backup in case that fails).
>> All the best,
>> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA