[slurm-users] Maintaining slurm config files for test and production clusters
Brian Andrus
toomuchit at gmail.com
Wed Jan 4 18:53:10 UTC 2023
One simple way I have dealt with different configs is to symlink
/etc/slurm/slurm.conf to the appropriate file (e.g., slurm-dev.conf or
slurm-prod.conf).
In fact, I use the symlink only for dev and no local file at all
(configless) for prod. That way I can switch a running node between dev
and prod merely by creating or deleting the symlink and restarting slurmd.
Just an option that may work for you.
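
To make the symlink toggle concrete, here is a minimal sketch in Python. It is
only an illustration, not the exact procedure I run: the function name
set_node_mode and its conf_dir parameter are hypothetical, and it assumes
slurm-dev.conf sits in the same directory as slurm.conf. Restarting slurmd is
left as a comment because it depends on how the service is managed on the node.

```python
from pathlib import Path


def set_node_mode(conf_dir, mode, dev_conf="slurm-dev.conf"):
    """Hypothetical helper: link slurm.conf to the dev config, or remove
    the link so the node runs configless (prod)."""
    if mode not in ("dev", "prod"):
        raise ValueError(f"unknown mode: {mode!r}")

    link = Path(conf_dir) / "slurm.conf"

    # Only ever remove a symlink, never a real slurm.conf file.
    if link.is_symlink():
        link.unlink()
    elif link.exists():
        raise FileExistsError(f"{link} is a regular file; refusing to replace it")

    if mode == "dev":
        # Relative target: slurm.conf -> slurm-dev.conf in the same directory.
        link.symlink_to(dev_conf)

    # After switching, restart the daemon on the node, for example with
    # `systemctl restart slurmd` where slurmd runs as a systemd service.
    return link
```

Calling set_node_mode("/etc/slurm", "dev") would create the link, and
set_node_mode("/etc/slurm", "prod") would remove it again. The sketch refuses
to touch a regular slurm.conf so that a hand-maintained file is never deleted
by accident.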
I also use separate repos for prod and dev when I am working on
packages or testing. I prefer that separation so that no one
accidentally updates to a package that is not production-ready.
Brian Andrus
On 1/4/2023 9:22 AM, Groner, Rob wrote:
> We currently have a test cluster and a production cluster, all on the
> same network. We try things on the test cluster, and then we gather
> those changes and make a change to the production cluster. We're
> doing that through two different repos, but we'd like to have a single
> repo to make the transition from testing configs to publishing them
> more seamless. The problem is, of course, that the test cluster and
> production clusters have different cluster names, as well as different
> nodes within them.
>
> Using the include directive, I can pull all of the NodeName lines out
> of slurm.conf and put them into %c-nodes.conf files, one for
> production, one for test. That still leaves me with two problems:
>
> * The clustername itself will still be a problem. I WANT the same
> slurm.conf file between test and production...but the clustername
> line will be different for them both. Can I use an env var in
> that cluster name, because on production there could be a
> different env var value than on test?
> * The gres.conf file. I tried using the same "include" trick that
> works on slurm.conf, but it failed because it did not know what
> the "ClusterName" was. I think that means that either it doesn't
> work for anything other than slurm.conf, or that the clustername
> will have to be defined in gres.conf as well?
>
> Any other suggestions of how to keep our slurm files in a single
> source control repo, but still have the flexibility to have them run
> elegantly on either test or production systems?
>
> Thanks.
>