[slurm-users] Maintaining slurm config files for test and production clusters

Fulcomer, Samuel samuel_fulcomer at brown.edu
Wed Jan 4 18:54:24 UTC 2023


Just make the cluster names the same, with different NodeName and
PartitionName lines. The rest of slurm.conf can be identical. Separate
cluster names are only necessary if you're running production in a
multi-cluster configuration.
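
As a rough sketch of that layout (the cluster name, node names, partition
name, hardware values, and the nodes.conf file name are all made up for
illustration; it assumes the include file sits next to slurm.conf and is
the one file that differs between the two clusters):

```
# slurm.conf -- the same file on test and production
ClusterName=mycluster          # hypothetical; same value on both clusters
# ... all other shared settings go here ...
Include nodes.conf             # hypothetical local file, differs per cluster

# nodes.conf as deployed on the production cluster (hypothetical values)
NodeName=prod[001-100] CPUs=32 RealMemory=190000 State=UNKNOWN
PartitionName=batch Nodes=prod[001-100] Default=YES State=UP

# nodes.conf as deployed on the test cluster (hypothetical values)
NodeName=test[01-04] CPUs=32 RealMemory=190000 State=UNKNOWN
PartitionName=batch Nodes=test[01-04] Default=YES State=UP
```

With identical cluster names, a %c-based include path would resolve to the
same file name on both clusters, so the per-cluster file has to be selected
some other way, for example by whatever deploys the repo to each cluster.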

Our model has been to have a production cluster and a test cluster which
becomes the production cluster at yearly upgrade time (for us, next week).
The test cluster is also used for rebuilding MPI prior to the upgrade, when
the PMI changes. We force users to resubmit jobs at upgrade time (after the
maintenance reservation) to ensure that MPI runs correctly.



On Wed, Jan 4, 2023 at 12:26 PM Groner, Rob <rug262 at psu.edu> wrote:

> We currently have a test cluster and a production cluster, all on the same
> network.  We try things on the test cluster, and then we gather those
> changes and make a change to the production cluster.  We're doing that
> through two different repos, but we'd like to have a single repo to make
> the transition from testing configs to publishing them more seamless.  The
> problem is, of course, that the test cluster and production clusters have
> different cluster names, as well as different nodes within them.
>
> Using the include directive, I can pull all of the NodeName lines out of
> slurm.conf and put them into %c-nodes.conf files, one for production, one
> for test.  That still leaves me with two problems:
>
>    - The clustername itself will still be a problem.  I WANT the same
>    slurm.conf file between test and production...but the clustername line will
>    be different for them both.  Can I use an env var in that cluster name,
>    because on production there could be a different env var value than on test?
>    - The gres.conf file.  I tried using the same "include" trick that
>    works on slurm.conf, but it failed because it did not know what the
>    "ClusterName" was.  I think that means that either it doesn't work for
>    anything other than slurm.conf, or that the clustername will have to be
>    defined in gres.conf as well?
>
> Any other suggestions of how to keep our slurm files in a single source
> control repo, but still have the flexibility to have them run elegantly on
> either test or production systems?
>
> Thanks.
>
>