[slurm-users] Maintaining slurm config files for test and production clusters

Fulcomer, Samuel samuel_fulcomer at brown.edu
Wed Jan 4 19:00:32 UTC 2023


...and... using the same cluster name is important in our scenario for the
seamless slurmdbd upgrade transition.

In thinking about it a bit more, I'm not sure I'd want to fold together
production and test/dev configs in the same revision control tree. We keep
them separate. There's no reason to baroquify it.
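If the two trees are kept separate, a small drift check can confirm they differ only where intended. Per the earlier message quoted below, that means NodeName and PartitionName lines. This is a minimal Python sketch that assumes both files have been read into strings. The function names and the default set of "cluster-specific" keys are illustrative, not part of any Slurm tool:

```python
# Hypothetical drift check between two separately maintained slurm.conf files.
# Assumption: only NodeName= and PartitionName= lines are meant to differ.

DEFAULT_CLUSTER_SPECIFIC = ("NodeName", "PartitionName")


def shared_lines(conf_text: str, cluster_specific=DEFAULT_CLUSTER_SPECIFIC) -> list[str]:
    """Return non-blank lines whose key is not cluster-specific, comments removed."""
    skip = {k.lower() for k in cluster_specific}
    kept = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        key = line.split("=", 1)[0].strip().lower()  # compare keys case-insensitively
        if key not in skip:
            kept.append(line)
    return kept


def config_drift(prod_text: str, test_text: str,
                 cluster_specific=DEFAULT_CLUSTER_SPECIFIC) -> list[str]:
    """List shared lines that appear in only one of the two configs."""
    prod = shared_lines(prod_text, cluster_specific)
    test = shared_lines(test_text, cluster_specific)
    only_prod = [f"prod only: {line}" for line in prod if line not in test]
    only_test = [f"test only: {line}" for line in test if line not in prod]
    return only_prod + only_test


if __name__ == "__main__":
    prod = "ClusterName=hpc\nNodeName=p[001-100] CPUs=32\nPartitionName=batch Nodes=p[001-100]\n"
    test = "ClusterName=hpc\nNodeName=t[01-04] CPUs=32\nPartitionName=batch Nodes=t[01-04]\n"
    print(config_drift(prod, test))
```

If other settings are also expected to differ between the two clusters (for example `SlurmctldHost`), they can be passed in `cluster_specific` so they are not reported as drift.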

On Wed, Jan 4, 2023 at 1:54 PM Fulcomer, Samuel <samuel_fulcomer at brown.edu>
wrote:

> Just make the cluster names the same, with different NodeName and
> PartitionName lines. The rest of slurm.conf can be the same. Having two
> cluster names is only necessary if you're running production in a
> multi-cluster configuration.
>
> Our model has been to have a production cluster and a test cluster which
> becomes the production cluster at yearly upgrade time (for us, next week).
> The test cluster is also used for rebuilding MPI prior to the upgrade, when
> the PMI changes. We force users to resubmit jobs at upgrade time (after the
> maintenance reservation) to ensure that MPI runs correctly.
>
> On Wed, Jan 4, 2023 at 12:26 PM Groner, Rob <rug262 at psu.edu> wrote:
>
>> We currently have a test cluster and a production cluster, all on the
>> same network.  We try things on the test cluster, and then we gather those
>> changes and make a change to the production cluster.  We're doing that
>> through two different repos, but we'd like to have a single repo to make
>> the transition from testing configs to publishing them more seamless.  The
>> problem is, of course, that the test cluster and production clusters have
>> different cluster names, as well as different nodes within them.
>>
>> Using the include directive, I can pull all of the NodeName lines out of
>> slurm.conf and put them into %c-nodes.conf files, one for production, one
>> for test.  That still leaves me with two problems:
>>
>>    - The cluster name itself will still be a problem. I WANT the same
>>    slurm.conf file for test and production, but the ClusterName line will
>>    be different for each. Can I use an environment variable in the
>>    ClusterName line, so that it resolves to a different value on production
>>    than on test?
>>    - The gres.conf file. I tried using the same "include" trick that
>>    works in slurm.conf, but it failed because it did not know what the
>>    "ClusterName" was. I think that means either that it doesn't work for
>>    anything other than slurm.conf, or that the ClusterName will have to be
>>    defined in gres.conf as well?
>>
>> Any other suggestions of how to keep our slurm files in a single source
>> control repo, but still have the flexibility to have them run elegantly on
>> either test or production systems?
>>
>> Thanks.
>>
>>
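For the single-repo setup asked about above, another option is to render each cluster's files at deploy time rather than relying on slurm.conf itself to vary. This is a minimal Python sketch under stated assumptions. It uses a hypothetical `@CLUSTER@` placeholder token in a shared template; that token is an invention of the sketch, not a Slurm feature. It also models the `%c` expansion of `Include` lines described in the question, to show which per-cluster file a rendered config would point at. A deploy step like this could likewise select or render gres.conf per cluster, so the file Slurm reads would not need a ClusterName lookup.

```python
# Hypothetical deploy-time render step for one slurm.conf kept in a single repo.
# Assumption: the template uses the made-up token @CLUSTER@ where ClusterName goes.

PLACEHOLDER = "@CLUSTER@"


def render(template: str, cluster: str) -> str:
    """Substitute the placeholder token with this cluster's name."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template has no {PLACEHOLDER} token")
    return template.replace(PLACEHOLDER, cluster)


def render_all(template: str, clusters) -> dict[str, str]:
    """Render one config text per cluster name."""
    return {c: render(template, c) for c in clusters}


def included_files(conf_text: str) -> list[str]:
    """Return Include targets with %c expanded from the file's ClusterName line.

    This models the %c-nodes.conf convention described in the thread; it is
    not Slurm's own parser.
    """
    cluster = None
    targets = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, _, value = line.partition("=")
        if key.strip().lower() == "clustername":
            cluster = value.strip()
        elif line.lower().startswith("include "):
            targets.append(line.split(None, 1)[1].strip())
    if cluster is None:
        if any("%c" in t for t in targets):
            # Without a ClusterName there is nothing to substitute for %c.
            raise ValueError("Include uses %c but no ClusterName= line was found")
        return targets
    return [t.replace("%c", cluster) for t in targets]


if __name__ == "__main__":
    template = "ClusterName=@CLUSTER@\nInclude %c-nodes.conf\n"
    for name, text in render_all(template, ["prod", "test"]).items():
        print(name, included_files(text))
```

The names `prod` and `test` are placeholders. If both clusters share one ClusterName, as suggested earlier in the thread, `%c` resolves to the same file on both. The per-cluster node files would then have to differ by some other means, such as which file the deploy step copies into place.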