[slurm-users] How to queue jobs based on non-existent features
Paul Edmon
pedmon at cfa.harvard.edu
Fri Jul 10 17:07:11 UTC 2020
Another option would be to use Slurm's license mechanism and just set a
license's count to 0 when it isn't available.
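
A rough sketch of what the submission side might look like, assuming each
software version is modelled as a license named after it. The helper names
(license_name, sbatch_command), the "sw_" prefix and the script name are
made up for illustration; only the sbatch --licenses=<name>:<count> option
is standard. Whether a job requesting a license whose count is 0 stays
pending or is rejected at submit time is worth verifying on your Slurm
version before relying on this.

```python
import re
import shlex
from typing import List


def license_name(version: str, prefix: str = "sw_") -> str:
    """Map a version string (branch, tag, commit) to a conservative license name.

    The naming scheme is an assumption for this sketch, not a Slurm convention.
    """
    safe = re.sub(r"[^A-Za-z0-9_]", "_", version)
    return prefix + safe


def sbatch_command(version: str, script: str) -> List[str]:
    """Build an sbatch command line that requests one license for `version`."""
    return ["sbatch", f"--licenses={license_name(version)}:1", script]


if __name__ == "__main__":
    # Hypothetical version string and script name, for illustration only.
    print(shlex.join(sbatch_command("release/2020.28-abc123", "run_tests.sh")))
```

The command is only built and printed here, so the sketch can be tried
without a cluster.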
-Paul Edmon-
On 7/10/2020 12:42 PM, Raj Sahae wrote:
>
> Hi Brian and Paul,
>
> You both sent me suggestions about using an offline dummy node with
> all features set. Thanks for your ideas but this won’t work for me as
> it’s not practical. We want to allow users to queue for all supported
> software versions and that easily numbers in the thousands or tens of
> thousands (every branch, every commit). If I could make this solution
> work, I would simply set the Available features for all nodes but this
> feels like it won’t scale well, or is an improper use of the Feature
> capability.
>
> Thanks,
>
> *Raj Sahae | *m. +1 (408) 230-8531
>
> *From: *Raj Sahae <rsahae at tesla.com>
> *Date: *Thursday, July 9, 2020 at 4:15 PM
> *To: *"slurm-users at schedmd.com" <slurm-users at schedmd.com>
> *Subject: *How to queue jobs based on non-existent features
>
> Hi all,
>
> My apologies if this is sent twice. The first time I sent it without
> my subscription to the list being complete.
>
> I am attempting to use Slurm as a test automation system for its
> fairly advanced queueing and job control abilities, and also because
> it scales very well.
>
> However, since our use case is a bit outside the standard usage of
> Slurm, we are hitting some issues that don’t appear to have obvious
> solutions.
>
> In our current setup, the Slurm nodes are hosts attached to a test
> system. Our pipeline (greatly simplified) would be to install some
> software on the test system and then run sets of tests against it.
>
> In our old pipeline, this was done in a single job. With Slurm,
> however, I was hoping to decouple these two actions, as that makes the
> entire pipeline more robust to update failures and would give us
> finer-grained job control for the actual test run.
>
> I would like to allow users to queue jobs with constraints indicating
> which software version they need. Then separately some automated job
> would scan the queue, see jobs that are not being allocated due to
> missing resources, and queue software installs appropriately. We
> attempted to do this using the Active/Available Features
> configuration. We use HealthCheck and Epilog scripts to scrape the
> test system for software properties (version, commit, etc.) and assign
> them as Features. Once an install is complete and the Features are
> updated, queued jobs would start to be allocated on those nodes.
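
The "scan the queue" step described above could be sketched roughly as
follows. This assumes squeue is on PATH and uses its %i (job ID), %f
(requested features) and %r (pending reason) format fields. The function
names and the "(null)" handling are assumptions of this sketch, and it only
sees jobs that Slurm accepted at submit time.

```python
import subprocess
from collections import Counter

# Pending jobs only: job ID, requested features, pending reason.
SQUEUE_CMD = ["squeue", "--noheader", "--states=PENDING", "--format=%i|%f|%r"]


def parse_pending(squeue_output: str) -> Counter:
    """Count pending jobs per requested feature string."""
    wanted = Counter()
    for line in squeue_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split off the first and last fields separately, because a
        # constraint expression may itself contain "|" (the OR operator).
        _job_id, _, rest = line.partition("|")
        features, _, _reason = rest.rpartition("|")
        # Assumption: an empty field or "(null)" means no constraint was given.
        if features and features != "(null)":
            wanted[features] += 1
    return wanted


def pending_features() -> Counter:
    """Run squeue and summarise which features pending jobs are waiting for."""
    result = subprocess.run(SQUEUE_CMD, check=True, capture_output=True, text=True)
    return parse_pending(result.stdout)
```

An install scheduler could then walk pending_features().most_common() and
queue installs for the most requested versions first.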
>
> Herein lies the conundrum. If a user submits a job, constraining to
> run on Version A, but all nodes in the cluster are currently
> configured with Features=Version-B, Slurm will fail to queue the job,
> indicating an invalid feature specification. I completely understand
> why Features are implemented this way, so my question is: is there
> some workaround or other Slurm capability that I could use to
> achieve this behavior? Otherwise my options seem to be:
>
> 1. Go back to how we did it before. The pipeline would have the same
> level of robustness as before but at least we would still be able
> to leverage other queueing capabilities of Slurm.
> 2. Write our own Feature or Job Submit plugin that customizes this
> behavior just for us. Seems possible but adds lead time and
> complexity to the situation.
>
> It's not feasible to update the config for all
> branches/versions/commits to be AvailableFeatures, as our branch
> ecosystem is quite large and the maintenance of that approach would
> not scale well.
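
For reference, the rejection described above can be anticipated on the
client side by checking a requested feature against what the nodes
currently advertise. A minimal sketch, assuming sinfo is on PATH and that
its %f format field lists each node group's available features as a
comma-separated string. The function names are illustrative, and only a
single feature name is handled, not constraint expressions using & or |.

```python
import subprocess
from typing import Set

# One line per node group, each a comma-separated list of available features.
SINFO_CMD = ["sinfo", "--noheader", "--format=%f"]


def parse_available(sinfo_output: str) -> Set[str]:
    """Collect the set of feature names advertised by any node."""
    features = set()
    for line in sinfo_output.splitlines():
        for name in line.strip().split(","):
            # Assumption: "(null)" marks nodes with no features set.
            if name and name != "(null)":
                features.add(name)
    return features


def is_known_feature(requested: str, available: Set[str]) -> bool:
    """True if at least one node advertises the requested feature name."""
    return requested in available


def available_features() -> Set[str]:
    """Run sinfo and return the features currently advertised by the cluster."""
    result = subprocess.run(SINFO_CMD, check=True, capture_output=True, text=True)
    return parse_available(result.stdout)
```

A submit wrapper could call is_known_feature() first and, when it returns
False, route the request to whatever install step is in place instead of
letting sbatch fail.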
>
> Thanks,
>
> *Raj Sahae | Manager, Software QA*
>
> 3500 Deer Creek Rd, Palo Alto, CA 94304
>
> m. +1 (408) 230-8531 | rsahae at tesla.com
>
> <http://www.tesla.com/>
>