[slurm-users] How to queue jobs based on non-existent features

Paul Edmon pedmon at cfa.harvard.edu
Fri Jul 10 17:07:11 UTC 2020


Another option would be to use Slurm's license feature: define a license 
for each software version and set its count to 0 while that version 
isn't available.
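
A minimal sketch of how that could look, assuming one license per software version. The `sw_` naming scheme and the helper names are hypothetical, not anything Slurm prescribes. `Licenses=` in slurm.conf and `sbatch --licenses=` are the standard mechanisms. Whether a job asking for a license whose count is currently 0 is held in the queue or rejected should be checked on your Slurm version. Note also that licenses are cluster-wide counters, not node attributes, so on their own they do not steer a job to a particular node.

```python
from __future__ import annotations

import re


def license_name(version: str) -> str:
    """Map a version label to a license name (hypothetical 'sw_' scheme).

    Only letters, digits and underscores are kept so the name cannot clash
    with the ':' and ',' separators used in license specifications.
    """
    return "sw_" + re.sub(r"[^A-Za-z0-9]+", "_", version).strip("_").lower()


def sbatch_argv(script: str, version: str) -> list[str]:
    """Build an sbatch command line that requests one license for `version`."""
    return ["sbatch", f"--licenses={license_name(version)}:1", script]


def licenses_line(installed: dict[str, int]) -> str:
    """Render a slurm.conf 'Licenses=' line from {version: count}.

    A count of 0 marks a version that is known but not installed anywhere.
    """
    parts = [f"{license_name(v)}:{n}" for v, n in sorted(installed.items())]
    return "Licenses=" + ",".join(parts)


if __name__ == "__main__":
    print(licenses_line({"Version-A": 0, "Version-B": 4}))
    print(" ".join(sbatch_argv("run_tests.sh", "Version-A")))
```

The generated `Licenses=` line would still have to be written into slurm.conf and picked up by the controller, so this only moves the bookkeeping problem; it does not remove it.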

-Paul Edmon-

On 7/10/2020 12:42 PM, Raj Sahae wrote:
>
> Hi Brian and Paul,
>
> You both sent me suggestions about using an offline dummy node with 
> all features set. Thanks for your ideas, but this isn’t practical for 
> us. We want to allow users to queue for all supported software 
> versions, and that easily numbers in the thousands or tens of 
> thousands (every branch, every commit). If this approach were 
> workable, I would simply set the Available features on all nodes, but 
> that feels like it won’t scale well, or is an improper use of the 
> Feature capability.
>
> Thanks,
>
> Raj Sahae | m. +1 (408) 230-8531
>
> From: Raj Sahae <rsahae at tesla.com>
> Date: Thursday, July 9, 2020 at 4:15 PM
> To: "slurm-users at schedmd.com" <slurm-users at schedmd.com>
> Subject: How to queue jobs based on non-existent features
>
> Hi all,
>
> My apologies if this is sent twice. The first time I sent it without 
> my subscription to the list being complete.
>
> I am attempting to use Slurm as a test automation system for its 
> fairly advanced queueing and job control abilities, and also because 
> it scales very well.
>
> However, since our use case is a bit outside the standard usage of 
> Slurm, we are hitting some issues that don’t appear to have obvious 
> solutions.
>
> In our current setup, the Slurm nodes are hosts attached to a test 
> system. Our pipeline (greatly simplified) would be to install some 
> software on the test system and then run sets of tests against it.
>
> In our old pipeline, this was done in a single job. With Slurm, 
> however, I was hoping to decouple these two actions, as that makes the 
> entire pipeline more robust to update failures and would give us 
> finer-grained job control for the actual test run.
>
> I would like to allow users to queue jobs with constraints indicating 
> which software version they need. Then separately some automated job 
> would scan the queue, see jobs that are not being allocated due to 
> missing resources, and queue software installs appropriately. We 
> attempted to do this using the Active/Available Features 
> configuration. We use HealthCheck and Epilog scripts to scrape the 
> test system for software properties (version, commit, etc.) and assign 
> them as Features. Once an install is complete and the Features are 
> updated, queued jobs would start to be allocated on those nodes.
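
The queue-scanning step Raj describes could be sketched roughly as below. This is an illustration under stated assumptions, not his actual tooling: the function names and the `;` delimiter are made up for the example, and the constraint parsing is deliberately simplified (it treats every name in the expression as required and ignores counts). `squeue --format` with `%i` (job ID) and `%f` (requested features), and `sinfo --format` with `%b` (active features), are standard options.

```python
from __future__ import annotations

import re
import subprocess

# ';' is used as the delimiter because constraint expressions may contain '|'.
SQUEUE_CMD = ["squeue", "--states=PENDING", "--noheader", "--format=%i;%f"]
SINFO_CMD = ["sinfo", "--noheader", "--format=%b"]


def parse_pending(squeue_text: str) -> dict[str, set[str]]:
    """Return {job_id: requested feature names} from 'id;features' lines."""
    pending: dict[str, set[str]] = {}
    for line in squeue_text.splitlines():
        job_id, sep, features = line.strip().partition(";")
        if not sep or features in ("", "(null)", "N/A"):
            continue  # malformed line, or the job has no constraint
        # Simplification: split on the operators and drop any '*count' suffix.
        names = {tok.split("*")[0] for tok in re.split(r"[&|,\[\]()]", features)}
        names.discard("")
        if names:
            pending[job_id] = names
    return pending


def parse_active(sinfo_text: str) -> set[str]:
    """Return the union of active features over all sinfo output lines."""
    active: set[str] = set()
    for line in sinfo_text.splitlines():
        active.update(f for f in line.strip().split(",") if f and f != "(null)")
    return active


def missing_versions(squeue_text: str, sinfo_text: str) -> dict[str, list[str]]:
    """Map each requested-but-inactive feature to the job IDs waiting on it."""
    active = parse_active(sinfo_text)
    missing: dict[str, list[str]] = {}
    for job_id, wanted in parse_pending(squeue_text).items():
        for feature in wanted - active:
            missing.setdefault(feature, []).append(job_id)
    return {feature: sorted(ids) for feature, ids in missing.items()}


def scan_cluster() -> dict[str, list[str]]:
    """Run squeue and sinfo on a live cluster and report missing features."""
    squeue = subprocess.run(SQUEUE_CMD, capture_output=True, text=True, check=True)
    sinfo = subprocess.run(SINFO_CMD, capture_output=True, text=True, check=True)
    return missing_versions(squeue.stdout, sinfo.stdout)
```

An installer job could then be queued for each key of the returned mapping. This only helps once the jobs are accepted into the queue in the first place, which is the problem described next.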
>
> Herein lies the conundrum. If a user submits a job constrained to 
> run on Version A, but all nodes in the cluster are currently 
> configured with Features=Version-B, Slurm will refuse to queue the 
> job, reporting an invalid feature specification. I completely 
> understand why Features are implemented this way, so my question is: 
> is there some workaround or other Slurm capability that I could use 
> to achieve this behavior? Otherwise my options seem to be:
>
>  1. Go back to how we did it before. The pipeline would have the same
>     level of robustness as before but at least we would still be able
>     to leverage other queueing capabilities of Slurm.
>  2. Write our own Feature or Job Submit plugin that customizes this
>     behavior just for us. Seems possible but adds lead time and
>     complexity to the situation.
>
> It's not feasible to list every branch/version/commit as 
> AvailableFeatures in the config, as our branch ecosystem is quite 
> large and maintaining that approach would not scale well.
>
> Thanks,
>
> *Raj Sahae  |  Manager, Software QA*
>
> 3500 Deer Creek Rd, Palo Alto, CA 94304
>
> m. +1 (408) 230-8531  | rsahae at tesla.com 
>