[slurm-users] How to queue jobs based on non-existent features
Paul Edmon
pedmon at cfa.harvard.edu
Fri Jul 10 13:23:52 UTC 2020
You could set up a dummy node that advertises the features that are not
yet active, but keep that node set to DOWN so that no jobs are ever
scheduled onto it. That would be a hacky way of accomplishing this.
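A minimal sketch of what that dummy-node trick might look like. The node name, feature names, and wrapped script are placeholders, and the exact parameter spellings should be checked against the slurm.conf and scontrol man pages for your Slurm version:

```shell
# Hypothetical slurm.conf entry: a placeholder node that advertises
# features no real node has yet, permanently DOWN so it never runs jobs:
#
#   NodeName=feature-placeholder Features=Version-A,Version-B State=DOWN
#
# A user can now submit against a feature that no real node offers yet;
# the job queues instead of being rejected as an invalid feature:
sbatch --constraint=Version-A --wrap="run_tests.sh"

# Keep the placeholder node DOWN so nothing is ever scheduled onto it:
scontrol update NodeName=feature-placeholder State=DOWN Reason="feature placeholder"
```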
-Paul Edmon-
On 7/9/2020 7:15 PM, Raj Sahae wrote:
>
> Hi all,
>
> My apologies if this is sent twice. The first time I sent it without
> my subscription to the list being complete.
>
> I am attempting to use Slurm as a test automation system for its
> fairly advanced queueing and job control abilities, and also because
> it scales very well.
>
> However, since our use case is a bit outside the standard usage of
> Slurm, we are hitting some issues that don’t appear to have obvious
> solutions.
>
> In our current setup, the Slurm nodes are hosts attached to a test
> system. Our pipeline (greatly simplified) would be to install some
> software on the test system and then run sets of tests against it.
>
> In our old pipeline, this was done in a single job. With Slurm, I was
> hoping to decouple these two actions, as that makes the entire
> pipeline more robust to update failures and gives us finer-grained
> job control for the actual test run.
>
> I would like to allow users to queue jobs with constraints indicating
> which software version they need. Then separately some automated job
> would scan the queue, see jobs that are not being allocated due to
> missing resources, and queue software installs appropriately. We
> attempted to do this using the Active/Available Features
> configuration. We use HealthCheck and Epilog scripts to scrape the
> test system for software properties (version, commit, etc.) and assign
> them as Features. Once an install is complete and the Features are
> updated, queued jobs would start to be allocated on those nodes.
>
> Herein lies the conundrum. If a user submits a job constrained to
> run on Version A, but all nodes in the cluster are currently
> configured with Features=Version-B, Slurm will fail to queue the job,
> reporting an invalid feature specification. I completely understand
> why Features are implemented this way, so my question is: is there
> some workaround or other Slurm capability that I could use to
> achieve this behavior? Otherwise my options seem to be:
>
> 1. Go back to how we did it before. The pipeline would have the same
> level of robustness as before but at least we would still be able
> to leverage other queueing capabilities of Slurm.
> 2. Write our own Feature or Job Submit plugin that customizes this
> behavior just for us. Seems possible but adds lead time and
> complexity to the situation.
>
> It's not feasible to update the config for all
> branches/versions/commits to be AvailableFeatures, as our branch
> ecosystem is quite large and the maintenance of that approach would
> not scale well.
>
> Thanks,
>
> *Raj Sahae | Manager, Software QA*
>
> 3500 Deer Creek Rd, Palo Alto, CA 94304
>
> m. +1 (408) 230-8531 | rsahae at tesla.com
>
>
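The automated scanner Raj describes (scan the queue for jobs pending on features no node currently offers, then queue installs for them) could be sketched as below, assuming the dummy-node trick lets such jobs be accepted into the queue in the first place. The function and its inputs are hypothetical; in practice the requested constraints and available features would come from parsing `squeue` / `scontrol show nodes` output:

```python
# Hypothetical sketch: given the feature constraints requested by queued
# jobs and the features currently advertised across the nodes, compute
# which features (software versions) still need an install queued.

def features_to_install(requested_constraints, available_features):
    """Return the set of requested features that no node currently offers."""
    missing = set()
    for constraint in requested_constraints:
        # Handle only simple comma/AND lists here; full Slurm constraint
        # syntax (|, !, counts, parentheses) would need a real parser.
        for feature in constraint.replace("&", ",").split(","):
            feature = feature.strip()
            if feature and feature not in available_features:
                missing.add(feature)
    return missing

# Example: jobs request Version-A and Version-B, nodes only offer Version-B,
# so an install of Version-A should be queued.
print(features_to_install(["Version-A", "Version-B"], {"Version-B"}))
```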