Hello,
We would like to integrate our scheduling with Slurm. Our application has a
backend system that decides the placement of jobs across hosts and CPU cores.
The backend can take a few seconds to come back with a placement, and we expect
Slurm to update it regularly about any change in the current state of
available resources.
Broadly, we believe we have three options:
a> Use the cons_tres Select plugin and modify it to query our backend
   system for job placements.
b> Write our own Select plugin and use it in place of any other Select plugin.
c> Keep an existing Select plugin and also register our own. The idea is that
   our plugin will cater to 'our' jobs (those in a specific partition, say)
   while all other jobs are handled by the default plugin.
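To make c> more concrete, here is a rough Python sketch of the routing we have
in mind. It is only an illustration: the partition name, the Job type and every
function name are hypothetical and are not part of the Slurm API, and a real
Select plugin would of course be written in C.

```python
from dataclasses import dataclass

# Hypothetical: the partition(s) whose jobs our backend should place.
OUR_PARTITIONS = {"backend_part"}


@dataclass
class Job:
    job_id: int
    partition: str


def default_select(job: Job) -> str:
    # Stand-in for whatever the stock Select plugin would decide.
    return f"default:{job.job_id}"


def backend_select(job: Job) -> str:
    # Stand-in for a (possibly slow) call to our placement backend.
    return f"backend:{job.job_id}"


def select_nodes(job: Job) -> str:
    # Route 'our' jobs to the backend; everything else to the default plugin.
    if job.partition in OUR_PARTITIONS:
        return backend_select(job)
    return default_select(job)
```

What we do not know is whether Slurm offers a supported place to put this kind
of per-partition dispatch.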
The problem with a> is that it requires modifying existing plugin code and
calling our library code from inside the Select plugin library.
With b>, the issue is that it is not viable unless we have the whole Slurm
cluster to ourselves. Any insight on how to proceed with this? Where should
our Select plugin, assuming we need to write one, fit into the Slurm integration?
We are not sure whether c> is allowed in Slurm.
We went through the existing Select plugins linear and cons_tres.
However, we were not able to figure out how to use them, or how to write
something along similar lines, to suit our purpose.
Any help in this regard is appreciated.
Apologies if this question (or a very similar one) has already been answered;
if so, please point us to the relevant thread.
Thanks in advance for any pointers.
Regards,
Bhaskar.