Hi Daniel, Thanks for picking up this query. Let me try to briefly describe my problem.
As you rightly guessed, we have some hardware on the backend on which our jobs would run. The app that manages the hardware has its own resource placement/remapping rules for placing a job. For example, if only 3 hosts h1, h2, h3 (2 cores available each) are available at some point for a 4-core job, then only a few combinations of cores from these hosts are allowed for the job. Our app also decides a preference order among the allowed placements.
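To make the example above concrete, here is a rough Python sketch of the kind of decision our backend makes. The enumeration itself is generic; the `is_allowed` rule and the preference order shown are placeholders I made up for illustration, not our real rules.

```python
from itertools import product

def candidate_placements(free_cores, job_cores):
    """Yield every per-host core split that fits within the free cores."""
    hosts = sorted(free_cores)
    ranges = [range(free_cores[h] + 1) for h in hosts]
    for counts in product(*ranges):
        if sum(counts) == job_cores:
            # Keep only the hosts that actually contribute cores.
            yield {h: c for h, c in zip(hosts, counts) if c > 0}

def rank_placements(placements, is_allowed, preference_key):
    """Drop disallowed placements and sort the rest, most preferred first."""
    return sorted((p for p in placements if is_allowed(p)), key=preference_key)

# The situation from the example: 3 hosts, 2 free cores each, a 4-core job.
free = {"h1": 2, "h2": 2, "h3": 2}
all_splits = list(candidate_placements(free, 4))

# Placeholder rules (assumptions): allow at most 2 hosts per job,
# and prefer placements by host name order.
ranked = rank_placements(
    all_splits,
    is_allowed=lambda p: len(p) <= 2,
    preference_key=lambda p: sorted(p),
)
```

With these placeholder rules, only 3 of the 6 possible splits survive, and the first entry of `ranked` is the placement we would want Slurm to honor exactly.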
It is in this respect that we want our backend app to supply the placement for the job. Slurm would then dispatch the job accordingly, honoring the exact resource distribution requested. If preemption is needed, our backend would decide the placement in that case as well, and that placement would determine which preemptable job candidates get preempted.
So, how should we proceed? We may not have the whole site/cluster to ourselves. There may be other jobs that we don't care about, and those should go through the usual route via whichever select plugin is in place (linear, cons_tres, etc.).
Is there scope for a separate partition that encompasses only our resources and triggers our plugin only for our jobs? How do options a>, b>, c> (as described in my 1st message) stand now that I have described our requirement?
A 4th option that comes to mind: is there some API in Slurm that would inform a separate process (say, P) about resource availability in real time? P would talk to our backend app, obtain a placement, and then ask Slurm to place our job.
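As a rough sketch of what I mean by P, something like the following Python could poll `sinfo` and submit through `sbatch`. This is polling rather than a true real-time notification, and `ask_backend` is a hypothetical callable standing in for our backend app. Note also that `--nodelist` only pins the hosts; as far as I understand, it does not by itself guarantee the exact per-host core split, which is the part I am asking about.

```python
import subprocess

def parse_idle_cores(sinfo_output):
    """Parse lines of '<host> <allocated/idle/other/total>' into {host: idle}."""
    idle = {}
    for line in sinfo_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        host, cpus = parts
        fields = cpus.split("/")
        if len(fields) != 4:
            continue
        idle[host] = int(fields[1])
    return idle

def build_sbatch_command(partition, placement, script):
    """Build an sbatch command line that pins the job to the chosen hosts."""
    hosts = sorted(placement)
    return [
        "sbatch",
        f"--partition={partition}",
        f"--nodelist={','.join(hosts)}",
        f"--nodes={len(hosts)}",
        f"--ntasks={sum(placement.values())}",
        script,
    ]

def poll_once(partition, job_cores, script, ask_backend):
    """One cycle of P: query Slurm, ask the backend, submit if it answers."""
    out = subprocess.run(
        ["sinfo", "-h", "-N", "-p", partition, "-o", "%n %C"],
        check=True, capture_output=True, text=True,
    ).stdout
    # ask_backend is hypothetical: it returns {host: cores} or None.
    placement = ask_backend(parse_idle_cores(out), job_cores)
    if placement:
        subprocess.run(build_sbatch_command(partition, placement, script), check=True)
    return placement
```

P would call `poll_once` in a loop against our dedicated partition; the partition name and job script path are whatever the site uses.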
Your concern about ever-changing resources (being allocated before our backend responds) should not apply in our case, because the hosts are segregated as far as our system is concerned. Our hosts will run only our jobs, and other Slurm jobs will run on different hosts.
I hope I have made myself a little clearer! Any help would be appreciated.
(Note: We already have a working solution with LSF! LSF provides an option for custom scheduler plugins that let one hook into the decision-making loop during scheduling. This is what led us to believe Slurm would offer something similar.)
Regards, Bhaskar.