[slurm-users] Is it possible to select the BatchHost for a job through some sort of prolog script?

Daniel Letai dani at letai.org.il
Mon Jul 9 14:09:39 MDT 2018



On 06/07/2018 10:22, Steffen Grunewald wrote:
> On Fri, 2018-07-06 at 07:47:16 +0200, Loris Bennett wrote:
>> Hi Tim,
>>
>> Tim Lin <timtylin at gmail.com> writes:
>>
>>> As the title suggests, I’m searching for a way to have tighter control of which
>>> node the batch script gets executed on. In my case it’s very hard to know which
>>> node is best for this until after all the nodes are allocated, right before the
>>> batch job starts. I’ve looked through all the documentation I can get my hands
>>> on but I haven’t found any mention of any control over the batch host for
>>> admins. Am I missing something?
>> As the documentation of 'sbatch' says:
>>
>>    "When the job allocation is finally granted for the batch script,
>>    Slurm runs a single copy of the batch script on the first node in the
>>    set of allocated nodes. "
>>    
>> I am not aware of any way of changing this.
>>
>> Perhaps you can explain why you feel it is necessary for you to do this.
> For me, the above reads like the user has an idea of a metric for how to select
> the node for rank-0 (and perhaps the code is sufficiently asymmetric to justify
> such a selection), but no way to tell Slurm about it.
> What about making the batch script a wrapper around the real payload, on the
> "outer first node" take the list of assigned nodes and possibly reorder it, then
> run the payload (via passphrase-less ssh?) on the selected, "new first" node?
Why not just use salloc instead? Allocate all the nodes for the job, 
then use the script to select (ssh?) the master and start the actual job 
there.
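
A rough sketch of what I mean, in Python. This is only an illustration under assumptions: the script name, the payload path (./payload.sh) and the scoring function are all hypothetical, and it assumes passphrase-less ssh between nodes. The only Slurm pieces it relies on are the SLURM_JOB_NODELIST variable and "scontrol show hostnames" to expand it.

```python
#!/usr/bin/env python3
"""Sketch: pick a "master" node from a Slurm allocation and start the payload there."""
import os
import shlex
import subprocess


def expand_nodelist(nodelist):
    """Expand a compact Slurm node list (e.g. 'n[01-03]') into hostnames."""
    result = subprocess.run(
        ["scontrol", "show", "hostnames", nodelist],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def choose_master(nodes, score):
    """Return the nodes reordered so the highest-scoring node comes first."""
    master = max(nodes, key=score)
    return [master] + [n for n in nodes if n != master]


def launch_command(master, payload):
    """Build the ssh command that starts the payload on the chosen master."""
    return ["ssh", master, shlex.join(payload)]


def main():
    nodes = expand_nodelist(os.environ["SLURM_JOB_NODELIST"])
    # Hypothetical metric: replace this with whatever decides the best master.
    ordered = choose_master(nodes, score=lambda name: name)
    # Hypothetical payload: receives the reordered node list as its argument.
    cmd = launch_command(ordered[0], ["./payload.sh", ",".join(ordered)])
    subprocess.run(cmd, check=True)


# Only run automatically when inside an allocation.
if __name__ == "__main__" and "SLURM_JOB_NODELIST" in os.environ:
    main()
```

You would run it as something like "salloc -N 4 python3 pick_master.py". Note that a process started via ssh does not inherit the Slurm job environment, so the payload has to be told about the allocation explicitly (here through its argument).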

I'm still not sure why that would be necessary, though. Could you give a 
clear example of the master selection process? What metric/constraint is 
involved, and why can it only be obtained after node selection?
> This may require changing some more environment variables, and may harm signalling.
>
> Okay, my suggestion reads like a terrible kludge (which it certainly is), but
> AFAIK there's no way to tell Slurm about "preferred first nodes".
>
> - S
>



