[slurm-users] Is it possible to select the BatchHost for a job through some sort of prolog script?
Steffen Grunewald
steffen.grunewald at aei.mpg.de
Fri Jul 6 01:22:13 MDT 2018
On Fri, 2018-07-06 at 07:47:16 +0200, Loris Bennett wrote:
> Hi Tim,
>
> Tim Lin <timtylin at gmail.com> writes:
>
> > As the title suggests, I’m searching for a way to have tighter control of which
> > node the batch script gets executed on. In my case it’s very hard to know which
> > node is best for this until after all the nodes are allocated, right before the
> > batch job starts. I’ve looked through all the documentation I can get my hands
> > on but I haven’t found any mention of any control over the batch host for
> > admins. Am I missing something?
>
> As the documentation of 'sbatch' says:
>
> "When the job allocation is finally granted for the batch script,
> Slurm runs a single copy of the batch script on the first node in the
> set of allocated nodes. "
>
> I am not aware of any way of changing this.
>
> Perhaps you can explain why you feel it is necessary for you to do this.
For me, the above reads like the user has an idea of a metric for how to select
the node for rank-0 (and perhaps the code is sufficiently asymmetric to justify
such a selection), but no way to tell Slurm about it.
What about making the batch script a wrapper around the real payload: on the
"outer first node", take the list of assigned nodes, possibly reorder it, then
run the payload (via passphrase-less ssh?) on the selected, "new first" node?
This may require changing some more environment variables, and may harm signalling.
Okay, my suggestion reads like a terrible kludge (which it certainly is), but
AFAIK there's no way to tell Slurm about "preferred first nodes".
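
To make the wrapper idea a bit more concrete, here is a rough, untested-on-a-cluster
sketch in Python. It assumes that `scontrol show hostnames` is available on the
batch host, that passphrase-less ssh works between the allocated nodes, and that
you supply your own metric; `my_score`, `reorder_nodes` and `build_ssh_command`
are names I made up for illustration, not anything Slurm provides.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: pick a "new first" node, then run the payload there."""
import os
import shlex
import subprocess
import sys


def expand_nodelist(nodelist):
    """Expand a compact Slurm node list (e.g. 'n[01-03]') into host names."""
    result = subprocess.run(
        ["scontrol", "show", "hostnames", nodelist],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def reorder_nodes(nodes, score):
    """Return the nodes sorted so that the lowest-scoring node comes first.

    sorted() is stable, so nodes with equal scores keep Slurm's original order.
    """
    return sorted(nodes, key=score)


def build_ssh_command(node, payload_argv):
    """Build the ssh invocation that runs the payload on the chosen node."""
    remote = " ".join(shlex.quote(arg) for arg in payload_argv)
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", node, remote]


def my_score(node):
    """Placeholder metric (an assumption): replace with the site-specific one."""
    return 0


def main(argv):
    nodes = expand_nodelist(os.environ["SLURM_JOB_NODELIST"])
    new_first = reorder_nodes(nodes, my_score)[0]
    # The remote shell does not inherit the SLURM_* environment of this
    # script; anything the payload needs has to be passed along explicitly.
    return subprocess.call(build_ssh_command(new_first, argv[1:]))


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The batch script would then call this wrapper with the real payload and its
arguments. Since the payload runs under ssh rather than under slurmstepd, the
caveats about environment variables and signalling still apply.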
- S
--
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~