[slurm-users] CPU allocation within a node is not cyclic
崔灏 (CUI Hao)
cuihao.leo at gmail.com
Fri Oct 5 20:15:54 MDT 2018
I checked the source code, and now believe this is either a bug in
the select/cons_res plugin or intended behavior.
In src/plugins/select/cons_res/dist_tasks.c (tag slurm-18-08-0-1),
lines 1863-1869 trigger block allocation whenever SelectTypeParameters
is not set to CR_CORE / CR_SOCKET, regardless of the task distribution
settings.
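To make the difference concrete, here is a toy Python sketch of block
versus cyclic core selection on one node. It is not Slurm's code: the
function names, the (socket, core) numbering, and the boolean
cr_core_or_socket argument are assumptions made for illustration, and it
ignores cores that are already busy. It only models the behavior
described above.

```python
# Toy model only, NOT Slurm's implementation. All names are hypothetical.

def block_alloc(sockets, cores_per_socket, ncores):
    """Fill socket 0 completely before touching socket 1, and so on."""
    order = [(s, c) for s in range(sockets) for c in range(cores_per_socket)]
    return order[:ncores]

def cyclic_alloc(sockets, cores_per_socket, ncores):
    """Take one core from each socket in turn (round-robin)."""
    order = [(s, c) for c in range(cores_per_socket) for s in range(sockets)]
    return order[:ncores]

def per_socket(alloc, sockets):
    """Count how many allocated cores landed on each socket."""
    counts = [0] * sockets
    for s, _ in alloc:
        counts[s] += 1
    return counts

def pick_allocator(cr_core_or_socket, distribution="cyclic"):
    """Assumed decision rule, as described in this message: without
    CR_CORE / CR_SOCKET the distribution setting is ignored and block
    allocation is used."""
    if not cr_core_or_socket:
        return block_alloc
    return cyclic_alloc if distribution == "cyclic" else block_alloc

# Node with 2 sockets x 8 cores, job asks for 4 cores (-c 4).
print(per_socket(block_alloc(2, 8, 4), 2))   # [4, 0]
print(per_socket(cyclic_alloc(2, 8, 4), 2))  # [2, 2]
```

With the cyclic rule the job gets n cores per socket, which is what the
documentation led me to expect. With the block rule all cores come from
the first socket, which matches what we see on our cluster.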
I've changed SelectTypeParameters to CR_Core_Memory, but it seems I
have to restart slurmctld for the new config to take effect:
$ scontrol reconfigure
slurm_reconfigure error: SelectType change requires restart of the
slurmctld daemon to take effect
I'm afraid that restarting slurmctld will interrupt current tasks, so
I'm still waiting for them to finish.
崔灏 (CUI Hao) <cuihao.leo at gmail.com> wrote on Fri, Oct 5, 2018 at 10:12 PM:
>
> According to https://slurm.schedmd.com/cpu_management.html,
> > The default allocation method within a node is cyclic allocation (allocate available CPUs in a round-robin fashion across the sockets within a node).
>
> I am not a native English speaker, but I think the sentence means
> that if a job allocates 2*n cores (with -c 2*n) on a node with two
> physical CPUs (sockets), n cores will be allocated on each socket by
> default.
>
> However, on our cluster, cores are not allocated evenly across
> sockets. I cannot find the problem in our configs. Explicitly setting
> srun --cpu-bind or --distribution cyclic doesn't help, either.
>
--
崔灏 / CUI Hao
Homepage: i-yu.me
Twitter: @cuihaoleo