[slurm-users] Question about CPU and core binding

Gestió Servidors sysadmin.caos at uab.cat
Thu Oct 20 08:17:08 UTC 2022


Hi,

I have run two scripts that each request 2 nodes and 8 tasks per node. The first script runs with "--distribution=block:block" and the second with "--distribution=cyclic:block".

As far as I understand, in the first case, with "--distribution=block:block", the job was executed as follows (and I think this is correct):
JobNode[0] Socket[0] Core[0] is allocated
JobNode[0] Socket[0] Core[1] is allocated
JobNode[0] Socket[0] Core[2] is allocated
JobNode[0] Socket[0] Core[3] is allocated
JobNode[0] Socket[0] Core[4] is allocated
JobNode[0] Socket[0] Core[5] is allocated
JobNode[0] Socket[1] Core[0] is allocated
JobNode[0] Socket[1] Core[1] is allocated
JobNode[1] Socket[0] Core[0] is allocated
JobNode[1] Socket[0] Core[1] is allocated
JobNode[1] Socket[0] Core[2] is allocated
JobNode[1] Socket[0] Core[3] is allocated
JobNode[1] Socket[0] Core[4] is allocated
JobNode[1] Socket[0] Core[5] is allocated
JobNode[1] Socket[1] Core[0] is allocated
JobNode[1] Socket[1] Core[1] is allocated

In the second case, however, with "--distribution=cyclic:block", I expected the job to be executed as follows:
JobNode[0] Socket[0] Core[0] is allocated
JobNode[1] Socket[0] Core[0] is allocated
JobNode[0] Socket[0] Core[1] is allocated
JobNode[1] Socket[0] Core[1] is allocated
JobNode[0] Socket[0] Core[2] is allocated
JobNode[1] Socket[0] Core[2] is allocated
JobNode[0] Socket[0] Core[3] is allocated
JobNode[1] Socket[0] Core[3] is allocated
JobNode[0] Socket[0] Core[4] is allocated
JobNode[1] Socket[0] Core[4] is allocated
JobNode[0] Socket[0] Core[5] is allocated
JobNode[1] Socket[0] Core[5] is allocated
JobNode[0] Socket[1] Core[0] is allocated
JobNode[1] Socket[1] Core[0] is allocated
JobNode[0] Socket[1] Core[1] is allocated
JobNode[1] Socket[1] Core[1] is allocated
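For concreteness, the two orderings described above can be modelled with a short Python sketch. It encodes only the expectation stated in this message (the first half of --distribution chooses between filling one node before the next and alternating round-robin between nodes; the second half fills a socket's cores before moving to the next socket). It assumes 6 cores per socket, as the listings suggest, and the names (node_cores, expected_order, NODES, TASKS_PER_NODE, CORES_PER_SOCKET) are illustrative, not part of Slurm. It is not a description of how Slurm actually places tasks or reports allocated cores.

```python
# Hypothetical model of the *expected* placement described in this message.
# It is NOT Slurm's implementation. Assumes 2 nodes, 8 tasks per node and
# 6 cores per socket (inferred from the listings above).
from itertools import chain

NODES = 2
TASKS_PER_NODE = 8
CORES_PER_SOCKET = 6


def node_cores(n_tasks: int) -> list[tuple[int, int]]:
    """':block' part: fill all cores of a socket before using the next socket.

    Returns (socket, core) pairs in fill order.
    """
    return [(i // CORES_PER_SOCKET, i % CORES_PER_SOCKET) for i in range(n_tasks)]


def expected_order(node_dist: str) -> list[str]:
    """'block:' fills node 0 before node 1; 'cyclic:' alternates between nodes."""
    per_node = [
        [(node, s, c) for s, c in node_cores(TASKS_PER_NODE)] for node in range(NODES)
    ]
    if node_dist == "block":
        order = list(chain.from_iterable(per_node))
    elif node_dist == "cyclic":
        # Round-robin: task i of node 0, task i of node 1, then task i+1, ...
        order = [per_node[n][i] for i in range(TASKS_PER_NODE) for n in range(NODES)]
    else:
        raise ValueError(f"unsupported node distribution: {node_dist}")
    return [f"JobNode[{n}] Socket[{s}] Core[{c}] is allocated" for n, s, c in order]


if __name__ == "__main__":
    for line in expected_order("cyclic"):
        print(line)
```

Note that both orderings cover exactly the same set of (node, socket, core) triples; in this model they differ only in the order in which tasks visit them.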

but the job actually ran like this (in exactly the same way as the first execution):
JobNode[0] Socket[0] Core[0] is allocated
JobNode[0] Socket[0] Core[1] is allocated
JobNode[0] Socket[0] Core[2] is allocated
JobNode[0] Socket[0] Core[3] is allocated
JobNode[0] Socket[0] Core[4] is allocated
JobNode[0] Socket[0] Core[5] is allocated
JobNode[0] Socket[1] Core[0] is allocated
JobNode[0] Socket[1] Core[1] is allocated
JobNode[1] Socket[0] Core[0] is allocated
JobNode[1] Socket[0] Core[1] is allocated
JobNode[1] Socket[0] Core[2] is allocated
JobNode[1] Socket[0] Core[3] is allocated
JobNode[1] Socket[0] Core[4] is allocated
JobNode[1] Socket[0] Core[5] is allocated
JobNode[1] Socket[1] Core[0] is allocated
JobNode[1] Socket[1] Core[1] is allocated
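To check mechanically whether two runs reported the same layout, the "is allocated" debug lines can be parsed and compared. The following is a minimal Python sketch; the regular expression assumes exactly the line format quoted above, and the names (ALLOC_RE, parse_alloc, same_layout) are hypothetical helpers, not Slurm tools.

```python
# Hypothetical helper: compare the "is allocated" debug lines of two runs.
# The regex assumes exactly the line format quoted in this message.
import re

ALLOC_RE = re.compile(r"JobNode\[(\d+)\] Socket\[(\d+)\] Core\[(\d+)\] is allocated")


def parse_alloc(log_text: str) -> list[tuple[int, ...]]:
    """Return (node, socket, core) triples in the order they appear in the log."""
    return [tuple(map(int, m.groups())) for m in ALLOC_RE.finditer(log_text)]


def same_layout(log_a: str, log_b: str) -> bool:
    """True when both logs list the same cores in the same order."""
    return parse_alloc(log_a) == parse_alloc(log_b)


if __name__ == "__main__":
    block_log = (
        "JobNode[0] Socket[0] Core[0] is allocated\n"
        "JobNode[0] Socket[0] Core[1] is allocated\n"
        "JobNode[1] Socket[0] Core[0] is allocated\n"
    )
    cyclic_log = block_log  # mirrors the observation: identical output
    print(parse_alloc(block_log))
    print(same_layout(block_log, cyclic_log))  # True
```

With real runs, pass the full debug output of each job instead of the short literal strings used here.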

What have I done wrong?

My SLURM server has the following parameters and flags enabled:
TaskPlugin=task/cgroup,task/none,task/affinity
DebugFlags=CPU_Bind,Backfill,BackfillMap,SelectType,Steps,TraceJobs
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core

Thanks.

