[slurm-users] Question about CPU and core binding
Gestió Servidors
sysadmin.caos at uab.cat
Thu Oct 20 08:17:08 UTC 2022
Hi,
I have run two scripts, each requesting 2 nodes with 8 tasks per node. The first script runs with "--distribution=block:block" and the second with "--distribution=cyclic:block".
As far as I understand, in the first case, with "--distribution=block:block", the job was executed as follows (and I think this is correct):
JobNode[0] Socket[0] Core[0] is allocated
JobNode[0] Socket[0] Core[1] is allocated
JobNode[0] Socket[0] Core[2] is allocated
JobNode[0] Socket[0] Core[3] is allocated
JobNode[0] Socket[0] Core[4] is allocated
JobNode[0] Socket[0] Core[5] is allocated
JobNode[0] Socket[1] Core[0] is allocated
JobNode[0] Socket[1] Core[1] is allocated
JobNode[1] Socket[0] Core[0] is allocated
JobNode[1] Socket[0] Core[1] is allocated
JobNode[1] Socket[0] Core[2] is allocated
JobNode[1] Socket[0] Core[3] is allocated
JobNode[1] Socket[0] Core[4] is allocated
JobNode[1] Socket[0] Core[5] is allocated
JobNode[1] Socket[1] Core[0] is allocated
JobNode[1] Socket[1] Core[1] is allocated
But in the second case, with "--distribution=cyclic:block", I expected the job to be executed like this:
JobNode[0] Socket[0] Core[0] is allocated
JobNode[1] Socket[0] Core[0] is allocated
JobNode[0] Socket[0] Core[1] is allocated
JobNode[1] Socket[0] Core[1] is allocated
JobNode[0] Socket[0] Core[2] is allocated
JobNode[1] Socket[0] Core[2] is allocated
JobNode[0] Socket[0] Core[3] is allocated
JobNode[1] Socket[0] Core[3] is allocated
JobNode[0] Socket[0] Core[4] is allocated
JobNode[1] Socket[0] Core[4] is allocated
JobNode[0] Socket[0] Core[5] is allocated
JobNode[1] Socket[0] Core[5] is allocated
JobNode[0] Socket[1] Core[0] is allocated
JobNode[1] Socket[1] Core[0] is allocated
JobNode[0] Socket[1] Core[1] is allocated
JobNode[1] Socket[1] Core[1] is allocated
but the job actually ran like this (exactly the same way as in the first execution):
JobNode[0] Socket[0] Core[0] is allocated
JobNode[0] Socket[0] Core[1] is allocated
JobNode[0] Socket[0] Core[2] is allocated
JobNode[0] Socket[0] Core[3] is allocated
JobNode[0] Socket[0] Core[4] is allocated
JobNode[0] Socket[0] Core[5] is allocated
JobNode[0] Socket[1] Core[0] is allocated
JobNode[0] Socket[1] Core[1] is allocated
JobNode[1] Socket[0] Core[0] is allocated
JobNode[1] Socket[0] Core[1] is allocated
JobNode[1] Socket[0] Core[2] is allocated
JobNode[1] Socket[0] Core[3] is allocated
JobNode[1] Socket[0] Core[4] is allocated
JobNode[1] Socket[0] Core[5] is allocated
JobNode[1] Socket[1] Core[0] is allocated
JobNode[1] Socket[1] Core[1] is allocated
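To make the difference I expected concrete, here is a minimal Python sketch of how I understand the first (node-level) part of "--distribution". It is only a toy model, not Slurm's actual code: the helper name task_to_node is hypothetical, and it assumes the tasks divide evenly across the nodes.

```python
def task_to_node(ntasks, nnodes, mode):
    """Return the node index assigned to each task rank.

    Toy model of the node-level part of --distribution only
    (hypothetical helper, assumes an equal number of tasks per node).
    """
    if ntasks % nnodes != 0:
        raise ValueError("this sketch assumes tasks divide evenly across nodes")
    per_node = ntasks // nnodes
    if mode == "block":
        # Consecutive ranks fill one node before moving to the next.
        return [rank // per_node for rank in range(ntasks)]
    if mode == "cyclic":
        # Consecutive ranks are placed round-robin over the nodes.
        return [rank % nnodes for rank in range(ntasks)]
    raise ValueError(f"unknown mode: {mode}")


# 2 nodes x 8 tasks per node, as in my two jobs.
block = task_to_node(16, 2, "block")
cyclic = task_to_node(16, 2, "cyclic")
print("block :", block)
print("cyclic:", cyclic)
```

With "block", ranks 0-7 land on node 0 and ranks 8-15 on node 1; with "cyclic", the ranks alternate between node 0 and node 1, which is the interleaving I expected to see.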
What have I done wrong?
My Slurm configuration has the following parameters and flags set:
TaskPlugin=task/cgroup,task/none,task/affinity
DebugFlags=CPU_Bind,Backfill,BackfillMap,SelectType,Steps,TraceJobs
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core
Thanks.