Hi Alan,
Unfortunately, process placement in Slurm is something of a black art for sub-node jobs, i.e. jobs that allocate only a small number of a node's CPUs.
I have recently raised a similar question here:
https://support.schedmd.com/show_bug.cgi?id=19236
The bottom line was that to "really have control over task placement you really have to allocate the node in --exclusive manner".
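To illustrate what that could look like in practice, here is a minimal sketch, not a tested recipe. It assumes the node is allocated with --exclusive, that Slurm's own binding is switched off with --cpu-bind=none, and that the task is then pinned by hand with taskset. The helper round_robin_cpus is hypothetical, and the per-socket CPU ID lists in the usage comment are assumptions (even IDs on one socket, odd IDs on the other); check the real numbering with lscpu before relying on them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick COUNT CPUs, alternating between two sockets.
# Arguments: COUNT, space-separated CPU IDs of socket 0, same for socket 1.
round_robin_cpus() {
    local count=$1
    local -a s0=($2) s1=($3)
    local -a picked=()
    local i=0
    while (( ${#picked[@]} < count )); do
        # Stop once both socket lists are exhausted.
        if (( i >= ${#s0[@]} && i >= ${#s1[@]} )); then
            break
        fi
        if (( i < ${#s0[@]} )); then
            picked+=("${s0[i]}")
        fi
        if (( ${#picked[@]} < count && i < ${#s1[@]} )); then
            picked+=("${s1[i]}")
        fi
        i=$((i + 1))
    done
    # Join the chosen IDs with commas, the format taskset -c expects.
    local IFS=,
    echo "${picked[*]}"
}

# Possible usage (assumed CPU numbering, ./my_app is a placeholder):
#   cpus=$(round_robin_cpus 4 "0 2 4 6" "1 3 5 7")
#   srun --exclusive -N 1 -n 1 --cpu-bind=none taskset -c "$cpus" ./my_app
```

The obvious cost is that the whole node is reserved for a task that only uses a few CPUs, which is exactly the trade-off the quoted answer implies.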
Best regards Jürgen
* Alan Stange via slurm-users slurm-users@lists.schedmd.com [240607 14:52]:
All,
I have a very simple slurm cluster. It's just a single system with 2 sockets and 16 cores in each socket. I would like to be able to submit a simple task into this cluster, and to have the cpus assigned to that task allocated round robin across the two sockets. Everything I try is putting all the cpus for this single task on the same socket.
I have not specified any CpuBind options in the slurm.conf file. For example, if I try
$ srun -c 4 --pty bash
I get a shell prompt on the system, and can run
$ taskset -cp $$
pid 12345 current affinity list: 0,2,4,6
and I get this same set of cpus no matter what options I try (the cluster is idle with no tasks consuming slots).
I've tried various srun command line options like:
  --hint=compute_bound
  --hint=memory_bound
  various --cpubind options
  -B 2:2
  -m block:cyclic and block:fcyclic
Note that if I try to allocate 17 cpus, then I do get the 17th cpu allocated on the 2nd socket.
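For anyone wanting to check which socket each allocated cpu actually sits on, the following is a small sketch that maps an affinity list to sockets. It assumes a Linux system that exposes topology under /sys/devices/system/cpu; the function names expand_cpulist and cpus_to_sockets are made up for this example.

```shell
#!/usr/bin/env bash
# Expand a CPU list such as "0-2,5" into one CPU ID per line.
expand_cpulist() {
    local list=$1 part lo hi cpu
    local IFS=,
    for part in $list; do
        lo=${part%-*}   # text before the dash (or the whole item)
        hi=${part#*-}   # text after the dash (or the whole item)
        for (( cpu = lo; cpu <= hi; cpu++ )); do
            echo "$cpu"
        done
    done
}

# Print "cpu N -> socket S" for every CPU in the list, using Linux sysfs.
cpus_to_sockets() {
    local cpu f
    for cpu in $(expand_cpulist "$1"); do
        f=/sys/devices/system/cpu/cpu$cpu/topology/physical_package_id
        if [[ -r $f ]]; then
            echo "cpu $cpu -> socket $(<"$f")"
        else
            echo "cpu $cpu -> socket unknown"
        fi
    done
}

# Possible usage inside the srun shell:
#   cpus_to_sockets "$(taskset -cp $$ | sed 's/.*: //')"
```

Run inside the job's shell, this shows at a glance whether a given combination of srun options spread the cpus over both sockets or kept them on one.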
What magic incantation is needed to get an allocation where the cpus are chosen round robin across the sockets?
Thank you!
Alan