[slurm-users] srun and --cpus-per-task

Durai Arasan arasan.durai at gmail.com
Fri Mar 25 15:49:11 UTC 2022


Hello all,

Thanks for the useful observations. Here are some further env vars:

# non problematic case
$ srun -c 3 --partition=gpu-2080ti env

SRUN_DEBUG=3
SLURM_JOB_CPUS_PER_NODE=4
SLURM_NTASKS=1
SLURM_NPROCS=1
SLURM_CPUS_PER_TASK=3
SLURM_STEP_ID=0
SLURM_STEPID=0
SLURM_NNODES=1
SLURM_JOB_NUM_NODES=1
SLURM_STEP_NUM_NODES=1
SLURM_STEP_NUM_TASKS=1
SLURM_STEP_TASKS_PER_NODE=1
SLURM_CPUS_ON_NODE=4
SLURM_NODEID=0


SLURM_PROCID=0
SLURM_LOCALID=0
SLURM_GTIDS=0


# problematic case - prints two sets of env vars
$ srun -c 1 --partition=gpu-2080ti env

SRUN_DEBUG=3
SLURM_JOB_CPUS_PER_NODE=2
SLURM_NTASKS=2
SLURM_NPROCS=2
SLURM_CPUS_PER_TASK=1
SLURM_STEP_ID=0
SLURM_STEPID=0
SLURM_NNODES=1
SLURM_JOB_NUM_NODES=1
SLURM_STEP_NUM_NODES=1
SLURM_STEP_NUM_TASKS=2
SLURM_STEP_TASKS_PER_NODE=2
SLURM_CPUS_ON_NODE=2
SLURM_NODEID=0

SLURM_PROCID=0
SLURM_LOCALID=0
SLURM_GTIDS=0,1

SRUN_DEBUG=3
SLURM_JOB_CPUS_PER_NODE=2
SLURM_NTASKS=2
SLURM_NPROCS=2
SLURM_CPUS_PER_TASK=1
SLURM_STEP_ID=0
SLURM_STEPID=0
SLURM_NNODES=1
SLURM_JOB_NUM_NODES=1
SLURM_STEP_NUM_NODES=1
SLURM_STEP_NUM_TASKS=2
SLURM_STEP_TASKS_PER_NODE=2
SLURM_CPUS_ON_NODE=2
SLURM_NODEID=0



SLURM_PROCID=1
SLURM_LOCALID=1
SLURM_GTIDS=0,1

Please note SLURM_PROCID, SLURM_LOCALID and SLURM_GTIDS in particular.
@Hermann Schwärzler, how do you plan to deal with this bug? We have currently
set SLURM_NTASKS_PER_NODE=1 cluster-wide.
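
In the meantime, pinning the task count explicitly with -n looks like the most
direct workaround (just a sketch along the lines of the commands above, not
verified beyond that):

$ srun -n 1 -c 1 --partition=gpu-2080ti env | grep -E 'NTASKS|NUM_TASKS'

With -n 1 given explicitly, srun should launch exactly one task, so a single
set of env vars (and SLURM_NTASKS=1) is what I would expect.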

Best,
Durai


On Fri, Mar 25, 2022 at 12:45 PM Juergen Salk <juergen.salk at uni-ulm.de>
wrote:

> Hi Bjørn-Helge,
>
> that's very similar to what we did as well in order to avoid confusion with
> Core vs. Threads vs. CPU counts when Hyperthreading is kept enabled in the
> BIOS.
>
> Adding CPUs=<core_count> (not <thread_count>) will tell Slurm to only
> schedule physical cores.
>
> We have
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
>
> and
>
> NodeName=DEFAULT CPUs=48 Sockets=2 CoresPerSocket=24 ThreadsPerCore=2
>
> This is for compute nodes that have 2 sockets, 2 x 24 physical cores
> with hyperthreading enabled in the BIOS. (Although, in general, we do
> not encourage our users to make use of hyperthreading, we have decided
> to leave it enabled in the BIOS as there are some corner cases that
> are known to benefit from hyperthreading.)
>
> With this setting, Slurm also shows the total physical core count instead
> of the thread count and treats the --mem-per-cpu option as
> "--mem-per-core", which in our case is what most of our users expect.
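>
> For example (just a sketch to illustrate that semantics; the application
> name and the numbers are made up):
>
> $ srun -c 4 --mem-per-cpu=2G ./my_app
>
> should, with CR_Core_Memory and CPUs=<core_count>, grant 4 physical cores
> and 4 x 2G = 8G of memory.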
>
> As to the number of tasks spawned with `--cpus-per-task=1´, I think this
> is intended behavior. The following sentence from the srun manpage is
> probably relevant:
>
> -c, --cpus-per-task=<ncpus>
>
>   If -c is specified without -n, as many tasks will be allocated per
>   node as possible while satisfying the -c restriction.
>
> In our configuration, we allow multiple jobs to run for the same user
> on a node (ExclusiveUser=yes) and we get
>
> $ srun -c 1 echo foo | wc -l
> 1
> $
>
> However, with CPUs=<thread_count> instead of CPUs=<core_count>, I guess
> this would have been 2 lines of output, because the smallest schedulable
> unit for a job is 1 physical core, which allows 2 tasks to run with
> hyperthreading enabled.
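>
> (If in doubt, what Slurm thinks about a node's topology can be checked with
> something along these lines - the node name is just a placeholder:
>
> $ scontrol show node <nodename> | grep -E 'CPUTot|CoresPerSocket|ThreadsPerCore'
>
> which shows the configured CPUTot, CoresPerSocket and ThreadsPerCore values.)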
>
> In case of exclusive node allocation for jobs (i.e. no node
> sharing allowed) Slurm would give all cores of a node to the job
> which allows even more tasks to be spawned:
>
> $ srun --exclusive -c 1 echo foo | wc -l
> 48
> $
>
> 48 lines correspond exactly to the number of physical cores on the
> node. Again, with CPUs=<thread_count> instead of CPUs=<core_count>, I
> would expect 2 x 48 = 96 lines of output, but I did not test that.
>
> Best regards
> Jürgen
>
>
> * Bjørn-Helge Mevik <b.h.mevik at usit.uio.no> [220325 08:49]:
> > For what it's worth, we have a similar setup, with one crucial
> > difference: we are handing out physical cores to jobs, not hyperthreads,
> > and we are *not* seeing this behaviour:
> >
> > $ srun --cpus-per-task=1 -t 10 --mem-per-cpu=1g -A nn9999k -q devel echo
> foo
> > srun: job 5371678 queued and waiting for resources
> > srun: job 5371678 has been allocated resources
> > foo
> > $ srun --cpus-per-task=3 -t 10 --mem-per-cpu=1g -A nn9999k -q devel echo
> foo
> > srun: job 5371680 queued and waiting for resources
> > srun: job 5371680 has been allocated resources
> > foo
> >
> > We have
> >
> > SelectType=select/cons_tres
> > SelectTypeParameters=CR_CPU_Memory
> >
> > and node definitions like
> >
> > NodeName=DEFAULT CPUs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=2
> RealMemory=182784 Gres=localscratch:330G Weight=1000
> >
> > (so we set CPUs to the number of *physical cores*, not *hyperthreads*).
> >
> > --
> > Regards,
> > Bjørn-Helge Mevik, dr. scient,
> > Department for Research Computing, University of Oslo
> >
>
>
>
> --
> Jürgen Salk
> Scientific Software & Compute Services (SSCS)
> Kommunikations- und Informationszentrum (kiz)
> Universität Ulm
> Telefon: +49 (0)731 50-22478
> Telefax: +49 (0)731 50-22471
>
>