[slurm-users] number of tasks that can run on a node without oversubscribing

Juergen Salk juergen.salk at uni-ulm.de
Fri Jul 12 20:30:49 UTC 2019


Hello,

The CPU vs. core vs. thread terminology also confused me at the very
beginning. Although we generally do not encourage our users to make
use of hyperthreading, we have decided to leave it enabled in the BIOS,
as some use cases are known to benefit from it.

I think with 

  NodeName=hpathuri-linux CPUs=8 RealMemory=15833 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN

and 

  SelectTypeParameters=CR_CPU

you'll indeed be able to schedule up to 8 tasks on that node, as you
were expecting. However, you'll also have to check carefully how tasks
then get pinned to the CPUs. For a 2-task job (-n 2), the default will
probably pin both processes to one and the same physical core, which
may not be what you want.
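
One way to check the pinning is to let each task report the logical
CPUs it is allowed to run on. The following is only a minimal sketch,
assuming Linux compute nodes with Python 3 available; the function
names are made up for illustration, and SLURM_PROCID is the task rank
that srun exports to each task.

```python
import os


def allowed_cpus(pid=0):
    """Return the sorted logical CPU ids the process may run on (Linux only)."""
    return sorted(os.sched_getaffinity(pid))


def core_siblings(cpu):
    """Return the hardware threads sharing a core with `cpu`, or None.

    Reads the Linux sysfs topology file; None means it could not be read.
    """
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # SLURM_PROCID is set by srun; "?" is shown when run outside a job step.
    rank = os.environ.get("SLURM_PROCID", "?")
    for cpu in allowed_cpus():
        print(f"task {rank}: cpu {cpu} shares a core with {core_siblings(cpu)}")
```

Saved under a name of your choice (say, show_affinity.py, a
hypothetical file name) and started with "srun -n 2 python3
show_affinity.py", two tasks that report the same sibling list are
sitting on the same physical core.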

I found the following discussion very helpful:

  https://bugs.schedmd.com/show_bug.cgi?id=1328

For our test environment which serves as a sandbox for our next
cluster system I've ended up with 

  NodeName=DEFAULT CPUs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN

and 

  SelectTypeParameters=CR_Core_Memory

This is for compute nodes with two sockets, each holding an 8-core
processor, and hyperthreading enabled in the BIOS. So I followed the
"CPUs=<core_count -- not thread_count>" approach suggested in the
discussion above, which tells Slurm to schedule only the physical
cores. With this setting, Slurm also reports the total physical core
count instead of the thread count and treats the --mem-per-cpu
option as "--mem-per-core", which in our case is what most of our
users expect.
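
To make the arithmetic behind the two node definitions explicit, here
is a small sketch. It is plain multiplication of the values from the
NodeName lines above, not Slurm's own logic, and the function name is
made up for illustration.

```python
def cpu_counts(sockets, cores_per_socket, threads_per_core):
    """Return (physical_cores, hardware_threads) for one node."""
    cores = sockets * cores_per_socket
    threads = cores * threads_per_core
    return cores, threads


if __name__ == "__main__":
    # (Sockets, CoresPerSocket, ThreadsPerCore) from the two NodeName lines.
    nodes = {"hpathuri-linux": (1, 4, 2), "DEFAULT": (2, 8, 2)}
    for name, topology in nodes.items():
        cores, threads = cpu_counts(*topology)
        print(f"{name}: {cores} physical cores, {threads} hardware threads")
```

Setting CPUs= to the first number corresponds to the core-count
approach used for our test environment; setting it to the second
number corresponds to scheduling each hardware thread as a CPU.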

Best regards
Jürgen

-- 
Jürgen Salk
Scientific Software & Compute Services (SSCS)
Kommunikations- und Informationszentrum (kiz)
Universität Ulm
Telefon: +49 (0)731 50-22478
Telefax: +49 (0)731 50-22471



* mercan <ahmet.mercan at uhem.itu.edu.tr> [190712 21:52]:
> Hi;
> 
> If you want to use the threads as cpus, you should set CR_CPU, instead of
> CR_Core.
> 
> Regards;
> 
> Ahmet M.
> 
> 
> On 12.07.2019 at 21:29, mercan wrote:
> > Hi;
> > 
> > You can find the Definitions of Socket, Core, & Thread at:
> > 
> > https://slurm.schedmd.com/mc_support.html
> > 
> > Your status:
> > 
> > CPUs=COREs=Sockets*CoresPerSocket=1*4=4
> > 
> > Threads=COREs*ThreadsPerCore=4*2=8
> > 
> > Regards;
> > 
> > Ahmet M.
> > 
> > 
> > 
> > On 12.07.2019 at 20:15, Hanu Pathuri wrote:
> > > 
> > > Hi,
> > > 
> > > Here is my node information. I am confused with the terminology
> > > w.r.t CPU vs CORE.
> > > 
> > > NodeName=hpathuri-linux CPUs=8 RealMemory=15833 Sockets=1
> > > CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN.
> > > 
> > > I am unable to schedule more than 4 tasks without oversubscribing,
> > > even though my configuration looks like this:
> > > 
> > > SchedulerType=sched/backfill
> > > 
> > > #SchedulerPort=7321
> > > 
> > > SelectType=select/cons_res
> > > 
> > > SelectTypeParameters=CR_Core
> > > 
> > > PreemptMode=suspend,GANG
> > > 
> > > Could you help clarify what is going on?
> > > 
> > > I was expecting to schedule 8 tasks.
> > > 
> > > Thanks
> > > 
> > 
> 

-- 
GPG A997BA7A | 87FC DA31 5F00 C885 0DC3  E28F BD0D 4B33 A997 BA7A


