[slurm-users] unable to run on all the logical cores

Rodrigo Santibáñez rsantibanez.uchile at gmail.com
Thu Oct 8 05:38:13 UTC 2020


Hi David,

I had the same problem some time ago when configuring my first server.

Could you try SelectTypeParameters=CR_CPU instead of
SelectTypeParameters=CR_Core?
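
As a rough sketch of what I mean, and assuming the rest of your slurm.conf
stays as you posted it, the scheduling section would become something like
the fragment below. With CR_Core, Slurm hands out whole cores, so a one-CPU
job on a node with ThreadsPerCore=2 still gets both hardware threads, which
would explain the NumCPUs=2 you see. The comments are my own notes, not
output from a real cluster.

```
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
# Treat each logical CPU (hardware thread) as the consumable unit,
# instead of a whole core as with CR_Core.
SelectTypeParameters=CR_CPU
```

As far as I know, a change to SelectTypeParameters needs a restart of
slurmctld rather than just "scontrol reconfigure". If jobs still get two
CPUs afterwards, it may be worth checking the CR_CPU notes in the slurm.conf
man page about how the Sockets, CoresPerSocket and ThreadsPerCore values on
the NodeName line interact with it.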

Best regards,
Rodrigo.

On Thu, Oct 8, 2020, 02:16 David Bellot <david.bellot at lifetrading.com.au>
wrote:

> Hi,
>
> my Slurm cluster has a dozen machines configured as follows:
>
> NodeName=foobar01 CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20
> ThreadsPerCore=2 RealMemory=257243 State=UNKNOWN
>
> and scheduling is:
>
> # SCHEDULING
> SchedulerType=sched/backfill
> SelectType=select/cons_tres
> SelectTypeParameters=CR_Core
>
> My problem is that only half of the logical cores are used when I run a
> computation.
>
> Let me explain: I use R and the package 'batchtools' to create jobs. All
> the jobs are submitted under the hood with sbatch. If I log in to all the
> machines in my cluster and run 'htop', I can see that only half of the
> logical cores are used. Other methods of measuring the load on each machine
> confirmed this "visual" clue.
> My jobs ask Slurm for only one CPU per task. I tried to enforce that with
> -c 1, but it didn't make any difference.
>
> Then I realized there was something strange:
> when I do scontrol show job <jobid>, I can spot the following output:
>
>    NumNodes=1 NumCPUs=2 NumTasks=0 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>    TRES=cpu=2,node=1,billing=2
>    Socks/Node=* NtasksPerN:B:S:C=0:0:*:2 CoreSpec=*
>
> That is, each job uses NumCPUs=2 instead of 1. Also, I'm not sure why
> TRES shows cpu=2.
>
> Any idea on how to solve this problem and have 100% of the logical cores
> allocated?
>
> Best regards,
> David
>