[slurm-users] unable to run on all the logical cores

David Bellot david.bellot at lifetrading.com.au
Thu Oct 8 06:19:34 UTC 2020


Hi Rodrigo,

Good spot. At least scontrol show job now reports that each job requires only
one "CPU", so it seems all the logical cores are now treated the same way.
However, I still have the problem of not using more than half the cores, so I
suppose it might be due to the way I submit the jobs (batchtools in this
case).
I'm still investigating, even though NumCPUs=1 now, as it should be. Thanks.
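
As a sanity check on the numbers in this thread, the earlier NumCPUs=2
reading is consistent with simple arithmetic on the node definition quoted
below. This is only a sketch: it assumes that under CR_Core a one-CPU job is
given a whole core (both hardware threads) and that each job runs
single-threaded. The function name busy_fraction is made up for illustration.

```python
# Node definition taken from the slurm.conf line quoted in this thread.
SOCKETS = 2
CORES_PER_SOCKET = 20
THREADS_PER_CORE = 2


def busy_fraction(cpus_allocated_per_job: int, threads_used_per_job: int = 1) -> float:
    """Fraction of logical CPUs kept busy when the node is fully allocated.

    Assumption: every job really uses threads_used_per_job hardware threads
    while Slurm reserves cpus_allocated_per_job logical CPUs for it.
    """
    logical_cpus = SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE  # 80
    max_jobs = logical_cpus // cpus_allocated_per_job
    return max_jobs * threads_used_per_job / logical_cpus


print(busy_fraction(2))  # whole core reserved per job (NumCPUs=2)
print(busy_fraction(1))  # one logical CPU reserved per job (NumCPUs=1)
```

With two logical CPUs reserved per job the node fills up at 40 jobs, which
matches the "half the cores" symptom. It does not explain why the load stays
at half once NumCPUs=1, so the submission side remains the open question.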

David

On Thu, Oct 8, 2020 at 4:40 PM Rodrigo Santibáñez <
rsantibanez.uchile at gmail.com> wrote:

> Hi David,
>
> I had the same problem some time ago when configuring my first server.
>
> Could you try SelectTypeParameters=CR_CPU instead of
> SelectTypeParameters=CR_Core?
>
> Best regards,
> Rodrigo.
>
> On Thu, Oct 8, 2020, 02:16 David Bellot <david.bellot at lifetrading.com.au>
> wrote:
>
>> Hi,
>>
>> my Slurm cluster has a dozen machines configured as follows:
>>
>> NodeName=foobar01 CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20
>> ThreadsPerCore=2 RealMemory=257243 State=UNKNOWN
>>
>> and scheduling is:
>>
>> # SCHEDULING
>> SchedulerType=sched/backfill
>> SelectType=select/cons_tres
>> SelectTypeParameters=CR_Core
>>
>> My problem is that only half of the logical cores are used when I run a
>> computation.
>>
>> Let me explain: I use R and the package 'batchtools' to create jobs. All
>> the jobs are created under the hood with sbatch. If I log in to all the
>> machines in my cluster and run 'htop', I can see that only half of the
>> logical cores are used. Other methods of measuring the load on each machine
>> confirmed this "visual" clue.
>> My jobs ask Slurm for only one CPU per task. I tried to enforce that with
>> -c 1, but it didn't make any difference.
>>
>> Then I realized there was something strange:
>> when I run scontrol show job <jobid>, I see the following output:
>>
>>    NumNodes=1 NumCPUs=2 NumTasks=0 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>>    TRES=cpu=2,node=1,billing=2
>>    Socks/Node=* NtasksPerN:B:S:C=0:0:*:2 CoreSpec=*
>>
>> That is, each job uses NumCPUs=2 instead of 1. Also, I'm not sure why
>> TRES=cpu=2.
>>
>> Any idea on how to solve this problem and have 100% of the logical cores
>> allocated?
>>
>> Best regards,
>> David
>>
>
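
For checking many jobs at once, the scontrol show job lines quoted above can
be parsed mechanically. A minimal sketch, assuming the output is pasted or
captured as text; parse_fields is a hypothetical helper, and a plain
whitespace split is only adequate for lines like the three shown (values
containing spaces would need more care).

```python
# Sample copied from the `scontrol show job` output quoted in this thread.
SAMPLE = """\
   NumNodes=1 NumCPUs=2 NumTasks=0 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=2,node=1,billing=2
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:2 CoreSpec=*
"""


def parse_fields(text: str) -> dict:
    """Split whitespace-separated KEY=VALUE tokens into a dict.

    Only the first '=' splits a token, so TRES=cpu=2,... keeps its value whole.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that contain no '='
            fields[key] = value
    return fields


fields = parse_fields(SAMPLE)
num_cpus = int(fields["NumCPUs"])
cpus_per_task = int(fields["CPUs/Task"])
print(num_cpus, cpus_per_task, fields["TRES"])
```

A job whose NumCPUs is larger than its CPUs/Task, as in this sample, is one
where more logical CPUs are reserved than the task asked for.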

-- 
<https://www.lifetrading.com.au/>
David Bellot
Head of Quantitative Research

A. Suite B, Level 3A, 43-45 East Esplanade, Manly, NSW 2095
E. david.bellot at lifetrading.com.au
P. (+61) 0405 263012