[slurm-users] unable to run on all the logical cores
David Bellot
david.bellot at lifetrading.com.au
Thu Oct 8 05:13:35 UTC 2020
Hi,
My Slurm cluster has a dozen machines, each configured as follows:
NodeName=foobar01 CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20
ThreadsPerCore=2 RealMemory=257243 State=UNKNOWN
and the scheduling configuration is:
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
My problem is that only half of the logical cores are used when I run a
computation.
Let me explain: I use R and the 'batchtools' package to create jobs, and all
the jobs are submitted under the hood with sbatch. If I log in to the
machines in my cluster and run 'htop', I can see that only half of the
logical cores are in use. Other ways of measuring the load on each machine
confirm this visual impression.
My jobs ask Slurm for only one CPU per task. I tried to enforce that
explicitly with -c 1, but it didn't make any difference.
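For reference, the submission looks roughly like this on the R side. This is
only a simplified sketch: the registry path, the template file and the
resource names (ncpus, walltime, memory) are placeholders, and I'm assuming
the template maps ncpus to sbatch's --cpus-per-task (-c).

library(batchtools)

# registry backed by the Slurm cluster; template path is illustrative
reg <- makeRegistry(file.dir = "registry", seed = 1)
reg$cluster.functions <- makeClusterFunctionsSlurm(template = "slurm.tmpl")

# one independent job per input value, each needing a single CPU
batchMap(fun = function(x) x^2, x = 1:1000, reg = reg)

# request one CPU per task; the template is assumed to turn 'ncpus'
# into "#SBATCH --cpus-per-task=1" (i.e. -c 1)
submitJobs(resources = list(ncpus = 1, walltime = 3600, memory = 2048),
           reg = reg)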
Then I realized there was something strange:
when I run scontrol show job <jobid>, I see the following output:
NumNodes=1 NumCPUs=2 NumTasks=0 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=2,node=1,billing=2
Socks/Node=* NtasksPerN:B:S:C=0:0:*:2 CoreSpec=*
That is, each job uses NumCPUs=2 instead of 1. I'm also not sure why it
reports TRES=cpu=2.
Any idea how to solve this problem and have 100% of the logical cores
allocated?
Best regards,
David