[slurm-users] More cpus blocked than booked

René Neumaier r.neumaier at lrz.uni-muenchen.de
Fri Sep 13 09:06:03 UTC 2019


Dear all,

Recently we upgraded Slurm from 18.08.3 to 18.08.7 and noticed the
following behaviour:

When we submit the following test script:


#!/bin/bash

#SBATCH --job-name=test
#SBATCH --output=test.log
#SBATCH --error=test.err
#SBATCH --ntasks=3
#SBATCH --mem=100M
#SBATCH --qos=high

sleep 30


4 CPUs are consumed instead of 3, but 'squeue' shows only 3. The
slurmctld.log also reports 3 CPUs:
...
[2019-09-12T17:12:09.964] _slurm_rpc_submit_batch_job: JobId=40077 InitPrio=1000 usec=258
[2019-09-12T17:12:12.498] sched: Allocate JobId=40077 NodeList=mpcn5 #CPUs=3 Partition=lemmium
[2019-09-12T17:12:42.554] _job_complete: JobId=40077 WEXITSTATUS 0
[2019-09-12T17:12:42.554] _job_complete: JobId=40077 done
...

The same happens with 1 CPU: 2 are consumed instead of 1.
A job with 8 CPUs, however, consumes the correct number of CPUs.
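One way to see the node-level consumption next to the requested count is to read the CPUAlloc and CPUTot fields of `scontrol show node` while the test job runs. The following is a minimal sketch, assuming the test job is the only job on that node; the function name check_node_alloc is hypothetical.

```shell
#!/bin/bash
# check_node_alloc EXPECTED (hypothetical helper)
# Reads `scontrol show node <name>` output on stdin and compares the node's
# CPUAlloc field with the number of CPUs the job requested (EXPECTED).
# Assumption: the test job is the only job running on that node.
check_node_alloc() {
    local expected=$1 out alloc total
    out=$(cat)
    # grep -o keeps only "Key=value"; cut drops the key part.
    alloc=$(printf '%s\n' "$out" | grep -o 'CPUAlloc=[0-9]*' | head -n 1 | cut -d= -f2)
    total=$(printf '%s\n' "$out" | grep -o 'CPUTot=[0-9]*' | head -n 1 | cut -d= -f2)
    if [ "${alloc:-0}" -eq "$expected" ]; then
        echo "ok: $alloc of $total CPUs allocated"
    else
        echo "mismatch: requested $expected, node reports $alloc of $total allocated"
    fi
}
```

While the 3-task test job is running on mpcn5, `scontrol show node mpcn5 | check_node_alloc 3` would report a mismatch if the node has 4 CPUs allocated.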

The behaviour also depends on the partition / CPU type:
Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz = affected
Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz = affected
Quad-Core AMD Opteron(tm) Processor 8356 = not affected

The old AMD Opteron CPUs allow only one thread per core. The Intel CPUs
use Hyper-Threading and allow 2 threads per core.
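To confirm the threads-per-core difference directly on a compute node, the "Thread(s) per core" line of `lscpu` can be extracted. A minimal sketch; the function name threads_per_core is hypothetical, and it assumes the English `lscpu` output labels.

```shell
#!/bin/bash
# threads_per_core (hypothetical helper)
# Reads `lscpu` output on stdin and prints the "Thread(s) per core" value.
threads_per_core() {
    # Split each line on ':' and strip all whitespace from the value field.
    awk -F: '/^Thread\(s\) per core/ { gsub(/[[:space:]]/, "", $2); print $2 }'
}
```

Running `lscpu | threads_per_core` would be expected to print 2 on an Intel node with Hyper-Threading enabled and 1 on an Opteron node.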

Node examples:
NodeName=mpcn2 CPUs=32 RealMemory=128992 Sockets=8 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
NodeName=mpcn3 CPUs=80 RealMemory=773208 Sockets=2 CoresPerSocket=20 ThreadsPerCore=2 State=UNKNOWN
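A quick sanity check of such node lines is that CPUs matches Sockets x CoresPerSocket x ThreadsPerCore, which is the value Slurm uses for CPUs when it is omitted. The sketch below checks a single NodeName line; both helper names are hypothetical and it assumes space-separated Key=value fields.

```shell
#!/bin/bash
# node_field LINE KEY (hypothetical helper)
# Prints the value of KEY=value from one space-separated slurm.conf line.
node_field() {
    printf '%s\n' "$1" | grep -oE "(^| )$2=[^ ]*" | head -n 1 | cut -d= -f2
}

# check_node_line LINE (hypothetical helper)
# Checks that CPUs equals Sockets * CoresPerSocket * ThreadsPerCore.
check_node_line() {
    local line=$1 name cpus sockets cores threads product
    name=$(node_field "$line" NodeName)
    cpus=$(node_field "$line" CPUs)
    sockets=$(node_field "$line" Sockets)
    cores=$(node_field "$line" CoresPerSocket)
    threads=$(node_field "$line" ThreadsPerCore)
    # Missing topology fields default to 0 (threads to 1) so the check still runs.
    product=$(( ${sockets:-0} * ${cores:-0} * ${threads:-1} ))
    if [ "${cpus:-0}" -eq "$product" ]; then
        echo "$name: ok (CPUs=$cpus = $sockets x $cores x $threads)"
    else
        echo "$name: inconsistent (CPUs=$cpus, topology gives $product)"
    fi
}
```

For the two lines above this gives 8 x 4 x 1 = 32 for mpcn2 and 2 x 20 x 2 = 80 for mpcn3, so both definitions are consistent and the only topology difference is ThreadsPerCore.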

I suspect this difference is why a job on the Intel partitions cannot
be given an odd number of CPUs.
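The suspicion can be written as simple arithmetic: if only whole cores can be allocated, a request is rounded up to the next multiple of ThreadsPerCore. The sketch below only models that hypothesis (the function name is made up); it does not query Slurm.

```shell
#!/bin/bash
# cpus_if_whole_cores REQUESTED THREADS_PER_CORE (hypothetical helper)
# Models the suspected behaviour: only whole cores are allocated, so the
# request is rounded up to a multiple of ThreadsPerCore.
cpus_if_whole_cores() {
    local requested=$1 tpc=$2 cores
    # Integer ceiling division: number of whole cores needed.
    cores=$(( (requested + tpc - 1) / tpc ))
    echo $(( cores * tpc ))
}
```

With ThreadsPerCore=2, 1 becomes 2, 3 becomes 4 and 8 stays 8; with ThreadsPerCore=1 every request is unchanged. This matches what was observed on the Intel and Opteron partitions.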

I have to admit that I am not sure whether this has been the case only since
the last Slurm update, whether our configuration is wrong, or whether a "half
core" (1 thread) simply cannot be allocated.
Until now, jobs were normally started with even CPU counts of 8 or more,
especially on the larger nodes...

Gentoo GNU/Linux - Kernel 5.2.11
FastSchedule=1
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
TaskPlugin=task/cgroup
ProctrackType=proctrack/cgroup
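Because the behaviour may have appeared with the upgrade, it can help to confirm which of these settings the running slurmctld actually uses. A minimal filter, assuming the "Key = value" layout of `scontrol show config` (it also accepts "Key=value" lines from slurm.conf); the function name is hypothetical.

```shell
#!/bin/bash
# show_sched_params (hypothetical helper)
# Reads `scontrol show config` output (or slurm.conf) on stdin and prints
# only the settings listed above that affect CPU allocation.
show_sched_params() {
    # Accept optional whitespace between the key and '='.
    grep -E '^(FastSchedule|SelectType|SelectTypeParameters|TaskPlugin|ProctrackType)[[:space:]]*='
}
```

Usage: `scontrol show config | show_sched_params`.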


I would be grateful for any ideas.


Best regards,
René

-- 
_____________________________
René Neumaier
Systemadministration

LMU München
Department für Geo und
Umweltwissenschaften,
Paläontologie & Geobiologie

Richard-Wagner-Str. 10
D-80333 München
Tel.: +4989-2180-6625
Fax.: +4989-2180-6601
rene.neumaier at lmu.de
r.neumaier at lrz.uni-muenchen.de

GPG-Fingerprint:
EC0E B6F6 B3FF 6324 B0C8 9452 EF6B 4E3C 2E59 F5AA


