[slurm-users] slurmd -C showing incorrect core count

mike tie mtie at carleton.edu
Sun Mar 8 20:18:51 UTC 2020


I am running a Slurm client on a virtual machine.  The VM originally had 10
cores, and I have since increased that to 16, but "slurmd -C" continues to
report 10.  I have updated the core count in the slurm.conf file, and that
change is being picked up correctly.  The node is now stuck in a Drain state
because of the mismatch.  How do I get slurmd -C to see the new number of
cores?
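
For context, this is roughly how I have been inspecting the node, and the
command I expect to use to clear the drain once the counts agree (a sketch
only; it assumes scontrol admin rights on the head node, and it has not
fixed anything yet):

    scontrol show node liverpool
    sudo scontrol update NodeName=liverpool State=RESUME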

I'm running Slurm 18.08.  I have tried running "scontrol reconfigure" on
the head node, I have restarted slurmd on all the client nodes, and I have
restarted slurmctld on the master node.
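
For completeness, the restart sequence looked roughly like this (assuming
the usual systemd unit names slurmctld and slurmd; adjust if your install
uses init scripts instead):

    # on the master node
    sudo systemctl restart slurmctld
    sudo scontrol reconfigure

    # on each compute node (e.g. liverpool)
    sudo systemctl restart slurmd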

Where is the data about compute node CPUs stored?  I can't seem to find a
config or settings file on the compute node.
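
The closest thing I have found is the controller's StateSaveLocation, where
I believe slurmctld keeps its saved node state; I am not sure whether
slurmd -C reads anything from there, so this is just where I have been
looking (the path in the second command is a guess, use whatever the first
command reports):

    scontrol show config | grep -i StateSaveLocation
    ls -l /var/spool/slurmctld    # guess at the default path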

The compute node that I am working on is "liverpool".

mtie at liverpool ~ $ slurmd -C

NodeName=liverpool CPUs=10 Boards=1 SocketsPerBoard=10 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=64263
UpTime=1-21:55:36


mtie at liverpool ~ $ lscpu

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             4
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            15
Model:                 6
Model name:            Common KVM processor
Stepping:              1
CPU MHz:               2600.028
BogoMIPS:              5200.05
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
L3 cache:              16384K
NUMA node0 CPU(s):     0-15
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc nopl xtopology eagerfpu pni cx16 x2apic hypervisor lahf_lm


mtie at liverpool ~ $ more /etc/slurm/slurm.conf | grep liverpool

NodeName=liverpool NodeAddr=137.22.10.202 CPUs=16 State=UNKNOWN
PartitionName=BioSlurm Nodes=liverpool  Default=YES MaxTime=INFINITE State=UP


mtie at liverpool ~ $ sinfo -n liverpool -o %c

CPUS
16

mtie at liverpool ~ $ sinfo -n liverpool -o %E

REASON
Low socket*core*thread count, Low CPUs
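
One thing I am considering next is spelling out the full topology in
slurm.conf rather than just CPUs=16, using the socket/core/thread counts
from the lscpu output above. A sketch of the node line I would try (the
Sockets/CoresPerSocket/ThreadsPerCore values are my reading of lscpu and
have not been verified to fix anything):

    NodeName=liverpool NodeAddr=137.22.10.202 CPUs=16 Sockets=4 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=64263 State=UNKNOWN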



Any advice?