[slurm-users] slurmd -C showing incorrect core count
mike tie
mtie at carleton.edu
Sun Mar 8 20:18:51 UTC 2020
I am running a Slurm client on a virtual machine. The VM originally had 10
cores, and I have since increased that to 16, but "slurmd -C" continues to
report 10. I have also raised the core count in slurm.conf, and that change
is being picked up correctly. Because of this mismatch the node is stuck in
a drain state. How do I get slurmd -C to see the new number of cores?
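I assume that once slurmd reports the right count I can clear the drain with
something along these lines:

scontrol show node liverpool                      # check what slurmctld currently records
scontrol update NodeName=liverpool State=RESUME   # clear the drain once the counts match

but that presumably won't stick while slurmd -C still reports 10 CPUs.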
I'm running Slurm 18.08. I have tried running "scontrol reconfigure" on the
head node, restarting slurmd on all the client nodes, and restarting
slurmctld on the master node.
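Concretely, the steps I tried looked roughly like this (assuming the standard
systemd unit names slurmd and slurmctld):

scontrol reconfigure           # on the head node
systemctl restart slurmd       # on each client node
systemctl restart slurmctld    # on the master node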
Where is the data about compute node CPUs stored? I can't seem to find a
config or state file for it on the compute node.
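If the stale value is coming from saved state on the controller rather than
from the node, one place to look might be the StateSaveLocation directory,
where slurmctld keeps its saved node state (assuming the config lives at
/etc/slurm/slurm.conf):

grep -i StateSaveLocation /etc/slurm/slurm.conf   # path where slurmctld persists node/job state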
The compute node that I am working on is "liverpool".
mtie@liverpool ~ $ slurmd -C
NodeName=liverpool CPUs=10 Boards=1 SocketsPerBoard=10 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=64263
UpTime=1-21:55:36
mtie@liverpool ~ $ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 4
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 15
Model: 6
Model name: Common KVM processor
Stepping: 1
CPU MHz: 2600.028
BogoMIPS: 5200.05
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc
nopl xtopology eagerfpu pni cx16 x2apic hypervisor lahf_lm
mtie@liverpool ~ $ more /etc/slurm/slurm.conf | grep liverpool
NodeName=liverpool NodeAddr=137.22.10.202 CPUs=16 State=UNKNOWN
PartitionName=BioSlurm Nodes=liverpool Default=YES MaxTime=INFINITE State=UP
mtie@liverpool ~ $ sinfo -n liverpool -o %c
CPUS
16
mtie@liverpool ~ $ sinfo -n liverpool -o %E
REASON
Low socket*core*thread count, Low CPUs
Any advice?