On 2024/07/10 16:25, jack.mellor--- via slurm-users wrote:
We are running slurm 23.02.6. Our nodes have hyperthreading disabled and we have slurm.conf set to CPU=32 for each node (each node has 2 processors with 16 cores each). When we allocate a job, such as salloc -n 32, it allocates a whole node, but sinfo shows double the allocation in the TRES=64. sinfo also shows that the node has 4294967264 idle CPUs.
What does an
scontrol show node
tell you about the node(s)?
On our systems, where, sadly, our vendor is unable/unwilling to turn off SMT/hyperthreading, we see the following (not all fields shown) for a fully allocated AMD EPYC 7763 node, so 128 physical cores:
CoresPerSocket=64
CPUAlloc=256 CPUEfctv=256 CPUTot=256
Sockets=2 Boards=1
ThreadsPerCore=2
CfgTRES=cpu=256 AllocTRES=cpu=256
so I guess the question would be, depending on exactly what you see,
have you explicitly set, or tried setting,
ThreadsPerCore=1
in the config?
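
For what it's worth, the numbers in the original report are at least consistent with a threads-per-core mismatch. Here is a minimal Python sketch of the arithmetic. The function names are made up for illustration, and the idea that the idle count is held in an unsigned 32-bit value is only an assumption, not something checked against the Slurm source.

```python
# Rough arithmetic only: nothing here talks to Slurm.
UINT32_MODULUS = 2 ** 32


def cpus_from_topology(sockets, cores_per_socket, threads_per_core):
    """CPU count implied by a Sockets/CoresPerSocket/ThreadsPerCore layout."""
    return sockets * cores_per_socket * threads_per_core


def idle_as_uint32(total_cpus, allocated_cpus):
    """total - allocated, wrapped the way an unsigned 32-bit counter would
    wrap (assumption: the idle figure is stored that way)."""
    return (total_cpus - allocated_cpus) % UINT32_MODULUS


configured_cpus = 32                              # value set in slurm.conf
one_thread = cpus_from_topology(2, 16, 1)         # 2 sockets x 16 cores, no SMT
two_threads = cpus_from_topology(2, 16, 2)        # same node counted with 2 threads/core

print(one_thread)                                 # 32
print(two_threads)                                # 64, the doubled TRES figure
print(idle_as_uint32(configured_cpus, two_threads))  # 4294967264
```

So if the allocation were being counted as 64 against a configured total of 32, an idle count of 32 - 64 would wrap to exactly the 4294967264 reported by sinfo. That does not prove the cause, but it is why the ThreadsPerCore setting seems worth checking first.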