Hi Everyone,

We have a SLURM cluster of three different types of nodes. One partition consists of nodes that have a large number of CPUs, 256 CPUs on each node.

I'm trying to find out the current CPU allocation on some of those nodes, but part of the information I gathered seems to be incorrect. If I use "scontrol show node <node-name>", I get this for the CPU info:

   RealMemory=450000 AllocMem=262144 FreeMem=235397 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   CPUAlloc=256 CPUEfctv=256 CPUTot=256 CPULoad=126.65
   CfgTRES=cpu=256,mem=450000M,billing=256
   AllocTRES=cpu=256,mem=256G
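To cross-check that against the running jobs, I tallied their allocated CPUs per node. Roughly, the tally amounts to summing the %C column of squeue for the node, along these lines (the node name is just a placeholder, and since %C is a job's total CPU count, jobs spanning multiple nodes would inflate a per-node sum like this):

   # sum the allocated-CPU column (%C) over running jobs on one node
   squeue --state=R --nodelist=<node-name> --noheader -o "%C %N" \
       | awk '{cpus += $1} END {print cpus}'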
However, the running jobs on that node only account for 128 allocated CPUs in the output of squeue --state=R -o "%C %N", not 256, even though scontrol reports 100% CPU allocation on the node, and I don't quite understand why. Could this be due to some misconfiguration, or a bug in the SLURM version we're running? We're on Version=23.02.4.

The interesting thing is that we have six nodes with similar specs, and all of them show up as allocated in the output of sinfo, yet the running jobs on each of them account for only 128 allocated CPUs, as if they're all capped at 128.

Any thoughts, suggestions, or assistance to figure this out would be greatly appreciated.

Thanks,
Muhammad
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-leave@lists.schedmd.com