Hello,
I have compiled SLURM-24.11.3 and configured two GPUs in my system (slurmctld and slurmd are running on the same computer). The computer has an old Intel i7 processor with 4 cores and hyperthreading (2 threads per core). The node is configured with "NodeName=mysystem CPUs=8 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=7940 Gres=gpu:geforce_gtx_titan_x:1,gpu:geforce_gtx_titan_black:1". The "lscpu" command returns:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
CPU family: 6
Model: 26
Model name: Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
BIOS Model name: Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
My gres.conf file is:
NodeName=mysystem Autodetect=off Name=gpu Type=geforce_gtx_titan_x File=/dev/nvidia0 CPUs=0-1
NodeName=mysystem Autodetect=off Name=gpu Type=geforce_gtx_titan_black File=/dev/nvidia1 CPUs=2-3
However, when I start the "slurmctld" daemon, it logs this error:
[2025-04-28T09:35:41.003] error: _check_core_range_matches_sock: gres/gpu GRES core specification 0-1 for node aopcvis5 doesn't match socket boundaries. (Socket 0 is cores 0-3)
[2025-04-28T09:35:41.003] error: Setting node aopcvis5 state to INVAL with reason:gres/gpu GRES core specification 0-1 for node aopcvis5 doesn't match socket boundaries. (Socket 0 is cores 0-3)
Where is my configuration error?
Thanks.
Socket(s): 1
NUMA node(s): 1
[...]
NodeName=mysystem Autodetect=off Name=gpu Type=geforce_gtx_titan_x File=/dev/nvidia0 CPUs=0-1
NodeName=mysystem Autodetect=off Name=gpu Type=geforce_gtx_titan_black File=/dev/nvidia1 CPUs=2-3
What do you intend to achieve with CPUs=... if the host is single-socket?
https://slurm.schedmd.com/gres.conf.html#OPT_Cores has your answer though, I think.
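As far as I can tell from that page, Slurm co-allocates GRES and cores at socket granularity, so a core list is expected to cover all cores of a socket. A minimal sketch of a gres.conf for this single-socket node, assuming neither GPU needs core affinity, drops the CPUs= field entirely (untested on my side):

```
# gres.conf sketch: same two GPUs as above, without any core binding.
# Assumption: on a single-socket host every core is equally close to both GPUs.
NodeName=mysystem Autodetect=off Name=gpu Type=geforce_gtx_titan_x File=/dev/nvidia0
NodeName=mysystem Autodetect=off Name=gpu Type=geforce_gtx_titan_black File=/dev/nvidia1
```

If a core list is wanted anyway, the error message suggests it would have to span the whole socket, that is, cores 0-3 for both GPUs.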