Hi,

My GPU testing system (named “gpu-node”) is a simple single-socket computer with an “Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz” processor. Running “lscpu” shows 4 cores per socket, 2 threads per core, and 8 CPUs in total:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 26
Model name:            Intel(R) Core(TM) i7 CPU         950  @ 3.07GHz

My “gres.conf” file is:

NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-X File=/dev/nvidia0 CPUs=0-1
NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-Black File=/dev/nvidia1 CPUs=2-3

Running “numactl -H” on the “gpu-node” host reports:

available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 7809 MB
node 0 free: 6597 MB
node distances:
node   0
  0:  10

In “gres.conf”, CPUs 0-1 are assigned to the first GPU and CPUs 2-3 to the second GPU. However, “lscpu” shows 8 CPUs. If I rewrite “gres.conf” this way:

NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-X File=/dev/nvidia0 CPUs=0-3
NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-Black File=/dev/nvidia1 CPUs=4-7

and then run “scontrol reconfigure”, the slurmctld log reports this error message:

[2024-06-05T11:42:18.558] error: _node_config_validate: gres/gpu: invalid GRES core specification (4-7) on node gpu-node

So I think SLURM can only see physical cores, not threads, which means my node can only offer 4 cores (as “lscpu” shows). But in gres.conf I have to write “CPUs”, not “Cores”... is that right?
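
To make my guess concrete, here is a minimal Python sketch. It is only my assumption about how the check might behave, not SLURM's actual code, and the function names parse_range and range_is_valid are made up for illustration. If the ranges in gres.conf are counted in physical cores, the node has 1 socket x 4 cores = 4 valid indexes (0-3), which would explain why 4-7 is rejected:

```python
# Hypothetical check, NOT SLURM's actual code: it only illustrates the guess
# that gres.conf ranges are validated against physical cores, not threads.

def parse_range(spec):
    """Expand a range string such as "4-7" or "2" into a list of integers."""
    first, _, last = spec.partition("-")
    return list(range(int(first), int(last or first) + 1))


def range_is_valid(spec, sockets, cores_per_socket):
    """Return True if every index in spec is below the physical core count."""
    total_cores = sockets * cores_per_socket
    return all(index < total_cores for index in parse_range(spec))


# Values taken from the lscpu output above: 1 socket, 4 cores per socket.
for spec in ("0-1", "2-3", "0-3", "4-7"):
    print(spec, range_is_valid(spec, sockets=1, cores_per_socket=4))
```

Under that assumption, only the 4-7 range fails, which matches the range named in the log message.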

But if “numactl -H” shows 8 CPUs, why can't I use these 8 CPUs in “gres.conf”?

Sorry for the long email.

Thanks.