Hi all,
We're trying to enable sharding on our compute cluster. On this cluster:
- ensicompute-1 comes with 1 NVIDIA V100 GPU;
- ensicompute-13 comes with 3 NVIDIA A40 GPUs;
- all other nodes (for now, ensicompute-11 and ensicompute-12, but several others will come) come with 3 NVIDIA RTX 6000 GPUs.
To enable sharding, I followed these steps:
1. [slurm.conf] Add "shard" to GresTypes;
2. [slurm.conf] Add "shard:N" to the Gres of each node. For testing purposes, I have set N to 9, so each GPU can execute up to 3 jobs concurrently:
    NodeName=ensicompute-[11-12] Gres=gpu:Quadro:3,shard:9 CPUs=40 RealMemory=128520 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN Feature=gpu,ht
3. [gres.conf] Declare the shards after the definition of the gpu GRES.
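For what it's worth, a minimal way to check what the controller registered for a node after a reconfigure (example node name; the Gres= line should list both the gpu and shard GRES) is:

    # Check what the controller registered for this node;
    # the Gres= line should show gpu:Quadro:3,shard:9
    scontrol show node ensicompute-11 | grep -i gres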
For step 3, I tried different things, leading to different outcomes:

a. Define a global number of shards for the entire host:
    Name=shard Count=9
==> This way, sharding seems to work OK, but all the jobs are executed on GPU#0. If running 12 jobs for example, 9 of them are assigned to GPU#0 and start executing, while 3 of them remain in a pending state. No job is assigned to GPU#1 or GPU#2.
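For context, the test jobs each request a single shard. A minimal reproduction along these lines (illustrative only, not my exact script, and assuming Slurm exports CUDA_VISIBLE_DEVICES for shard allocations the same way it does for gpu ones) shows which device each job ends up on:

    # Submit a dozen one-shard jobs and record which GPU each one sees
    for i in $(seq 1 12); do
        sbatch --gres=shard:1 --wrap 'echo "job $SLURM_JOB_ID -> CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; sleep 300'
    done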
b. Define a per-GPU number of shards, associated with the device file representing the GPU:
    Name=shard Count=3 File=/dev/nvidia0
    Name=shard Count=3 File=/dev/nvidia1
    Name=shard Count=3 File=/dev/nvidia2
==> In this case, the slurmd service fails to start on the compute node. The error message found in /var/log/slurmd.log is "fatal: Invalid GRES record for shard, count does not match File value".
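If it helps with diagnosing this case: running slurmd in the foreground with extra verbosity on the node shows how gres.conf gets parsed right before the fatal error (standard slurmd flags; just a debugging sketch):

    # Run slurmd in the foreground with debug output to see GRES parsing
    # (-D = don't daemonize, -vvv = increase verbosity; stop the service first)
    systemctl stop slurmd
    slurmd -D -vvv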
c. Don't define anything about shards in gres.conf.
==> Same behavior as in a.: all jobs are executed on GPU#0.
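The GPU#0-only behavior is also visible from the detailed node query, which (if I read the output right) reports per-index GRES usage; sketch below with a sample node name:

    # Detailed node view: GresUsed should show which device indices
    # the shard allocations actually landed on
    scontrol -d show node ensicompute-11 | grep -i gres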
The full content of the slurm.conf and gres.conf files is included below. What is the proper way to configure sharding in a cluster with several GPUs per node? Is there a way to specify how many shards should be allocated to each GPU?
Cheers, François
=== slurm.conf ===
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=ensimag
SlurmctldHost=nash
ProctrackType=proctrack/cgroup
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/affinity,task/cgroup
ReturnToService=2
#
#
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
#
#
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
#
#
# LOGGING AND ACCOUNTING
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
#
#
# COMPUTE NODES
GresTypes=gpu,shard
NodeName=ensicompute-1 Gres=gpu:Tesla:1,shard:3 CPUs=40 RealMemory=128520 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN Feature=gpu,ht
NodeName=ensicompute-13 Gres=gpu:A40:3,shard:9 CPUs=40 RealMemory=128520 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN Feature=gpu,ht
NodeName=ensicompute-[11-12] Gres=gpu:Quadro:3,shard:9 CPUs=40 RealMemory=128520 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN Feature=gpu,ht
PartitionName=compute Nodes=ALL Default=YES MaxTime=INFINITE State=UP
=== gres.conf ===
AutoDetect=off
# ensicompute-1
NodeName=ensicompute-1 Name=gpu Type=Tesla File=/dev/nvidia0
NodeName=ensicompute-1 Name=shard Count=3 File=/dev/nvidia0
# ensicompute-11
NodeName=ensicompute-11 Name=gpu Type=Quadro File=/dev/nvidia0
NodeName=ensicompute-11 Name=gpu Type=Quadro File=/dev/nvidia1
NodeName=ensicompute-11 Name=gpu Type=Quadro File=/dev/nvidia2
NodeName=ensicompute-11 Name=shard Count=3 File=/dev/nvidia0
NodeName=ensicompute-11 Name=shard Count=3 File=/dev/nvidia1
NodeName=ensicompute-11 Name=shard Count=3 File=/dev/nvidia2
# ensicompute-12
NodeName=ensicompute-12 Name=gpu Type=Quadro File=/dev/nvidia0
NodeName=ensicompute-12 Name=gpu Type=Quadro File=/dev/nvidia1
NodeName=ensicompute-12 Name=gpu Type=Quadro File=/dev/nvidia2
NodeName=ensicompute-12 Name=shard Count=3 File=/dev/nvidia0
NodeName=ensicompute-12 Name=shard Count=3 File=/dev/nvidia1
NodeName=ensicompute-12 Name=shard Count=3 File=/dev/nvidia2
# ensicompute-13
NodeName=ensicompute-13 Name=gpu Type=A40 File=/dev/nvidia0
NodeName=ensicompute-13 Name=gpu Type=A40 File=/dev/nvidia1
NodeName=ensicompute-13 Name=gpu Type=A40 File=/dev/nvidia2
NodeName=ensicompute-13 Name=shard Count=3 File=/dev/nvidia0
NodeName=ensicompute-13 Name=shard Count=3 File=/dev/nvidia1
NodeName=ensicompute-13 Name=shard Count=3 File=/dev/nvidia2
--
François Broquedis, Ingénieur Service Informatique
Grenoble INP - Ensimag, bureau E208
681 rue de la Passerelle
BP 72, 38402 Saint Martin d'Hères CEDEX
Tél.: +33 (0)4 76 82 72 78