<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div style="margin:0;">Hi there,</div><div style="margin:0;"><br></div><div style="margin:0;">I am testing MPS on Slurm 19.05.5 on a compute node with 4 A100 GPUs. I expected all 4 A100s to be used, but I found that only the first GPU was used, as shown below:</div><div style="margin:0;"><b>The job script:</b></div><div style="margin:0;"><div style="margin:0;">#!/bin/bash</div><div style="margin:0;">#SBATCH -J date</div><div style="margin:0;">#SBATCH -p NVIDIAA100-PCIE-40GB</div><div style="margin:0;">#SBATCH -n 1</div><div style="margin:0;">#SBATCH --gres=mps:100</div><div style="margin:0;">#SBATCH --mem 1024</div><div style="margin:0;">#SBATCH -o /home/zren/%j.out</div><div style="margin:0;">#SBATCH -e /home/zren/%j.out</div><div style="margin:0;"><br></div><div style="margin:0;">echo $CUDA_VISIBLE_DEVICES</div><div style="margin:0;">echo $CUDA_MPS_ACTIVE_THREAD_PERCENTAGE</div><div style="margin:0;">./vectorAdd</div><div style="margin:0;"><br></div><div style="margin:0;"><b>Output of squeue (only one job is running):</b></div></div><div style="margin:0;"><div style="margin:0;">             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)</div><div style="margin:0;">               291 NVIDIAA10     date     zren PD       0:00      1 (Resources)</div><div style="margin:0;">               292 NVIDIAA10     date     zren PD       0:00      1 (Priority)</div><div style="margin:0;">               293 NVIDIAA10     date     zren PD       0:00      1 (Priority)</div><div style="margin:0;">               290 NVIDIAA10     date     zren  R       0:04      1 mig4</div><div style="margin:0;"><br></div></div><div style="margin:0;"><b>Output of nvidia-smi (only the GPU at index 0 was used):</b></div><div style="margin:0;"><div style="margin:0;">Tue Feb 22 09:47:45 2022</div><div style="margin:0;">+-----------------------------------------------------------------------------+</div><div 
style="margin:0;">| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |</div><div style="margin:0;">|-------------------------------+----------------------+----------------------+</div><div style="margin:0;">| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |</div><div style="margin:0;">| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |</div><div style="margin:0;">|                               |                      |               MIG M. |</div><div style="margin:0;">|===============================+======================+======================|</div><div style="margin:0;">|   0  NVIDIA A100-PCI...  On   | 00000000:18:00.0 Off |                    0 |</div><div style="margin:0;">| N/A   33C    P0    36W / 250W |    415MiB / 40960MiB |     31%      Default |</div><div style="margin:0;">|                               |                      |             Disabled |</div><div style="margin:0;">+-------------------------------+----------------------+----------------------+</div><div style="margin:0;">|   1  NVIDIA A100-PCI...  On   | 00000000:5E:00.0 Off |                    0 |</div><div style="margin:0;">| N/A   30C    P0    33W / 250W |      0MiB / 40960MiB |      0%      Default |</div><div style="margin:0;">|                               |                      |             Disabled |</div><div style="margin:0;">+-------------------------------+----------------------+----------------------+</div><div style="margin:0;">|   2  NVIDIA A100-PCI...  On   | 00000000:AF:00.0 Off |                    0 |</div><div style="margin:0;">| N/A   28C    P0    32W / 250W |      0MiB / 40960MiB |      0%      Default |</div><div style="margin:0;">|                               |                      |             Disabled |</div><div style="margin:0;">+-------------------------------+----------------------+----------------------+</div><div style="margin:0;">|   3  NVIDIA A100-PCI...  
On   | 00000000:D8:00.0 Off |                    0 |</div><div style="margin:0;">| N/A   30C    P0    34W / 250W |      0MiB / 40960MiB |      0%      Default |</div><div style="margin:0;">|                               |                      |             Disabled |</div><div style="margin:0;">+-------------------------------+----------------------+----------------------+</div><div style="margin:0;"><br></div><div style="margin:0;">+-----------------------------------------------------------------------------+</div><div style="margin:0;">| Processes:                                                                  |</div><div style="margin:0;">|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |</div><div style="margin:0;">|        ID   ID                                                   Usage      |</div><div style="margin:0;">|=============================================================================|</div><div style="margin:0;">|    0   N/A  N/A     10228      C   ./vectorAdd                       413MiB |</div><div style="margin:0;">+-----------------------------------------------------------------------------+</div></div><div style="margin:0;"><b>the configuration of slurm.conf and gres.conf:</b></div><div style="margin:0;">NodeName=mig4 CPUs=24 Boards=1 SocketsPerBoard=2 CoresPerSocket=12 ThreadsPerCore=1 RealMemory=191907 MemSpecLimit=10240 Gres=gpu:4,mps:400 State=UNKNOWN</div><div style="margin:0;"><br></div><div style="margin:0;"><div style="margin:0;">AutoDetect=nvml</div><div style="margin:0;">Name=gpu Type=nvidia_a100-pcie-40gb File=/dev/nvidia0</div><div style="margin:0;">Name=gpu Type=nvidia_a100-pcie-40gb File=/dev/nvidia1</div><div style="margin:0;">Name=gpu Type=nvidia_a100-pcie-40gb File=/dev/nvidia2 </div><div style="margin:0;">Name=gpu Type=nvidia_a100-pcie-40gb File=/dev/nvidia3</div><div style="margin:0;">Name=mps Count=400</div><div style="margin:0;"><br></div></div><div style="margin:0;"><b>and some logs for 
job 291, which is pending with reason (Resources), from slurmctld.log:</b></div><div style="margin:0;"><div style="margin:0;">[2022-02-22T09:47:12.890] debug3: _pick_best_nodes: JobId=291 idle_nodes 0 share_nodes 1</div><div style="margin:0;">[2022-02-22T09:47:12.890] debug2: select/cons_tres: select_p_job_test: evaluating JobId=291</div><div style="margin:0;">[2022-02-22T09:47:12.890] select/cons_tres: select_p_job_test: JobId=291 node_mode:Normal alloc_mode:Run_Now</div><div style="margin:0;">[2022-02-22T09:47:12.890] select/cons_tres: select_p_job_test: node_list:mig4 exc_cores:NONE</div><div style="margin:0;">[2022-02-22T09:47:12.890] select/cons_tres: select_p_job_test: nodes: min:1 max:500000 requested:1 avail:1</div><div style="margin:0;">[2022-02-22T09:47:12.890] select/cons_tres: _job_test: evaluating JobId=291 on 1 nodes</div><div style="margin:0;">[2022-02-22T09:47:12.890] select/cons_tres: _job_test: test 0 fail: insufficient resources</div><div style="margin:0;">[2022-02-22T09:47:12.890] select/cons_tres: select_p_job_test: no job_resources info for JobId=291 rc=-1</div><div style="margin:0;">[2022-02-22T09:47:12.890] debug2: select/cons_tres: select_p_job_test: evaluating JobId=291</div><div style="margin:0;">[2022-02-22T09:47:12.890] select/cons_tres: select_p_job_test: JobId=291 node_mode:Normal alloc_mode:Test_Only</div><div style="margin:0;">[2022-02-22T09:47:12.890] select/cons_tres: select_p_job_test: node_list:mig4 exc_cores:NONE</div><div style="margin:0;">[2022-02-22T09:47:12.890] select/cons_tres: select_p_job_test: nodes: min:1 max:500000 requested:1 avail:1</div><div style="margin:0;">[2022-02-22T09:47:12.890] select/cons_tres: _job_test: evaluating JobId=291 on 1 nodes</div><div style="margin:0;">[2022-02-22T09:47:12.890] select/cons_tres: _can_job_run_on_node: 24 CPUs on mig4(state:1), mem 1024/191907</div><div style="margin:0;">[2022-02-22T09:47:12.890] select/cons_tres: eval_nodes: set:0 consec CPUs:1 nodes:1:mig4 begin:0 end:0 required:-1 
weight:511</div><div style="margin:0;">[2022-02-22T09:47:12.890] select/cons_tres: _job_test: test 0 pass: test_only</div><div style="margin:0;">[2022-02-22T09:47:12.890] <i><u>select/cons_tres: select_p_job_test: no job_resources info for JobId=291 rc=0</u></i></div><div style="margin:0;"><i><u><br></u></i></div><div style="margin:0;">Thanks</div></div></div>