[slurm-users] Need help with running multiple instances/executions of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)

Diego Zuccato diego.zuccato at unibo.it
Tue Jan 23 08:59:48 UTC 2024


Also, remember to specify the memory used by the job: if you're using 
one of the CR_*Memory options to select resources, memory is treated 
as a consumable TRES.
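
For example (a minimal sketch; the 16G figure is a placeholder for 
whatever the job actually needs):

#SBATCH --gres=gpu:1
#SBATCH --mem=16G     # counted against the node's memory as a TRES

If a job requests no memory at all (no --mem, --mem-per-cpu or 
--mem-per-gpu) and no DefMemPer* default is configured, it can end up 
being allocated all of the node's memory, which by itself prevents 
other jobs from sharing the node.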

Diego

On 18/01/2024 15:44, Ümit Seren wrote:
> This line also has to be changed:
> 
> 
> #SBATCH --gpus-per-node=4#SBATCH --gpus-per-node=1
> 
> --gpus-per-node seems to be the new parameter that is replacing the 
> --gres= one, so you can remove the --gres line completely.
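> 
> For clarity, the whole script with a single-GPU request would then 
> look like this (a sketch of the suggested change, keeping everything 
> else from the original script):
> 
> #!/bin/bash
> #SBATCH --job-name=gpu-job
> #SBATCH --partition=gpu
> #SBATCH --nodes=1
> #SBATCH --gpus-per-node=1        # 1 of the node's 4 GPUs
> #SBATCH --tasks-per-node=1
> #SBATCH --output=gpu_job_output.%j
> #SBATCH --error=gpu_job_error.%j
> hostname
> date
> sleep 40
> pwd
> 
> With that, up to four such jobs should be able to run on the node at 
> the same time as far as the GPUs are concerned.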
> 
> Best
> 
> Ümit
> 
> *From: *slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of 
> Kherfani, Hafedh (Professional Services, TC) <hafedh.kherfani at hpe.com>
> *Date: *Thursday, 18. January 2024 at 15:40
> *To: *Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject: *Re: [slurm-users] Need help with running multiple 
> instances/executions of a batch script in parallel (with NVIDIA HGX A100 
> GPU as a Gres)
> 
> Hi Noam and Matthias,
> 
> Thanks both for your answers.
> 
> I changed the “#SBATCH --gres=gpu:4“ directive in the batch script to 
> “#SBATCH --gres=gpu:1“ as you suggested, but it didn’t make a 
> difference: running this batch script 3 times still results in the 
> first job running, while the second and third jobs remain pending …
> 
> [slurmtest at c-a100-master test-batch-scripts]$ cat gpu-job.sh
> #!/bin/bash
> #SBATCH --job-name=gpu-job
> #SBATCH --partition=gpu
> #SBATCH --nodes=1
> #SBATCH --gpus-per-node=4
> #SBATCH --gres=gpu:1              # <<<< Changed from ‘4’ to ‘1’
> #SBATCH --tasks-per-node=1
> #SBATCH --output=gpu_job_output.%j
> #SBATCH --error=gpu_job_error.%j
> hostname
> date
> sleep 40
> pwd
> 
> [slurmtest at c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
> Submitted batch job 217
> [slurmtest at c-a100-master test-batch-scripts]$ squeue
>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>     217       gpu  gpu-job slurmtes  R       0:02      1 c-a100-cn01
> [slurmtest at c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
> Submitted batch job 218
> [slurmtest at c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
> Submitted batch job 219
> [slurmtest at c-a100-master test-batch-scripts]$ squeue
>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>     219       gpu  gpu-job slurmtes PD       0:00      1 (Priority)
>     218       gpu  gpu-job slurmtes PD       0:00      1 (Resources)
>     217       gpu  gpu-job slurmtes  R       0:07      1 c-a100-cn01
> 
> Basically I’m seeking some help/hints on how to tell Slurm, from the 
> batch script for example: “I want only 1 or 2 GPUs to be used/consumed 
> by this job”, so that I can run the batch script a couple of times 
> with the sbatch command and confirm that we can indeed have multiple 
> jobs, each using a GPU, running in parallel at the same time.
> 
> Makes sense?
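> 
> (One way to verify, once the jobs do run side by side: Slurm sets 
> CUDA_VISIBLE_DEVICES for jobs that request GPUs through gres, so a 
> line like
> 
> echo "Job $SLURM_JOB_ID got GPU(s): $CUDA_VISIBLE_DEVICES"
> 
> in the script shows which device each job was given. Note that with 
> cgroup device confinement (ConstrainDevices=yes) each job may see its 
> single GPU renumbered as device 0.)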
> 
> Best regards,
> 
> 
> Hafedh
> 
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf 
> Of* Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
> *Sent:* Thursday, January 18, 2024 2:30 PM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] Need help with running multiple 
> instances/executions of a batch script in parallel (with NVIDIA HGX A100 
> GPU as a Gres)
> 
>     On Jan 18, 2024, at 7:31 AM, Matthias Loose <m.loose at mindcode.de> wrote:
> 
>     Hi Hafedh,
> 
>     I'm no expert on the GPU side of Slurm, but looking at your current
>     configuration, to me it's working as intended at the moment. You
>     have defined 4 GPUs and start multiple jobs, each consuming all 4
>     GPUs, so the jobs wait for the resources to be free again.
> 
>     I think what you need to look into is the MPS plugin, which seems to
>     do what you are trying to achieve:
>     https://slurm.schedmd.com/gres.html#MPS_Management
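> 
>     From that page, the setup is roughly as follows (a sketch; the node
>     name and MPS count below are placeholders, see the page for the
>     full details):
> 
>     # slurm.conf
>     GresTypes=gpu,mps
>     NodeName=c-a100-cn01 Gres=gpu:4,mps=400
> 
>     # gres.conf on the GPU node
>     Name=gpu File=/dev/nvidia[0-3]
>     Name=mps Count=400
> 
>     The 400 MPS units are spread evenly across the 4 GPUs (100 per
>     GPU), and a job then requests a share of one GPU with something
>     like "sbatch --gres=mps:50 ..." (half a GPU in this example).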
> 
> I agree with the first paragraph.  How many GPUs are you expecting each 
> job to use? I'd have assumed, based on the original text, that each job 
> is supposed to use 1 GPU, and the 4 jobs were supposed to be running 
> side-by-side on the one node you have (with 4 GPUs).  If so, you need to 
> tell each job to request only 1 GPU, and currently each one is requesting 4.
> 
> If your jobs are actually supposed to be using 4 GPUs each, I still 
> don't see any advantage to MPS (at least for my usual GPU usage 
> pattern): all the jobs will take longer to finish, because they are 
> sharing the fixed resource. If they take turns, at least the first ones 
> finish as fast as they can, and the last one will finish no later than 
> it would have if they were all time-sharing the GPUs.  I guess NVIDIA 
> had something in mind when they developed MPS, so our pattern may not 
> be typical (or at least not universal), and in that case the MPS 
> plugin may well be what you need.
> 

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786


