[slurm-users] Need help with running multiple instances/executions of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)
Baer, Troy
troy at osc.edu
Thu Jan 18 14:46:48 UTC 2024
Hi Hafedh,
Your job script has the sbatch directive “--gpus-per-node=4” set. I suspect that if you look at what’s allocated to the running job by doing “scontrol show job <jobid>” and looking at the TRES field, it’s been allocated 4 GPUs instead of one.
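As a rough sketch of that check, the GPU count can be pulled out of the TRES text with standard tools. The helper name gpu_count_from_tres and the sample line are made up for illustration (the sample was not captured from this cluster), and the exact field name in scontrol output may differ between Slurm versions:

```shell
#!/bin/bash
# Hypothetical helper: print the GPU count found in TRES text read from stdin.
# Matches "gres/gpu=N" and typed forms such as "gres/gpu:a100=N".
gpu_count_from_tres() {
    grep -oE 'gres/gpu(:[^=,]+)?=[0-9]+' | head -n 1 | sed -E 's/.*=//'
}

# Illustrative line only, not real output from the job in this thread.
sample='   TRES=cpu=1,mem=4G,node=1,billing=1,gres/gpu=4'
printf '%s\n' "$sample" | gpu_count_from_tres   # prints 4
```

On a cluster, the same function could be fed from “scontrol show job <jobid> | gpu_count_from_tres”. A result of 4 for a job that was meant to use one GPU would confirm the over-allocation.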
Regards,
--Troy
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Kherfani, Hafedh (Professional Services, TC)
Sent: Thursday, January 18, 2024 9:38 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Need help with running multiple instances/executions of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)
Hi Noam and Matthias,
Thanks both for your answers.
I replaced the “#SBATCH --gres=gpu:4” directive in the batch script with “#SBATCH --gres=gpu:1” as you suggested, but it made no difference: when I submit this batch script 3 times, the first job goes into the running state while the second and third jobs stay pending:
[slurmtest at c-a100-master test-batch-scripts]$ cat gpu-job.sh
#!/bin/bash
#SBATCH --job-name=gpu-job
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=4
#SBATCH --gres=gpu:1 # <<<< Changed from ‘4’ to ‘1’
#SBATCH --tasks-per-node=1
#SBATCH --output=gpu_job_output.%j
#SBATCH --error=gpu_job_error.%j
hostname
date
sleep 40
pwd
[slurmtest at c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
Submitted batch job 217
[slurmtest at c-a100-master test-batch-scripts]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
217 gpu gpu-job slurmtes R 0:02 1 c-a100-cn01
[slurmtest at c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
Submitted batch job 218
[slurmtest at c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
Submitted batch job 219
[slurmtest at c-a100-master test-batch-scripts]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
219 gpu gpu-job slurmtes PD 0:00 1 (Priority)
218 gpu gpu-job slurmtes PD 0:00 1 (Resources)
217 gpu gpu-job slurmtes R 0:07 1 c-a100-cn01
Basically, I’m looking for help or hints on how to tell Slurm, from the batch script for example: “I want only 1 or 2 GPUs to be consumed by this job”. I would then submit the batch script a few times with the sbatch command and confirm that multiple jobs, each using a GPU, can indeed run in parallel at the same time.
Does that make sense?
Best regards,
Hafedh
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
Sent: Thursday, January 18, 2024 2:30 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Need help with running multiple instances/executions of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)
On Jan 18, 2024, at 7:31 AM, Matthias Loose <m.loose at mindcode.de> wrote:
Hi Hafedh,
I’m no expert in the GPU side of SLURM, but looking at your current configuration, it seems to me that it is working as intended at the moment. You have defined 4 GPUs and are starting multiple jobs that each consume 4 GPUs, so the jobs wait for the resource to be free again.
I think what you need to look into is the MPS plugin, which seems to do what you are trying to achieve:
https://slurm.schedmd.com/gres.html#MPS_Management
I agree with the first paragraph. How many GPUs are you expecting each job to use? I'd have assumed, based on the original text, that each job is supposed to use 1 GPU, and the 4 jobs were supposed to be running side-by-side on the one node you have (with 4 GPUs). If so, you need to tell each job to request only 1 GPU, and currently each one is requesting 4.
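One way to make each job request a single GPU, as described above, is to keep only the one-GPU gres request and drop the per-node GPU count. The following is a minimal sketch, assuming the same partition and job name as the script quoted earlier in the thread. The file name gpu-job-1gpu.sh is made up, and the CUDA_VISIBLE_DEVICES line assumes the site's GPU configuration exports that variable:

```shell
#!/bin/bash
# Write a one-GPU variant of the job script to a new file (hypothetical name).
cat > gpu-job-1gpu.sh <<'EOF'
#!/bin/bash
# Only one GPU request remains: the per-node count of 4 is gone.
#SBATCH --job-name=gpu-job
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --ntasks-per-node=1
#SBATCH --output=gpu_job_output.%j
#SBATCH --error=gpu_job_error.%j

hostname
date
# Assumption: Slurm exports this variable for GPU gres allocations on this site.
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
sleep 40
pwd
EOF
```

Submitting the generated file several times with “sbatch gpu-job-1gpu.sh” should, if the node really exposes 4 GPUs and nothing else limits the jobs, let squeue show them running side by side.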
If your jobs are actually supposed to be using 4 GPUs each, I still don't see any advantage to MPS (at least in what is my usual GPU usage pattern): all the jobs will take longer to finish, because they are sharing the fixed resource. If they take turns, at least the first ones finish as fast as they can, and the last one will finish no later than it would have if they were all time-sharing the GPUs. NVIDIA presumably had something in mind when they developed MPS, so our pattern may not be typical (or at least not universal), and in that case the MPS plugin may well be what you need.