[slurm-users] Need help with running multiple instances/executions of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)
Marko Markoc
mmarkoc at pdx.edu
Fri Jan 19 14:12:24 UTC 2024
+1 on checking the memory allocation.
Or add/check if you have any DefMemPerX set in your slurm.conf
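
For example, something along these lines in slurm.conf (the values are
placeholders only, adjust to your nodes):

    DefMemPerGPU=16000     # default MB of memory per allocated GPU
    # or alternatively:
    # DefMemPerCPU=4000    # default MB of memory per allocated CPU

Without a DefMemPer* default (and with no --mem in the job), a job can end up
being allocated all of the node's memory, which keeps the other jobs pending.
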
On Fri, Jan 19, 2024 at 12:33 AM mohammed shambakey <shambakey1 at gmail.com>
wrote:
> Hi
>
> I'm not an expert, but is it possible that the currently running job is
> consuming the whole node because it has been allocated all of the node's
> memory (so the other 2 jobs have to wait until it finishes)?
> Maybe try restricting the memory requested by each job?
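>
> For example (just a sketch, the value is arbitrary), something like this in
> the batch script would cap the job's memory instead of letting it take the
> whole node:
>
>     #SBATCH --mem=16G    # explicit memory request; pick a value that fits your node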
>
> Regards
>
> On Thu, Jan 18, 2024 at 4:46 PM Ümit Seren <uemit.seren at gmail.com> wrote:
>
>> This line also has to be changed:
>>
>>
>> #SBATCH --gpus-per-node=4   ->   #SBATCH --gpus-per-node=1
>>
>> --gpus-per-node seems to be the new parameter that is replacing the --gres=
>> one, so you can remove the --gres line completely.
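>>
>> So the top of your script would end up looking roughly like this (sketch
>> only, keep your own job name, partition and output settings):
>>
>>     #!/bin/bash
>>     #SBATCH --job-name=gpu-job
>>     #SBATCH --partition=gpu
>>     #SBATCH --nodes=1
>>     #SBATCH --gpus-per-node=1     # request 1 of the node's 4 GPUs, not all of them
>>     #SBATCH --tasks-per-node=1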
>>
>>
>>
>> Best
>>
>> Ümit
>>
>>
>>
>> *From: *slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
>> Kherfani, Hafedh (Professional Services, TC) <hafedh.kherfani at hpe.com>
>> *Date: *Thursday, 18. January 2024 at 15:40
>> *To: *Slurm User Community List <slurm-users at lists.schedmd.com>
>> *Subject: *Re: [slurm-users] Need help with running multiple
>> instances/executions of a batch script in parallel (with NVIDIA HGX A100
>> GPU as a Gres)
>>
>> Hi Noam and Matthias,
>>
>>
>>
>> Thanks both for your answers.
>>
>>
>>
>> I changed the “#SBATCH --gres=gpu:4” directive (in the batch script) to
>> “#SBATCH --gres=gpu:1” as you suggested, but it didn’t make a difference:
>> running this batch script 3 times still results in the first job being in
>> a running state, while the second and third jobs remain in a pending
>> state ...
>>
>>
>>
>> [slurmtest at c-a100-master test-batch-scripts]$ cat gpu-job.sh
>>
>> #!/bin/bash
>>
>> #SBATCH --job-name=gpu-job
>>
>> #SBATCH --partition=gpu
>>
>> #SBATCH --nodes=1
>>
>> #SBATCH --gpus-per-node=4
>>
>> #SBATCH --gres=gpu:1              # <<<< Changed from ‘4’ to ‘1’
>>
>> #SBATCH --tasks-per-node=1
>>
>> #SBATCH --output=gpu_job_output.%j
>>
>> #SBATCH --error=gpu_job_error.%j
>>
>>
>>
>> hostname
>>
>> date
>>
>> sleep 40
>>
>> pwd
>>
>>
>>
>> [slurmtest at c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
>>
>> Submitted batch job *217*
>>
>> [slurmtest at c-a100-master test-batch-scripts]$ squeue
>>
>>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>>     217       gpu  gpu-job slurmtes  R       0:02      1 c-a100-cn01
>>
>> [slurmtest at c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
>>
>> Submitted batch job *218*
>>
>> [slurmtest at c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
>>
>> Submitted batch job *219*
>>
>> [slurmtest at c-a100-master test-batch-scripts]$ squeue
>>
>>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>>     219       gpu  gpu-job slurmtes *PD*     0:00      1 (Priority)
>>     218       gpu  gpu-job slurmtes *PD*     0:00      1 (Resources)
>>     217       gpu  gpu-job slurmtes  *R*     0:07      1 c-a100-cn01
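>>
>> (If it helps with the diagnosis, I can also check what the running job is
>> actually holding, e.g. with:
>>
>>     scontrol show job 217 | grep -i tres
>>     scontrol show node c-a100-cn01 | grep -i -E 'gres|alloctres'
>>
>> to see how many GPUs and how much memory are allocated to it.)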
>>
>>
>>
>> Basically I’m looking for some help/hints on how to tell Slurm, from the
>> batch script for example, “I want only 1 or 2 GPUs to be used/consumed by
>> this job”, so that I can run the batch script a couple of times with the
>> sbatch command and confirm that multiple jobs can indeed each use a GPU
>> and run in parallel at the same time.
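>>
>> In other words, I'd like each job's request to boil down to something like
>> this (if that is the right way to express it; the memory value is just an
>> example):
>>
>>     #SBATCH --nodes=1
>>     #SBATCH --gpus-per-node=1     # one GPU per job, not all four
>>     #SBATCH --mem=16G             # explicit memory cap so one job doesn't take the node
>>
>> so that several such jobs can share the node, one GPU each.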
>>
>>
>>
>> Makes sense?
>>
>>
>>
>>
>>
>> Best regards,
>>
>>
>>
>> *Hafedh *
>>
>>
>>
>> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf
>> Of *Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
>> *Sent:* Thursday, 18 January 2024 2:30 PM
>> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
>> *Subject:* Re: [slurm-users] Need help with running multiple
>> instances/executions of a batch script in parallel (with NVIDIA HGX A100
>> GPU as a Gres)
>>
>>
>>
>> On Jan 18, 2024, at 7:31 AM, Matthias Loose <m.loose at mindcode.de> wrote:
>>
>>
>>
>> Hi Hafedh,
>>
>> I'm no expert on the GPU side of Slurm, but looking at your current
>> configuration, it seems to me it is working as intended at the moment. You
>> have defined 4 GPUs and start multiple jobs that each consume 4 GPUs, so
>> the jobs wait for the resources to become free again.
>>
>> I think what you need to look into is the MPS plugin, which seems to do
>> what you are trying to achieve:
>> https://slurm.schedmd.com/gres.html#MPS_Management
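>>
>> Roughly, MPS is configured as its own GRES and jobs then request a share
>> of a GPU rather than a whole one, something like this (untested sketch
>> based on that page):
>>
>>     # gres.conf on the GPU node
>>     Name=gpu Type=a100 File=/dev/nvidia[0-3]
>>     Name=mps Count=400            # shares, spread evenly over the 4 GPUs
>>
>>     # in the job script: ask for half of one GPU's shares
>>     #SBATCH --gres=mps:50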
>>
>>
>>
>> I agree with the first paragraph. How many GPUs are you expecting each
>> job to use? I'd have assumed, based on the original text, that each job is
>> supposed to use 1 GPU, and the 4 jobs were supposed to be running
>> side-by-side on the one node you have (with 4 GPUs). If so, you need to
>> tell each job to request only 1 GPU, and currently each one is requesting 4.
>>
>>
>>
>> If your jobs are actually supposed to be using 4 GPUs each, I still don't
>> see any advantage to MPS (at least in what is my usual GPU usage pattern):
>> all the jobs will take longer to finish, because they are sharing the fixed
>> resource. If they take turns, at least the first ones finish as fast as
>> they can, and the last one will finish no later than it would have if they
>> were all time-sharing the GPUs. I guess NVIDIA had something in mind when
>> they developed MPS, so I guess our pattern may not be typical (or at least
>> not universal), and in that case the MPS plugin may well be what you need.
>>
>
>
> --
> Mohammed
>