[slurm-users] Need help with running multiple instances/executions of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)

Ümit Seren uemit.seren at gmail.com
Fri Jan 19 14:24:17 UTC 2024


Maybe also post the output of scontrol show job <jobid> to check the other
resources allocated for the job.
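
For example, something along these lines (the exact field names vary a bit
between Slurm versions, so treat this as a sketch):

  scontrol show job <jobid> | grep -iE 'tres|cpus|mem'

If the job was allocated more CPUs, memory, or GPUs than you intended, it
will show up in those lines.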



On Thu, Jan 18, 2024, 19:22 Kherfani, Hafedh (Professional Services, TC) <
hafedh.kherfani at hpe.com> wrote:

> Hi Ümit, Troy,
>
>
>
> I removed the line “#SBATCH --gres=gpu:1” and changed the sbatch
> directive “--gpus-per-node=4” to “--gpus-per-node=1”, but I’m still getting
> the same result: when running multiple sbatch commands for the same script,
> only one job (the first execution) is running, and all subsequent jobs are
> in a pending state (REASON is reported as “Resources” for the immediately
> next job in the queue, and “Priority” for the remaining ones) …
>
>
>
> As for the output from the “scontrol show job <jobid>” command: I don’t see
> a “TRES” field on its own. I do see the field “TresPerNode=gres/gpu:1” (the
> value at the end of the line corresponds to the value specified in the
> “--gpus-per-node=” directive).
>
>
>
> PS: Is it normal/expected (in the output of the scontrol show job command)
> to have “Features=(null)”? I was expecting to see Features=gpu ….
>
>
>
>
>
> Best regards,
>
>
>
> Hafedh
>
>
>
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of
> Baer, Troy
> Sent: Thursday, January 18, 2024 3:47 PM
> To: Slurm User Community List <slurm-users at lists.schedmd.com>
> Subject: Re: [slurm-users] Need help with running multiple
> instances/executions of a batch script in parallel (with NVIDIA HGX A100
> GPU as a Gres)
>
>
>
> Hi Hafedh,
>
>
>
> Your job script has the sbatch directive “--gpus-per-node=4” set.  I
> suspect that if you look at what’s allocated to the running job by doing
> “scontrol show job <jobid>” and looking at the TRES field, it’s been
> allocated 4 GPUs instead of one.
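>
> For instance, I’d expect to see something like “TRES=cpu=1,mem=…,node=1,gres/gpu=4”
> or a “TresPerNode=gres/gpu:4” line there, with gres/gpu=4 meaning the job
> grabbed all four GPUs on the node.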
>
>
>
> Regards,
>
>                 --Troy
>
>
>
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of
> Kherfani, Hafedh (Professional Services, TC)
> Sent: Thursday, January 18, 2024 9:38 AM
> To: Slurm User Community List <slurm-users at lists.schedmd.com>
> Subject: Re: [slurm-users] Need help with running multiple
> instances/executions of a batch script in parallel (with NVIDIA HGX A100
> GPU as a Gres)
>
>
>
> Hi Noam and Matthias,
>
>
>
> Thanks both for your answers.
>
>
>
> I replaced the “#SBATCH --gres=gpu:4“ directive (in the batch script) with
> “#SBATCH --gres=gpu:1“ as you suggested, but it didn’t make a difference:
> running this batch script 3 times results in the first job being in a
> running state, while the second and third jobs remain in a pending
> state …
>
>
>
> [slurmtest at c-a100-master test-batch-scripts]$ cat gpu-job.sh
>
> #!/bin/bash
>
> #SBATCH --job-name=gpu-job
>
> #SBATCH --partition=gpu
>
> #SBATCH --nodes=1
>
> #SBATCH --gpus-per-node=4
>
> #SBATCH --gres=gpu:1                            # <<<< Changed from ‘4’ to ‘1’
>
> #SBATCH --tasks-per-node=1
>
> #SBATCH --output=gpu_job_output.%j
>
> #SBATCH --error=gpu_job_error.%j
>
>
>
> hostname
>
> date
>
> sleep 40
>
> pwd
>
>
>
> [slurmtest at c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
>
> Submitted batch job 217
>
> [slurmtest at c-a100-master test-batch-scripts]$ squeue
>
>              JOBID PARTITION     NAME     USER ST       TIME  NODES
> NODELIST(REASON)
>
>                217       gpu  gpu-job slurmtes  R       0:02      1
> c-a100-cn01
>
> [slurmtest at c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
>
> Submitted batch job 218
>
> [slurmtest at c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
>
> Submitted batch job 219
>
> [slurmtest at c-a100-master test-batch-scripts]$ squeue
>
>              JOBID PARTITION     NAME     USER ST       TIME  NODES
> NODELIST(REASON)
>
>                219       gpu  gpu-job slurmtes PD       0:00      1
> (Priority)
>
>                218       gpu  gpu-job slurmtes PD       0:00      1
> (Resources)
>
>                217       gpu  gpu-job slurmtes  R       0:07      1
> c-a100-cn01
>
>
>
> Basically I’m seeking some help/hints on how to tell Slurm, from the
> batch script for example: “I want only 1 or 2 GPUs to be used/consumed by
> the job”, so that I can run the batch script/job a couple of times with the
> sbatch command and confirm that we can indeed have multiple jobs, each
> using a GPU, running in parallel at the same time.
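>
> For instance, I’d like to be able to do something like the following and
> see all three jobs in the R state at the same time:
>
> for i in 1 2 3; do sbatch gpu-job.sh; done
> squeue -u slurmtest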
>
>
>
> Makes sense?
>
>
>
>
>
> Best regards,
>
>
>
> Hafedh
>
>
>
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of
> Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
> Sent: Thursday, January 18, 2024 2:30 PM
> To: Slurm User Community List <slurm-users at lists.schedmd.com>
> Subject: Re: [slurm-users] Need help with running multiple
> instances/executions of a batch script in parallel (with NVIDIA HGX A100
> GPU as a Gres)
>
>
>
> On Jan 18, 2024, at 7:31 AM, Matthias Loose <m.loose at mindcode.de> wrote:
>
>
>
> Hi Hafedh,
>
> I’m no expert on the GPU side of Slurm, but looking at your current
> configuration, to me it’s working as intended at the moment. You have
> defined 4 GPUs and start multiple jobs, each consuming 4 GPUs. So the jobs
> wait for the resource to be free again.
>
> I think what you need to look into is the MPS plugin, which seems to do
> what you are trying to achieve:
> https://slurm.schedmd.com/gres.html#MPS_Management
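>
> I haven’t used it myself, but from those docs, once an mps gres is
> configured a job requests a share of a GPU rather than a whole one, with
> something like “#SBATCH --gres=mps:50” (the count being a percentage of a
> single GPU).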
>
>
>
> I agree with the first paragraph.  How many GPUs are you expecting each
> job to use? I'd have assumed, based on the original text, that each job is
> supposed to use 1 GPU, and the 4 jobs were supposed to be running
> side-by-side on the one node you have (with 4 GPUs).  If so, you need to
> tell each job to request only 1 GPU, whereas currently each one is requesting 4.
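>
> Concretely, that would mean dropping the --gpus-per-node=4 line from the
> script and requesting a single GPU per job, roughly (a minimal sketch):
>
> #SBATCH --partition=gpu
> #SBATCH --nodes=1
> #SBATCH --gres=gpu:1
> #SBATCH --tasks-per-node=1
>
> With that, four such jobs should be able to run side by side on the 4-GPU
> node, assuming CPUs and memory aren't the limiting factor.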
>
>
>
> If your jobs are actually supposed to be using 4 GPUs each, I still don't
> see any advantage to MPS (at least in what is my usual GPU usage pattern):
> all the jobs will take longer to finish, because they are sharing the fixed
> resource. If they take turns, at least the first ones finish as fast as
> they can, and the last one will finish no later than it would have if they
> were all time-sharing the GPUs.  I guess NVIDIA had something in mind when
> they developed MPS, so our pattern may not be typical (or at least not
> universal), and in that case the MPS plugin may well be what you need.
>

