[slurm-users] GPU Jobs with Slurm
Loris Bennett
loris.bennett at fu-berlin.de
Fri Jan 15 07:11:20 UTC 2021
Hi Abhiram,
Glad to help, but it turns out I was wrong :-)
We also didn't have ConstrainDevices=yes set, so nvidia-smi always
showed all the GPUs.
Thanks to Ryan and Samuel for putting me straight on that.
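For the record, the setting lives in cgroup.conf; a minimal sketch (only
ConstrainDevices is the option at issue here, the other lines are just
common companions and will vary from site to site):

  CgroupAutomount=yes
  ConstrainCores=yes
  ConstrainRAMSpace=yes
  # Restrict each job to the GPU devices it was actually allocated
  ConstrainDevices=yes

slurm.conf also needs TaskPlugin=task/cgroup for the device constraint
to take effect.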
Regards
Loris
Abhiram Chintangal <achintangal at berkeley.edu> writes:
> Loris,
>
> You are correct! Instead of using nvidia-smi as a check, I confirmed the GPU allocation by printing out
> the environment variable CUDA_VISIBLE_DEVICES, and it was as expected.
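>
> For reference, the check was something along these lines (GPU type and
> partition names are specific to our cluster):
>
> srun --gres=gpu:p100:2 -n 1 --partition=gpu bash -c 'echo $CUDA_VISIBLE_DEVICES'
>
> With only two GPUs requested, that prints just two device indices
> rather than all of the GPUs on the node.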
>
> Thanks for your help!
>
> On Thu, Jan 14, 2021 at 12:18 AM Loris Bennett <loris.bennett at fu-berlin.de> wrote:
>
> Hi Abhiram,
>
> Abhiram Chintangal <achintangal at berkeley.edu> writes:
>
> > Hello,
> >
> > I recently set up a small cluster at work using Warewulf/Slurm. Currently, I am not able to get the scheduler to
> > work well with GPUs (GRES).
> >
> > While Slurm is able to filter by GPU type, it allocates all the GPUs on the node. See below:
> >
> > [abhiram at whale ~]$ srun --gres=gpu:p100:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
> > index, name
> > 0, Tesla P100-PCIE-16GB
> > 1, Tesla P100-PCIE-16GB
> > 2, Tesla P100-PCIE-16GB
> > 3, Tesla P100-PCIE-16GB
> > [abhiram at whale ~]$ srun --gres=gpu:titanrtx:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
> > index, name
> > 0, TITAN RTX
> > 1, TITAN RTX
> > 2, TITAN RTX
> > 3, TITAN RTX
> > 4, TITAN RTX
> > 5, TITAN RTX
> > 6, TITAN RTX
> > 7, TITAN RTX
> >
> > I am fairly new to Slurm and still figuring out my way around it. I would really appreciate any help with this.
> >
> > For your reference, I attached the slurm.conf and gres.conf files.
>
> I think this is expected, since nvidia-smi does not actually use the
> GPUs, but just returns information on their usage.
>
> A better test would be to run a simple job which really does run on,
> say, two GPUs and then, while the job is running, log into the GPU node
> and run
>
> nvidia-smi --query-gpu=index,name,utilization.gpu --format=csv
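>
> A minimal sketch of such a test job (the gpu_burn call is just a
> stand-in for any program that really exercises the GPUs; use whatever
> GPU code you have to hand):
>
> #!/bin/bash
> #SBATCH --partition=gpu
> #SBATCH --gres=gpu:p100:2
> #SBATCH --ntasks=1
> #SBATCH --time=00:05:00
>
> # Show which devices Slurm handed to the job
> echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
>
> # Keep the allocated GPUs busy for a minute so their utilization
> # is visible in nvidia-smi on the node
> gpu_burn 60
>
> While that runs, the utilization.gpu column above should show activity
> on exactly two devices.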
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Hr./Mr.)
> ZEDAT, Freie Universität Berlin Email loris.bennett at fu-berlin.de
--
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin Email loris.bennett at fu-berlin.de