[slurm-users] GPU Jobs with Slurm

Loris Bennett loris.bennett at fu-berlin.de
Fri Jan 15 07:11:20 UTC 2021


Hi Abhiram,

Glad to help, but it turns out I was wrong :-)

We also didn't have ConstrainDevices=yes set, so nvidia-smi always
showed all the GPUs.
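
For the record, ConstrainDevices is a cgroup.conf setting; a minimal
sketch of the relevant configuration (assuming the cgroup proctrack and
task plugins are in use) would be something like

  # cgroup.conf
  ConstrainDevices=yes

  # slurm.conf
  ProctrackType=proctrack/cgroup
  TaskPlugin=task/affinity,task/cgroup

With that in place, nvidia-smi run inside a job step should only see the
GPUs Slurm has allocated to that step.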

Thanks to Ryan and Samuel for putting me straight on that.  

Regards

Loris

Abhiram Chintangal <achintangal at berkeley.edu> writes:

> Loris, 
>
> You are correct! Instead of using nvidia-smi as a check, I confirmed the GPU allocation by printing out
> the environment variable CUDA_VISIBLE_DEVICES, and it was as expected.
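>
> For example, something along these lines (a sketch reusing the partition
> and GRES names from the commands below; the quoting makes the variable
> expand on the compute node rather than on the login node):
>
>   srun --gres=gpu:p100:2 -n 1 --partition=gpu bash -c 'echo $CUDA_VISIBLE_DEVICES'
>
> With two GPUs allocated, this should list only the two visible devices,
> e.g. something like "0,1".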
>
> Thanks for your help! 
>
> On Thu, Jan 14, 2021 at 12:18 AM Loris Bennett <loris.bennett at fu-berlin.de> wrote:
>
>  Hi Abhiram,
>
>  Abhiram Chintangal <achintangal at berkeley.edu> writes:
>
>  > Hello, 
>  >
>  > I recently set up a small cluster at work using Warewulf/Slurm. Currently, I am not able to get the scheduler to
>  > work well with GPUs (GRES).
>  >
>  > While Slurm is able to filter by GPU type, it allocates all the GPUs on the node. See below:
>  >
>  >  [abhiram at whale ~]$ srun --gres=gpu:p100:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
>  >  index, name
>  >  0, Tesla P100-PCIE-16GB
>  >  1, Tesla P100-PCIE-16GB
>  >  2, Tesla P100-PCIE-16GB
>  >  3, Tesla P100-PCIE-16GB
>  >  [abhiram at whale ~]$ srun --gres=gpu:titanrtx:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
>  >  index, name
>  >  0, TITAN RTX
>  >  1, TITAN RTX
>  >  2, TITAN RTX
>  >  3, TITAN RTX
>  >  4, TITAN RTX
>  >  5, TITAN RTX
>  >  6, TITAN RTX
>  >  7, TITAN RTX
>  >
>  > I am fairly new to Slurm and still figuring out my way around it. I would really appreciate any help with this.
>  >
>  > For your reference, I attached the slurm.conf and gres.conf files. 
>
>  I think this is expected, since nvidia-smi does not actually use the
>  GPUs, but just returns information on their usage.
>
>  A better test would be to run a simple job which really does run on,
>  say, two GPUs and then, while the job is running, log into the GPU node
>  and run
>
>    nvidia-smi --query-gpu=index,name,utilization.gpu --format=csv
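>
>  For instance, a minimal test job could look like this (just a sketch;
>  the partition and GRES names are taken from your commands above, and
>  "my_gpu_program" stands in for any code that really exercises the GPUs):
>
>    #!/bin/bash
>    #SBATCH --partition=gpu
>    #SBATCH --gres=gpu:p100:2
>    #SBATCH --ntasks=1
>
>    # Replace with any workload that keeps the GPUs busy for a while.
>    ./my_gpu_program
>
>  While that job is running, the nvidia-smi query above should show
>  non-zero utilization on, at most, the two allocated GPUs.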
>
>  Cheers,
>
>  Loris
>
>  -- 
>  Dr. Loris Bennett (Hr./Mr.)
>  ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de
-- 
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de


