[slurm-users] GPU Jobs with Slurm
Loris Bennett
loris.bennett at fu-berlin.de
Thu Jan 14 08:16:30 UTC 2021
Hi Abhiram,
Abhiram Chintangal <achintangal at berkeley.edu> writes:
> Hello,
>
> I recently set up a small cluster at work using Warewulf/Slurm. Currently, I am not able to get the scheduler to
> work well with GPU's (Gres).
>
> While slurm is able to filter by GPU type, it allocates all the GPU's on the node. See below:
>
> [abhiram at whale ~]$ srun --gres=gpu:p100:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
> index, name
> 0, Tesla P100-PCIE-16GB
> 1, Tesla P100-PCIE-16GB
> 2, Tesla P100-PCIE-16GB
> 3, Tesla P100-PCIE-16GB
> [abhiram at whale ~]$ srun --gres=gpu:titanrtx:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
> index, name
> 0, TITAN RTX
> 1, TITAN RTX
> 2, TITAN RTX
> 3, TITAN RTX
> 4, TITAN RTX
> 5, TITAN RTX
> 6, TITAN RTX
> 7, TITAN RTX
>
> I am fairly new to Slurm and still figuring out my way around it. I would really appreciate any help with this.
>
> For your reference, I attached the slurm.conf and gres.conf files.
I think this is expected, since nvidia-smi does not actually use the
GPUs, but just returns information on their usage.
A better test would be to run a simple job which really does use, say,
two GPUs and then, while the job is running, log into the GPU node and
run
nvidia-smi --query-gpu=index,name,utilization.gpu --format=csv
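For example, a minimal test job could look roughly like this (just a
sketch; ./my_gpu_program stands for any program of yours which really
exercises the GPUs):

  #!/bin/bash
  #SBATCH --job-name=gpu-test
  #SBATCH --partition=gpu
  #SBATCH --gres=gpu:p100:2
  #SBATCH -n 1

  # If GRES is configured correctly, Slurm sets CUDA_VISIBLE_DEVICES
  # to the GPUs allocated to this job.
  echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"

  # Replace with any code that actually loads the GPUs.
  ./my_gpu_program

Submit it with sbatch and then check the utilization on the node with
the nvidia-smi query above.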
Cheers,
Loris
--
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin
Email loris.bennett at fu-berlin.de