[slurm-users] slurm, gres:gpu, only 1 GPU out of 4 is detected
Tamas Hegedus
tamas at hegelab.org
Wed Nov 13 18:11:30 UTC 2019
Thanks for your suggestion. You are right, I do not have to deal with
specific GPUs.
(I have not tried to compile your code; I simply tested two gromacs runs
on the same node with the --gres=gpu:1 option.)
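
For a quick check that needs no compiler, one can also look at the environment Slurm sets up for the job. The following is only a minimal sketch: it assumes Slurm exports CUDA_VISIBLE_DEVICES and SLURM_JOB_ID into the job environment, and the helper names (parse_visible_devices, describe_gpu_env) are hypothetical, made up for this illustration.

```python
import os


def parse_visible_devices(value):
    """Split a CUDA_VISIBLE_DEVICES-style string into a list of device IDs.

    Returns None when the variable is not set at all.
    """
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


def describe_gpu_env(env):
    """Build a one-line summary of the GPU-related job environment."""
    job = env.get("SLURM_JOB_ID", "unknown")
    devices = parse_visible_devices(env.get("CUDA_VISIBLE_DEVICES"))
    if devices is None:
        return "job %s: CUDA_VISIBLE_DEVICES is not set" % job
    return "job %s: %d visible device(s): %s" % (job, len(devices), ",".join(devices))


if __name__ == "__main__":
    # Run inside each job, e.g. via srun, and compare the output.
    print(describe_gpu_env(os.environ))
```

Note that the listed index may well be 0 in both jobs if devices are constrained per job (for example through cgroups), so this only confirms how many GPUs each job sees. Telling the physical cards apart still needs something like the PCI bus ID.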
On 11/13/19 5:17 PM, Renfro, Michael wrote:
> Pretty sure you don’t need to explicitly specify GPU IDs on a Gromacs job running inside of Slurm with gres=gpu. Gromacs should only see the GPUs you have reserved for that job.
>
> Here’s a short test program you can run to verify that two different GPU jobs see different GPU devices (compile with nvcc):
>
> =====
>
> // From http://www.cs.fsu.edu/~xyuan/cda5125/examples/lect24/devicequery.cu
> #include <stdio.h>
> #include <cuda_runtime.h>
>
> // Print the name, multiprocessor count, and PCI location of one device.
> void printDevProp(cudaDeviceProp dP)
> {
>     printf("%s has %d multiprocessors\n", dP.name, dP.multiProcessorCount);
>     printf("%s has PCI BusID %d, DeviceID %d\n", dP.name, dP.pciBusID, dP.pciDeviceID);
> }
>
> int main()
> {
>     // Number of CUDA devices visible to this process
>     int devCount = 0;
>     cudaError_t err = cudaGetDeviceCount(&devCount);
>     if (err != cudaSuccess)
>     {
>         fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
>         return 1;
>     }
>     printf("There are %d CUDA devices.\n", devCount);
>     // Iterate through devices
>     for (int i = 0; i < devCount; ++i)
>     {
>         // Get device properties
>         printf("CUDA Device #%d: ", i);
>         cudaDeviceProp devProp;
>         cudaGetDeviceProperties(&devProp, i);
>         printDevProp(devProp);
>     }
>     return 0;
> }
>
> =====
>
> When run from two simultaneous jobs on the same node (each with a gres=gpu), I get:
>
> =====
>
> [renfro@gpunode003(job 221584) hw]$ ./cuda_props
> There are 1 CUDA devices.
> CUDA Device #0: Tesla K80 has 13 multiprocessors
> Tesla K80 has PCI BusID 5, DeviceID 0
>
> =====
>
> [renfro@gpunode003(job 221585) hw]$ ./cuda_props
> There are 1 CUDA devices.
> CUDA Device #0: Tesla K80 has 13 multiprocessors
> Tesla K80 has PCI BusID 6, DeviceID 0
>
> =====
>
--
Tamas Hegedus, PhD
Senior Research Fellow
Department of Biophysics and Radiation Biology
Semmelweis University | phone: (36) 1-459 1500/60233
Tuzolto utca 37-47 | mailto:tamas at hegelab.org
Budapest, 1094, Hungary | http://www.hegelab.org