[slurm-users] slurm issues with CUDA unified memory
Jan Dettmer
jand at uvic.ca
Sun Feb 4 14:54:11 MST 2018
Hello,
I am operating a small cluster of 8 nodes, each with 20 cores (two 10-core CPUs) and 2 GPUs (Nvidia K80). To date, I have been successfully running CUDA code, typically submitting single-CPU, single-GPU jobs via Slurm using the cons_res select plugin with the CR_CPU option.
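A typical single-GPU submission looks roughly like the following sketch (the partition name, memory request, and GRES line are illustrative rather than copied from my actual scripts):

  #!/bin/bash
  #SBATCH --partition=gpu        # illustrative partition name
  #SBATCH --ntasks=1             # one task
  #SBATCH --cpus-per-task=1      # single CPU core
  #SBATCH --gres=gpu:1           # one GPU, assuming GRES is configured
  #SBATCH --mem=4G               # illustrative memory request

  srun ./prjmh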
More recently, I have been trying to have multiple MPI ranks access a single GPU. The issue I am experiencing is that the CUDA runtime appears to reserve virtual address space on the order of all available memory (system + GPU) for each MPI rank:
  PID USER  PR  NI  VIRT   RES   SHR  S  %CPU %MEM    TIME+ COMMAND
95292 jan   20   0  1115m   14m  9872 R 100.5  0.0  0:02.07 prjmh
95295 jan   20   0  26.3g  145m   95m R 100.5  0.5  0:01.81 prjmh
95293 jan   20   0  26.3g  145m   95m R  98.6  0.5  0:01.80 prjmh
95294 jan   20   0  26.3g  145m   95m R  98.6  0.5  0:01.81 prjmh
Note: PID 95292 is the master rank, which does not access the GPU; the other three processes do access the GPU.
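As far as I understand, the large VIRT numbers come from CUDA unified memory: once a rank calls cudaMallocManaged (or otherwise touches the unified address space), the runtime reserves a large virtual address range spanning host and device memory, while the resident set (RES) stays small. A minimal sketch of the allocation pattern involved (sizes and names are illustrative, not my actual code):

  #include <cstdio>
  #include <cuda_runtime.h>

  __global__ void scale(double *x, int n, double a)
  {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) x[i] *= a;
  }

  int main()
  {
      const int n = 1 << 20;        // illustrative size
      double *x = nullptr;

      // cudaMallocManaged returns a pointer usable from both host and device.
      // The runtime reserves a large virtual address range for the unified
      // address space, which is what shows up as VIRT in top, while RES
      // stays close to what is actually touched.
      cudaMallocManaged(&x, n * sizeof(double));

      for (int i = 0; i < n; ++i) x[i] = 1.0;   // first touch on the host

      scale<<<(n + 255) / 256, 256>>>(x, n, 2.0);
      cudaDeviceSynchronize();                  // wait before reading on host

      std::printf("x[0] = %f\n", x[0]);
      cudaFree(x);
      return 0;
  }

Each of the three GPU-facing ranks does something analogous, which is presumably why they all show the same inflated VIRT.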
The inflated virtual memory usage results in Slurm killing the job:
slurmstepd: Exceeded job memory limit
slurmstepd: Step 5705.0 exceeded virtual memory limit (83806300 > 29491200), being killed
slurmstepd: Step 5705.0 exceeded virtual memory limit (83806300 > 29491200), being killed
slurmstepd: Exceeded job memory limit
slurmstepd: Exceeded job memory limit
slurmstepd: Exceeded job memory limit
srun: got SIGCONT
slurmstepd: *** JOB 5705 CANCELLED AT 2018-02-04T13:47:00 *** on compute-0-3
srun: forcing job termination
srun: error: compute-0-3: task 0: Killed
srun: error: compute-0-3: tasks 1-3: Killed
Note: When I log into the node and manually run the program with mpirun -np=20 a.out, it runs without issues.
Is there a way to change the Slurm configuration so it does not kill these jobs? I have read through the documentation to some extent, but my limited Slurm knowledge has not led me to a solution.
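From what I can tell, the check that kills the job is the virtual memory limit derived from VSizeFactor rather than a cgroup limit on resident memory, so I wondered whether something along the following lines would be the right direction; the values are only a guess on my part:

  slurm.conf:
    VSizeFactor=0                           # disable the virtual memory limit
    TaskPlugin=task/affinity,task/cgroup    # enforce limits via cgroups
    JobAcctGatherType=jobacct_gather/cgroup

  cgroup.conf:
    ConstrainCores=yes
    ConstrainRAMSpace=yes     # limit resident memory, not virtual address space
    ConstrainSwapSpace=yes

But I am not sure whether this is correct, or whether it has side effects for the rest of the cluster.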
Thanks very much, Jan