[slurm-users] GPU process accounting information

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Fri Jan 15 12:41:31 UTC 2021


Hi,

We have installed some new GPU nodes, and now users are asking for some 
sort of monitoring of GPU utilisation and GPU memory utilisation at the 
end of a job, like what Slurm already provides for CPU and memory usage.

I haven't found any documentation describing how to perform GPU accounting 
within Slurm, so I would like to ask the user community for advice on best 
practices and any available (simple) tools.

What I have discovered is that Nvidia provides process accounting using 
nvidia-smi[1].  It is enabled with

$ nvidia-smi --accounting-mode=1

and queried with

$ nvidia-smi \
    --query-accounted-apps=gpu_name,pid,time,gpu_util,mem_util,max_memory_usage \
    --format=csv

but the documentation seems quite scant, and so far I don't see any output 
from this query command.
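
For reference, the complete sequence I have been trying is roughly the 
following (just a sketch of my current understanding: enabling accounting 
requires root, the setting apparently does not survive a driver reload 
unless persistence mode is on, and records seem to appear only for 
processes that have already terminated; "my_gpu_program" is just a 
placeholder):

# As root: keep the driver loaded and enable per-process accounting
$ nvidia-smi --persistence-mode=1
$ nvidia-smi --accounting-mode=1

# Run some GPU workload and let it finish
$ ./my_gpu_program

# Query the statistics recorded for terminated processes
$ nvidia-smi \
    --query-accounted-apps=gpu_name,pid,time,gpu_util,mem_util,max_memory_usage \
    --format=csv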

Some questions:

1. Is there a way to integrate the Nvidia process accounting into Slurm? 
(See the first sketch below.)

2. Can users run the above command in their job scripts and get the GPU 
accounting information?  (See the second sketch below.)
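
To make these questions more concrete, here are the (untested) sketches I 
have in mind.  For question 1, something like a node Epilog script that 
appends the accounting data to a log file, tagged with the job ID (I don't 
know whether this is the recommended approach, and it assumes accounting 
mode has already been enabled by root on the node):

#!/bin/bash
# Node Epilog sketch: runs as root on the compute node when a job ends.
LOG=/var/log/slurm/gpu_accounting.log
{
    echo "=== JobId=$SLURM_JOB_ID User=$SLURM_JOB_USER $(date) ==="
    nvidia-smi \
        --query-accounted-apps=gpu_name,pid,time,gpu_util,mem_util,max_memory_usage \
        --format=csv
} >> $LOG

For question 2, the idea would be to end the job script with the same 
query ("my_gpu_program" is again just a placeholder):

#!/bin/bash
#SBATCH --gres=gpu:1

srun ./my_gpu_program

# Print the per-process GPU accounting recorded by the driver
nvidia-smi \
    --query-accounted-apps=gpu_name,pid,time,gpu_util,mem_util,max_memory_usage \
    --format=csv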

Thanks,
Ole

References:
1. https://developer.nvidia.com/nvidia-system-management-interface


