[slurm-users] GPU process accounting information
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Fri Jan 15 12:41:31 UTC 2021
Hi,
We have installed some new GPU nodes, and now users are asking for some
sort of monitoring of GPU utilisation and GPU memory utilisation at the
end of a job, like what Slurm already provides for CPU and memory usage.
I haven't found any pages describing how to perform GPU accounting within
Slurm, so I would like to ask the user community for some advice on the
best practices and any available (simple) tools out there.
What I have discovered is that Nvidia provides process accounting using
nvidia-smi[1]. It is enabled with
$ nvidia-smi --accounting-mode=1
and queried with
$ nvidia-smi \
    --query-accounted-apps=gpu_name,pid,time,gpu_util,mem_util,max_memory_usage \
    --format=csv
but the documentation seems quite scant, and so far I don't see any output
from this query command.
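For reference, the full sequence I have been trying on one of the nodes looks
roughly like this (only an untested sketch; as far as I can tell, changing the
accounting mode requires root, and "./gpu_test" below is just a placeholder
for any GPU program):

# As root on the GPU node: enable per-process GPU accounting
nvidia-smi --accounting-mode=1

# As a normal user: run a GPU workload (./gpu_test is a placeholder),
# then list the processes for which the driver recorded accounting data
srun --gres=gpu:1 ./gpu_test
nvidia-smi \
    --query-accounted-apps=gpu_name,pid,time,gpu_util,mem_util,max_memory_usage \
    --format=csv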
Some questions:
1. Is there a way to integrate the Nvidia process accounting into Slurm?
2. Can users run the above query command in their job scripts to obtain the
GPU accounting information?
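For question 2, what I had in mind is roughly the following job script (again
an untested sketch; "./gpu_program" is just a placeholder for the user's
application, and it assumes accounting mode has already been enabled on the
node):

#!/bin/bash
#SBATCH --job-name=gpu_acct_test
#SBATCH --gres=gpu:1

# Run the actual GPU application (./gpu_program is a placeholder)
srun ./gpu_program

# Afterwards, print the per-process GPU accounting data recorded by the
# driver, so that it ends up in the job's output file
nvidia-smi \
    --query-accounted-apps=gpu_name,pid,time,gpu_util,mem_util,max_memory_usage \
    --format=csv

If the query also lists processes from earlier jobs on the same GPU, I suppose
the pid column could be used to pick out the job's own processes.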
Thanks,
Ole
References:
[1] https://developer.nvidia.com/nvidia-system-management-interface