[slurm-users] GPU process accounting information
Michael Di Domenico
mdidomenico4 at gmail.com
Fri Jan 15 14:44:08 UTC 2021
I would imagine that Slurm should be able to pull that data through
NVML, but I'd bet the hooks aren't in place.
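For what it's worth, those per-process numbers come out of NVML's
accounting API (the same data nvidia-smi reports), so a site-local
prolog/epilog could read them even before any official hooks appear.
Below is a rough sketch using the pynvml bindings, assuming pynvml
(nvidia-ml-py) is installed and accounting mode has already been
enabled as root; this is not anything Slurm ships today:

import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        if pynvml.nvmlDeviceGetAccountingMode(handle) != pynvml.NVML_FEATURE_ENABLED:
            # Accounting has to be switched on (nvidia-smi --accounting-mode=1,
            # as root) before the processes you want to account for are started.
            print("GPU %d: accounting mode is disabled" % i)
            continue
        # One record per accounted process; the fields mirror what
        # --query-accounted-apps prints (utilisation in %, memory in bytes, time in ms).
        for pid in pynvml.nvmlDeviceGetAccountingPids(handle):
            stats = pynvml.nvmlDeviceGetAccountingStats(handle, pid)
            print("GPU %d pid %d: gpu_util=%d%% mem_util=%d%% max_mem=%d MiB time=%d ms"
                  % (i, pid, stats.gpuUtilization, stats.memoryUtilization,
                     stats.maxMemoryUsage // (1024 * 1024), stats.time))
finally:
    pynvml.nvmlShutdown()

Tying that output back to a particular Slurm job would still be up to the
epilog itself, e.g. by matching PIDs against the job's step PIDs.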
On Fri, Jan 15, 2021 at 7:44 AM Ole Holm Nielsen
<Ole.H.Nielsen at fysik.dtu.dk> wrote:
>
> Hi,
>
> We have installed some new GPU nodes, and now users are asking for some
> sort of monitoring of GPU utilisation and GPU memory utilisation at the
> end of a job, like what Slurm already provides for CPU and memory usage.
>
> I haven't found any pages describing how to perform GPU accounting within
> Slurm, so I would like to ask the user community for some advice on the
> best practices and any available (simple) tools out there.
>
> What I have discovered is that Nvidia provides process accounting using
> nvidia-smi[1]. It is enabled with
>
> $ nvidia-smi --accounting-mode=1
>
> and queried with
>
> $ nvidia-smi
> --query-accounted-apps=gpu_name,pid,time,gpu_util,mem_util,max_memory_usage
> --format=csv
>
> but the documentation seems quite scant, and so far I don't see any output
> from this query command.
>
> Some questions:
>
> 1. Is there a way to integrate the Nvidia process accounting into Slurm?
>
> 2. Can users run the above command in the job scripts and get the GPU
> accounting information?
>
> Thanks,
> Ole
>
> References:
> 1. https://developer.nvidia.com/nvidia-system-management-interface
>
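Regarding question 2: users should be able to run that query themselves at
the end of the job script, as long as accounting mode was enabled (it needs
root) before their processes started; I believe only processes launched
after it is switched on get recorded, which may be why the query shows
nothing yet. A rough sketch that wraps the quoted command and parses its
CSV output (the field order is simply the --query-accounted-apps list):

import csv
import subprocess

# Same field list as in the quoted command; ",noheader" drops the CSV header line.
query = "gpu_name,pid,time,gpu_util,mem_util,max_memory_usage"
out = subprocess.run(
    ["nvidia-smi", "--query-accounted-apps=" + query, "--format=csv,noheader"],
    capture_output=True, text=True, check=True).stdout

for row in csv.reader(out.splitlines()):
    if not row:
        continue
    gpu_name, pid, time_ms, gpu_util, mem_util, max_mem = [f.strip() for f in row]
    print("%s: pid=%s time=%s gpu_util=%s mem_util=%s max_mem=%s"
          % (gpu_name, pid, time_ms, gpu_util, mem_util, max_mem))

Getting those numbers back into Slurm's accounting database is a different
story, of course.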