[slurm-users] Get GPU usage from sacct?

Aaron Jackson aaron at aaronsplace.co.uk
Sat Nov 16 17:16:40 UTC 2019


Janne Blomqvist writes:
> On 14/11/2019 20.41, Prentice Bisbal wrote:
>> Is there any way to see how much a job used the GPU(s) on a cluster
>> using sacct or any other slurm command?
>>
>
> We have created
> https://github.com/AaltoScienceIT/ansible-role-sacct_gpu/ as a quick
> hack to put GPU utilization stats into the comment field at the end of
> the job.
>
> The above is an ansible role, but if you're not using ansible you can
> just pull the scripts from the "files" subdirectory.


I do something similar, but it's optional (on a per-job basis) and
updates regularly. In the job submission script, a user may add

]]    source /usr/share/gpu.sbatch

which contains the following:

]]    (
]]        # Background loop: every 15 seconds, grab each GPU's utilisation
]]        # from nvidia-smi and write it into the job's comment field.
]]        while true ; do
]]           # The "Default" compute-mode lines of nvidia-smi's table carry
]]           # the utilisation column; pull out the percentages.
]]           util=$(nvidia-smi | grep Default | \
]]                  cut -d'|' -f4 | grep -o -P '[0-9]+%' | \
]]                  tr '\n' ' ')
]]           scontrol update job=$SLURM_JOB_ID comment="GPU: $util"
]]           sleep 15
]]        done
]]    ) &

and it shows each GPU's utilisation in the comment field while the job
is running. Quite handy. I haven't bothered working out how to enable
this for all users, and to be honest I think some users would rather
not let everyone see their utilisation, out of embarrassment :-)
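
If you want to look at the value from outside the job, something along
these lines should do it (a sketch; the sacct part only works if your
slurmdbd is configured to store job comments, e.g. with
AccountingStoreFlags=job_comment on recent Slurm versions):

]]    # Show the live comment of a running job (job id 12345 is made up):
]]    squeue -j 12345 -O jobid,comment
]]    # or:
]]    scontrol show job 12345 | grep -i comment
]]    # After the job has finished, sacct can report it too, provided the
]]    # comment is stored in the accounting database:
]]    sacct -j 12345 -o JobID,Comment%40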

Of course, it's not particularly efficient and assumes that the compute
mode is set to Default, but it was a quick hack.
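
If the compute-mode assumption is a problem, the loop body could query
the utilisation directly rather than scraping the table; a rough,
untested variant using nvidia-smi's CSV query output:

]]    # Ask for per-GPU utilisation explicitly instead of grepping the
]]    # "Default" compute-mode column:
]]    util=$(nvidia-smi --query-gpu=utilization.gpu \
]]                      --format=csv,noheader,nounits | \
]]           awk '{printf "%s%% ", $1}')
]]    scontrol update job=$SLURM_JOB_ID comment="GPU: $util"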

Cheers,
Aaron


