[slurm-users] Get GPU usage from sacct?

Aaron Jackson aaron at aaronsplace.co.uk
Sat Nov 16 17:16:40 UTC 2019

Janne Blomqvist writes:
> On 14/11/2019 20.41, Prentice Bisbal wrote:
>> Is there any way to see how much a job used the GPU(s) on a cluster
>> using sacct or any other slurm command?
> We have created
> https://github.com/AaltoScienceIT/ansible-role-sacct_gpu/ as a quick
> hack to put GPU utilization stats into the comment field at the end of
> the job.
> The above is an ansible role, but if you're not using ansible you can
> just pull the scripts from the "files" subdirectory.

I do something similar, but it's optional (on a per-job basis) and
updates regularly. In the job submission script, a user may add

]]    source /usr/share/gpu.sbatch

which contains the following:

]]    (
]]        # Poll every 15 s in the background and write per-GPU
]]        # utilisation into the job's comment field via scontrol.
]]        while true ; do
]]           util=$(nvidia-smi | grep Default | \
]]                  cut -d'|' -f4 | grep -o -P '[0-9]+%' | \
]]                  tr '\n' ' ')
]]           scontrol update job=$SLURM_JOB_ID comment="GPU: $util"
]]           sleep 15
]]        done
]]    ) &

and shows each GPU's utilisation in the comment field while the job is
running. Quite handy. I haven't bothered figuring out how to enable this
for all users, and to be honest I think some users would rather not let
everyone know, due to embarrassment :-)
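For reference, the extraction pipeline can be exercised against canned
nvidia-smi output (the sample lines and values below are hypothetical;
the real loop pipes live `nvidia-smi` output instead):

```shell
# Two sample per-GPU status lines as nvidia-smi prints them in its
# human-readable table (hypothetical values).
sample='| 29%   45C    P8    12W / 250W |    512MiB / 11019MiB |     37%      Default |
| 31%   52C    P2   118W / 250W |   4096MiB / 11019MiB |     84%      Default |'

# Same extraction as the monitoring loop: take the 4th |-separated
# field, keep the NN% token, and join one entry per GPU onto one line.
util=$(printf '%s\n' "$sample" | grep Default | \
       cut -d'|' -f4 | grep -o -P '[0-9]+%' | tr '\n' ' ')
echo "GPU: $util"
```

This prints "GPU: 37% 84%", the same string the loop would hand to
scontrol update.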

Of course, it's not particularly efficient and assumes that the compute
mode is set to Default, but it was a quick hack.
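A slightly more robust variant (a sketch, not from the post above) uses
nvidia-smi's machine-readable query mode instead of scraping the table,
so it does not depend on the compute mode being Default:

```shell
# On a GPU node one would run (requires an NVIDIA driver):
#   util=$(nvidia-smi --query-gpu=utilization.gpu \
#                     --format=csv,noheader,nounits | \
#          sed 's/$/%/' | tr '\n' ' ')
# The query mode emits one bare number per GPU; append % to each line
# and join. Exercised here on hypothetical sample output:
sample='37
84'
util=$(printf '%s\n' "$sample" | sed 's/$/%/' | tr '\n' ' ')
echo "GPU: $util"
```

The resulting "GPU: 37% 84%" string can be passed to
`scontrol update job=$SLURM_JOB_ID comment=...` exactly as in the loop
above.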
