[slurm-users] Usage gathering for GPUs

Vecerka Daniel vecerka at fel.cvut.cz
Tue Jun 6 10:42:06 UTC 2023

Hi all,

  I'm trying to get the gathering of gres/gpumem and gres/gpuutil 
working on Slurm 23.02.2, but with no success so far.

We have:
in slurm.conf, and Slurm is built with NVML support.

in gres.conf

gres/gpumem and gres/gpuutil now appear in the sacct TRESUsageInAve 
record, but with zero values:

sacct -j 6056927_51 -Pno TRESUsageInAve


We are using NVIDIA Tesla V100 and A100 GPUs with driver version 
530.30.02. dcgm-exporter is working on the nodes.

Is anything else needed to get it working?

Thanks in advance.

Daniel Vecerka

On 24. 05. 23 21:45, Christopher Samuel wrote:
> On 5/24/23 11:39 am, Fulton, Ben wrote:
>> Hi,
> Hi Ben,
>> The release notes for 23.02 say “Added usage gathering for gpu/nvml 
>> (Nvidia) and gpu/rsmi (AMD) plugins”.
>> How would I go about enabling this?
> I can only comment on the nvidia side (as those are the GPUs we have) 
> but for that you need Slurm built with NVML support and running with 
> "Autodetect=NVML" in gres.conf and then that information is stored in 
> slurmdbd as part of the TRES usage data.
> For example to grab a job step for a test code I ran the other day:
> csamuel@perlmutter:login01:~> sacct -j 9285567.0 -Pno TRESUsageInAve | tr , \\n | fgrep gpu
> gres/gpumem=493120K
> gres/gpuutil=76
> Hope that helps!
> All the best,
> Chris
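
For anyone who wants to check these fields from a script rather than with tr and fgrep, here is a minimal Python sketch of the same filtering. It assumes the TRESUsageInAve field is a comma-separated list of name=value pairs, as in Chris's output above. The helper names (parse_tres_usage, gpu_usage, is_zero) are hypothetical, and the mem entry in the sample string is made-up filler; only the two gres/gpu values come from this thread.

```python
# Minimal sketch: pick the GPU entries out of a TRESUsageInAve string.
# Assumption: the field is a comma-separated list of name=value pairs.


def parse_tres_usage(field: str) -> dict:
    """Split a TRES usage string into a name -> value mapping."""
    entries = {}
    for item in field.strip().split(","):
        if "=" not in item:
            continue  # skip empty or malformed items
        name, value = item.split("=", 1)
        entries[name] = value
    return entries


def gpu_usage(field: str) -> dict:
    """Keep only the gres/gpu* entries (gres/gpumem, gres/gpuutil, ...)."""
    return {k: v for k, v in parse_tres_usage(field).items()
            if k.startswith("gres/gpu")}


def is_zero(value: str) -> bool:
    """True if the value is numerically zero, ignoring a unit suffix like K or M."""
    digits = value.rstrip("KMGTP")
    try:
        return float(digits) == 0
    except ValueError:
        return False  # not a plain number, so do not treat it as zero


# The gres/gpu values are from the reply above; "mem=1024K" is illustrative filler.
SAMPLE = "mem=1024K,gres/gpumem=493120K,gres/gpuutil=76"

if __name__ == "__main__":
    for name, value in gpu_usage(SAMPLE).items():
        status = "zero" if is_zero(value) else "ok"
        print(f"{name}={value} ({status})")
```

In practice the input string would come from the output of "sacct -j <jobid> -Pno TRESUsageInAve". A job showing the symptom described at the top of this message would report every gres/gpu entry as zero.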