<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<font size="4"><font face="Helvetica, Arial, sans-serif">Hi all,<br>
<br>
I'm trying to get the gathering of gres/gpumem and
gres/gpuutil working on Slurm 23.02.2, but with no success so far.<br>
<br>
We have:<br>
AccountingStorageTRES=cpu,mem,gres/gpu<br>
</font></font><font size="4"><font face="Helvetica, Arial,
sans-serif">in slurm.conf, and Slurm is built with NVML
support.</font></font><br>
<font size="4"><font face="Helvetica, Arial, sans-serif"><br>
Autodetect=NVML <br>
in gres.conf<br>
<br>
</font></font><font size="4"><font face="Helvetica, Arial,
sans-serif">gres/gpumem and </font></font><font size="4"><font
face="Helvetica, Arial, sans-serif">gres/gpuutil now appear in the
sacct </font></font><font size="4"><font face="Helvetica,
Arial, sans-serif">TRESUsageInAve record, but with zero values:<br>
<br>
sacct -j 6056927_51 -Pno TRESUsageInAve<br>
<br>
cpu=00:00:07,energy=0,fs/disk=14073059,gres/gpumem=0,gres/gpuutil=0,mem=6456K,pages=0,vmem=7052K<br>
cpu=00:00:00,energy=0,fs/disk=2332,gres/gpumem=0,gres/gpuutil=0,mem=44K,pages=0,vmem=44K<br>
cpu=05:18:51,energy=0,fs/disk=708800,gres/gpumem=0,gres/gpuutil=0,mem=2565376K,pages=0,vmem=2961244K<br>
<br>
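In case it helps anyone reproduce the check, here is a minimal sketch of how the GPU fields can be pulled out of one such record. The helper name gpu_tres is made up for illustration, and the sample record is the third line of the output above.<br>
<pre>
```shell
# Hypothetical helper (name is an assumption): print only the GPU-related
# fields of a single comma-separated TRESUsageInAve record.
gpu_tres() {
    printf '%s\n' "$1" | tr ',' '\n' | grep '^gres/gpu'
}

# Sample record copied from the sacct output in this message.
record='cpu=05:18:51,energy=0,fs/disk=708800,gres/gpumem=0,gres/gpuutil=0,mem=2565376K,pages=0,vmem=2961244K'

# Prints one line per GPU field, here gres/gpumem and gres/gpuutil.
gpu_tres "$record"
```
</pre>
On a live system the same filter could presumably be fed straight from the sacct command shown above instead of a pasted record.<br>
<br>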
We are using NVIDIA Tesla V100 and A100 GPUs with driver version
530.30.02. dcgm-exporter is working on the nodes.<br>
<br>
Is anything else needed to get it working?<br>
<br>
Thanks in advance. Daniel Vecerka<br>
<br>
<br>
</font></font>
<div class="moz-cite-prefix">On 24. 05. 23 21:45, Christopher Samuel
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:ea109f58-d2d2-03bb-2556-7ac779094074@csamuel.org">On
5/24/23 11:39 am, Fulton, Ben wrote:
<br>
<br>
<blockquote type="cite">Hi,
<br>
</blockquote>
<br>
Hi Ben,
<br>
<br>
<blockquote type="cite">The release notes for 23.02 say “Added
usage gathering for gpu/nvml (Nvidia) and gpu/rsmi (AMD)
plugins”.
<br>
<br>
How would I go about enabling this?
<br>
</blockquote>
<br>
I can only comment on the nvidia side (as those are the GPUs we
have) but for that you need Slurm built with NVML support and
running with "Autodetect=NVML" in gres.conf and then that
information is stored in slurmdbd as part of the TRES usage data.
<br>
<br>
For example to grab a job step for a test code I ran the other
day:
<br>
<br>
csamuel@perlmutter:login01:~> sacct -j 9285567.0 -Pno
TRESUsageInAve | tr , \\n | fgrep gpu
<br>
gres/gpumem=493120K
<br>
gres/gpuutil=76
<br>
<br>
Hope that helps!
<br>
<br>
All the best,
<br>
Chris
<br>
</blockquote>
<br>
</body>
</html>