<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi all,</p>
<p>Happy new year everyone!</p>
<p>I've been looking for a simple tool that reports how much of the
requested resources a job actually consumes, to help my colleagues and
me adjust job requirements. I could not find such a tool, and the ones
mentioned on this mailing list were not easy to install and use, so I
have written a new one: <a class="moz-txt-link-freetext" href="https://github.com/CEA-LIST/sprofile">https://github.com/CEA-LIST/sprofile</a></p>
<p>It's a simple Python script that parses cgroup data and NVML data
from the NVIDIA driver. It reports the duration, CPU load, peak RAM,
GPU load and peak GPU memory, like so:<br>
</p>
<pre class="notranslate"><code>-- sprofile report (node03) --
Time: 0:00:03 / 1:00:00
CPU load: 2.0 / 4.0
RAM peak mem: 7G / 8G
GPU load: 0.2 / 2.0
GPU peak mem: 7G / 40G</code></pre>
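<p>For readers curious about the cgroup side, here is a minimal Python sketch of how CPU load and peak RAM can be derived from cgroup v1 accounting files. It is not sprofile's code. The file names <code>cpuacct.usage</code> and <code>memory.max_usage_in_bytes</code> are the standard cgroup v1 ones. The Slurm directory layout in the comment is an assumption about a typical setup, and the helper names are hypothetical.</p>
<pre class="notranslate"><code>import os

# Assumed (typical, not guaranteed) cgroup v1 locations for a Slurm job;
# the exact layout depends on the Slurm version and cgroup.conf:
#   /sys/fs/cgroup/memory/slurm/uid_{uid}/job_{jobid}
#   /sys/fs/cgroup/cpuacct/slurm/uid_{uid}/job_{jobid}


def read_int(path):
    """Read a single integer from a cgroup control file."""
    with open(path) as f:
        return int(f.read().strip())


def cgroup_usage(memory_dir, cpuacct_dir, elapsed_s):
    """Return (average CPU load in cores, peak RAM in bytes) for a cgroup.

    Uses cgroup v1 file names; cgroup v2 exposes cpu.stat (usage_usec)
    and memory.peak instead.
    """
    cpu_ns = read_int(os.path.join(cpuacct_dir, "cpuacct.usage"))
    peak = read_int(os.path.join(memory_dir, "memory.max_usage_in_bytes"))
    # Average number of busy cores = total CPU time / wall-clock time.
    load = cpu_ns / 1e9 / elapsed_s
    return load, peak


def fmt_gib(n_bytes):
    """Format a byte count as whole GiB, e.g. "7G"."""
    return f"{round(n_bytes / 2**30)}G"</code></pre>
<p>For example, a job that accumulated 6 s of CPU time in 3 s of wall-clock time has an average load of 2.0 cores, which is how a line such as "CPU load: 2.0 / 4.0" can be read.</p>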
<p>The requirements are to use the Slurm cgroup plugin and to enable
accounting mode on the GPUs (<code>nvidia-smi --accounting-mode=1</code>).</p>
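<p>The GPU figures rely on the per-process accounting that this setting enables. Below is a rough sketch of querying it through the pynvml bindings (the nvidia-ml-py package). The function names, the <code>job_pids</code> filter and the aggregation rule (largest record per GPU, then summed across GPUs) are assumptions for illustration, not necessarily what sprofile does. Taking the maximum per GPU undercounts memory when several processes share one GPU.</p>
<pre class="notranslate"><code>def query_accounting(gpu_indices, job_pids):
    """Collect (gpu_index, gpu_util_percent, max_mem_bytes) records for job_pids.

    gpu_indices are NVML device indices. Requires the nvidia-ml-py package
    and accounting mode enabled on each GPU.
    """
    import pynvml  # imported lazily so gpu_summary() works without it

    pynvml.nvmlInit()
    try:
        records = []
        for i in gpu_indices:
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            if pynvml.nvmlDeviceGetAccountingMode(handle) != pynvml.NVML_FEATURE_ENABLED:
                raise RuntimeError(f"accounting mode is disabled on GPU {i}")
            # The accounting buffer may also hold processes from earlier jobs,
            # so keep only the PIDs that belong to this (hypothetical) job.
            for pid in pynvml.nvmlDeviceGetAccountingPids(handle):
                if pid in job_pids:
                    stats = pynvml.nvmlDeviceGetAccountingStats(handle, pid)
                    records.append((i, stats.gpuUtilization, stats.maxMemoryUsage))
        return records
    finally:
        pynvml.nvmlShutdown()


def gpu_summary(records):
    """Return (GPU load in GPUs, peak GPU memory in bytes) from records."""
    per_gpu = {}
    for idx, util, mem in records:
        best_util, best_mem = per_gpu.get(idx, (0, 0))
        # Keep the largest utilization and memory record seen on each GPU.
        per_gpu[idx] = (max(best_util, util), max(best_mem, mem))
    load = sum(u for u, _ in per_gpu.values()) / 100
    peak = sum(m for _, m in per_gpu.values())
    return load, peak</code></pre>
<p>For instance, <code>gpu_summary([(0, 20, 5 * 2**30), (1, 0, 2 * 2**30)])</code> returns <code>(0.2, 7516192768)</code>: 20% utilization on one GPU and idle on the other gives a load of 0.2 GPUs, and the two memory peaks add up to 7 GiB.</p>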
<p>I hope you find this useful. Please let me know if you find bugs or
want to contribute.</p>
<p>Regards,<br>
Nicolas Granger<br>
</p>
</body>
</html>