<!DOCTYPE html>

<html>

  <head>


    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Hi all,</p>

    <p>Happy new year everyone!</p>

    <p>I've been looking for a simple tool that reports how many
      resources a job actually consumes, to help my colleagues and
      me adjust job requirements. I could not find one that fit, and the
      ones mentioned on this mailing list were not easy to install and use, so I
      have written a new one: <a class="moz-txt-link-freetext" href="https://github.com/CEA-LIST/sprofile">https://github.com/CEA-LIST/sprofile</a></p>

    <p>It's a simple Python script that reads cgroup data and NVML data
      from the NVIDIA driver. It reports duration, CPU load, peak RAM,
      GPU load and peak GPU memory, like so:<br>
    </p>

    <pre class="notranslate"><code>-- sprofile report (node03) --

Time:       0:00:03  /  1:00:00

CPU load:       2.0  /   4.0

RAM peak mem:    7G  /    8G

GPU load:       0.2  /   2.0

GPU peak mem:    7G  /   40G</code></pre>
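    <p>For anyone curious how figures like the CPU load and peak RAM can
      be derived from cgroup data, here is a rough sketch. It is not the
      actual sprofile code: it assumes cgroup v2 (the <code>cpu.stat</code>
      and <code>memory.peak</code> files, the latter needing a recent
      kernel), and the function names and the <code>cgroup_dir</code>
      argument are made up for illustration.</p>

```python
from pathlib import Path


def parse_cpu_stat(text):
    """Parse the content of a cgroup v2 cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def cpu_load(cpu_stat_text, elapsed_seconds):
    """Average number of busy cores: CPU time consumed divided by wall time."""
    usage_seconds = parse_cpu_stat(cpu_stat_text)["usage_usec"] / 1e6
    return usage_seconds / elapsed_seconds


def read_job_usage(cgroup_dir, elapsed_seconds):
    """Return (cpu load, peak RAM in bytes) for one cgroup v2 directory.

    cgroup_dir is a hypothetical argument: where Slurm puts the job's
    cgroup depends on the site configuration.
    """
    cgroup = Path(cgroup_dir)
    load = cpu_load((cgroup / "cpu.stat").read_text(), elapsed_seconds)
    peak_bytes = int((cgroup / "memory.peak").read_text())
    return load, peak_bytes
```

    <p>With 6 seconds of CPU time consumed over 3 seconds of wall time,
      <code>cpu_load</code> gives 2.0, which is the kind of value shown on
      the "CPU load" line of the report.</p>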


    <p>The requirements are to use the Slurm cgroup plugin and to enable
      accounting on the GPU (nvidia-smi --accounting-mode=1).</p>

    <p>I hope you find this useful. Let me know if you find bugs or
      want to contribute.</p>

    <p>Regards,<br>

      Nicolas Granger<br>

    </p>

  </body>

</html>