Hi all,

Happy new year everyone!

I've been looking for a simple tool that reports how many resources a job actually consumes, to help my colleagues and me adjust job requirements. I could not find such a tool, or at least the ones mentioned on this mailing list were not easy to install and use, so I have written a new one: https://github.com/CEA-LIST/sprofile

It's a simple Python script that parses cgroup data and NVML data from the NVIDIA driver. It reports duration, CPU load, peak RAM, GPU load and peak GPU memory, like so:

-- sprofile report (node03) --
Time:       0:00:03  /  1:00:00
CPU load:       2.0  /   4.0
RAM peak mem:    7G  /    8G
GPU load:       0.2  /   2.0
GPU peak mem:    7G  /   40G
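
For readers curious how the CPU and RAM figures can be derived from cgroup data, here is a minimal sketch. It is not the sprofile code itself. It assumes a cgroup v2 hierarchy where the job's directory exposes cpu.stat and memory.peak (the latter needs a fairly recent kernel), and the function names are hypothetical. The job's cgroup path depends on the Slurm setup, so it is left as a parameter.

```python
from pathlib import Path


def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats


def cpu_load(usage_usec: int, elapsed_s: float) -> float:
    """Average number of busy cores: CPU time divided by wall-clock time."""
    return usage_usec / 1e6 / elapsed_s if elapsed_s > 0 else 0.0


def read_job_usage(cgroup_dir: str, elapsed_s: float) -> dict[str, float]:
    """Read CPU load and peak RAM (in bytes) from a cgroup v2 directory.

    Assumption: cgroup_dir is the job's cgroup directory and contains
    both cpu.stat and memory.peak.
    """
    root = Path(cgroup_dir)
    stats = parse_cpu_stat((root / "cpu.stat").read_text())
    peak_bytes = int((root / "memory.peak").read_text())
    return {
        "cpu_load": cpu_load(stats["usage_usec"], elapsed_s),
        "ram_peak": peak_bytes,
    }
```

With 6 seconds of accumulated CPU time over a 3 second job, cpu_load returns 2.0, which is the kind of value shown on the "CPU load" line of the report.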

The requirements are to use the Slurm cgroup plugin and to enable accounting on the GPU (nvidia-smi --accounting-mode=1).
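
To illustrate what GPU accounting makes available, here is a small sketch that parses per-process accounting records. It is not how sprofile does it (sprofile reads NVML). It assumes the text comes from a command along the lines of "nvidia-smi --query-accounted-apps=pid,gpu_util,max_memory_usage --format=csv,noheader,nounits"; please check the field names against "nvidia-smi --help-query-accounted-apps" on your driver. The function name and the output keys are hypothetical.

```python
import csv
import io


def parse_accounted_apps(csv_text: str) -> dict[int, dict[str, int]]:
    """Map each PID to its GPU utilization (%) and peak GPU memory (MiB).

    Assumption: each line holds three numeric fields in the order
    pid, gpu utilization, max memory usage, without header or units.
    """
    apps = {}
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        try:
            pid, util, max_mem = (int(field) for field in row)
        except ValueError:
            continue  # skip rows with non-numeric values
        apps[pid] = {"gpu_util": util, "max_mem_mib": max_mem}
    return apps
```

Rows that are blank or contain non-numeric values are skipped rather than raising, since accounting output can be incomplete for some processes.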

I hope you find this useful. Let me know if you find bugs or want to contribute.

Regards,
Nicolas Granger