Hello all,
Apologies for the basic question, but is there a straightforward, generally accepted way to have Slurm report which GPUs are currently in use? I've done some searching, and people recommend all sorts of methods, from parsing the output of nvidia-smi (which seems inefficient, especially across multiple GPU nodes) to deploying other tools such as Grafana, XDMoD, etc.
We do track GPUs as a resource, so I'd expect I could get at the information with sreport or something similar, but before trying to craft my own solution from scratch, I'm hoping someone already has something working. Ultimately I'd like to see which cards are available on each node, or the reverse (which are in use on each node). I know recent versions of Slurm supposedly added tighter integration with NVIDIA cards, but I can't find definitive documentation on what, exactly, changed or what is now possible as a result.
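For context, if I did end up scripting it myself, I was imagining something rough like the sketch below: parse "scontrol show node -o" and compare the configured vs. allocated gres/gpu counts per node. This is untested and assumes our nodes report GPUs in CfgTRES/AllocTRES, so treat it as a placeholder rather than anything I'm committed to.

    import re
    import subprocess

    # Ask Slurm for one record per node on a single line (scontrol -o/--oneliner).
    out = subprocess.run(["scontrol", "show", "node", "-o"],
                         capture_output=True, text=True, check=True).stdout

    def gpu_count(tres):
        """Pull the gres/gpu count out of a TRES string like 'cpu=48,mem=192000M,gres/gpu=4'."""
        m = re.search(r"gres/gpu=(\d+)", tres or "")
        return int(m.group(1)) if m else 0

    for line in out.splitlines():
        if not line.strip():
            continue
        name = re.search(r"NodeName=(\S+)", line)
        cfg = re.search(r"CfgTRES=(\S+)", line)
        alloc = re.search(r"AllocTRES=(\S*)", line)
        total = gpu_count(cfg.group(1) if cfg else "")
        used = gpu_count(alloc.group(1) if alloc else "")
        if total:
            print(f"{name.group(1)}: {used}/{total} GPUs allocated")

That only gives counts per node, though, not which physical cards are busy, which is part of why I'm asking whether there's a better-supported way.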
Warmest regards,
Jason
--
Jason L. Simms, Ph.D., M.P.H.
Research Computing Manager
Swarthmore College
Information Technology Services
(610) 328-8102