Hello all,
Apologies for the basic question, but is there a straightforward, generally
accepted way to use Slurm to report which GPUs are currently in use? I've
done some searching, and people recommend all sorts of methods, including
parsing the output of nvidia-smi (which seems inefficient, especially across
multiple GPU nodes) and using external tools such as Grafana, XDMoD, etc.
We do track GPUs as a resource, so I'd expect I could get at the info with
sreport or something similar, but before trying to craft my own solution from
scratch, I'm hoping someone already has something working. Ultimately I'd
like to see either which cards are available by node, or the reverse (which
are in use by node). I know recent versions of Slurm supposedly added
tighter integration with NVIDIA cards, but I can't seem to find definitive
documentation on what, exactly, changed or what is now possible as a result.
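For reference, this is roughly the sort of report I'm imagining: a quick
sketch that shells out to "scontrol show node -o" and compares configured
versus allocated GPUs per node. I'm assuming the Gres= and
AllocTRES=...gres/gpu=N fields appear in the output on our nodes, which may
vary by Slurm version, so treat this as illustrative rather than something I
know to be correct:

#!/usr/bin/env python3
"""Rough sketch: report GPU usage per node by parsing 'scontrol show node -o'.

Assumes each node line contains Gres=gpu:N (configured GPUs) and, when GPUs
are allocated, AllocTRES=...gres/gpu=M. Field contents may differ by version.
"""
import re
import subprocess

out = subprocess.run(["scontrol", "show", "node", "-o"],
                     capture_output=True, text=True, check=True).stdout

for line in out.splitlines():
    name = re.search(r"NodeName=(\S+)", line)
    gres = re.search(r"Gres=gpu(?::\w+)?:(\d+)", line)
    if not name or not gres:
        continue  # skip nodes with no configured GPUs
    total = int(gres.group(1))
    alloc = re.search(r"AllocTRES=\S*gres/gpu=(\d+)", line)
    used = int(alloc.group(1)) if alloc else 0
    print(f"{name.group(1)}: {used}/{total} GPUs in use, {total - used} free")

(It's also possible that something like "sinfo -O NodeHost,Gres,GresUsed"
already reports this directly, if I'm reading the sinfo man page correctly,
in which case I'd happily use that instead.)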
Warmest regards,
Jason
--
*Jason L. Simms, Ph.D., M.P.H.*
Research Computing Manager
Swarthmore College
Information Technology Services
(610) 328-8102