[slurm-users] Problem with squeue reporting of GPUs in use

Venable, Richard (NIH/NHLBI) [E] venabler at nhlbi.nih.gov
Mon Feb 24 21:51:30 UTC 2020


I’m seeing a problem with GPU usage reporting via squeue in the 19.05.3 release.

I’ve been using a custom script to track GPUs in use, and had been relying on the ‘%b’ field of squeue -o formatting (which now seems to be undocumented) to capture usage requested via the --gres option of sbatch.  Unfortunately, besides apparently being deprecated, ‘%b’ does not report usage requested via the new --gpus option.

I’ve tried several of the squeue -O format fields, but only ‘tres-alloc’ seems to report GPU usage consistently, independent of which sbatch option was used for the request.  The ‘tres-per-node’ field only reports usage requested via --gres, while ‘tres-per-job’ only reports usage requested via the --gpus option.  Also, the -O formatting doesn’t guarantee even a single space between fields; longer job names or usernames run into the adjacent column, which breaks field parsing of the output.
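
For reference, this is roughly how I’ve been comparing the -O fields (a quick Python sketch; the field names are from the squeue man page, and the :<width> suffixes are just generous minimum widths I picked to keep the columns from running into each other for typical names):

    import subprocess

    # Print the three TRES-related -O fields side by side for running jobs,
    # with generous minimum widths so the columns (usually) stay separated.
    fields = "jobid:12,partition:20,tres-per-node:30,tres-per-job:30,tres-alloc:80"
    print(subprocess.run(["squeue", "-h", "-t", "RUNNING", "-O", fields],
                         capture_output=True, text=True, check=True).stdout)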

Our users like to know which partition has the most free GPUs, and right now my script is broken with respect to usage requested via the --gpus option.

If there is no other option, I can probably parse the ‘tres-alloc’ field (it has more info than I need), but I’m looking for alternatives, or for any indication that the ‘tres-*’ fields are more consistent in the newer (19.05.4 or 19.05.5) SLURM releases.
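
For what it’s worth, the fallback would look something like the sketch below: tally allocated GPUs per partition by splitting the tres-alloc string on commas and picking out the gres/gpu entry.  It assumes tres-alloc strings look like cpu=4,mem=16G,node=1,billing=4,gres/gpu=2, that partition names fit comfortably inside the column width, and it only covers the allocated side (free GPUs would also need per-partition totals, e.g. from sinfo):

    #!/usr/bin/env python3
    """Sketch: tally allocated GPUs per partition from squeue's tres-alloc field."""
    import subprocess
    from collections import defaultdict

    # Generous minimum column widths so the two fields stay whitespace-separated.
    out = subprocess.run(
        ["squeue", "-h", "-t", "RUNNING", "-O", "partition:40,tres-alloc:120"],
        capture_output=True, text=True, check=True).stdout

    gpus = defaultdict(int)
    for line in out.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        partition, tres = parts
        for item in tres.strip().split(","):
            key, _, val = item.partition("=")
            # Count only the generic gres/gpu entry; typed entries such as
            # gres/gpu:v100 would double-count when the generic one is also listed.
            if key == "gres/gpu" and val.isdigit():
                gpus[partition] += int(val)

    for part in sorted(gpus):
        print(f"{part:<30s} {gpus[part]:4d} GPUs allocated")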


BTW, sreport does a bad job of reporting GPU usage as well: the GRES/GPU total % for root in the account listing on a given cluster is always less than the % allocated in the utilization listing, sometimes by a substantial amount.  The CPU usage is almost always the same in both sreport listings.


--
Rick Venable
NIH/NHLBI/DIR/BBC
Lab. of Membrane Biophysics MSC 5690
Bldg. 12A Room 3053L
Bethesda, MD  20892-5690   U.S.A.

