[slurm-users] Problem with squeue reporting of GPUs in use

Yair Yarom irush at cs.huji.ac.il
Tue Feb 25 09:02:32 UTC 2020


Hi,

I've also encountered this issue of the deprecated %b. I'm currently
parsing the output of "scontrol show jobs -dd" to see what was requested
(and which exact GPUs were allocated).
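
A minimal sketch of that parsing, in Python. The SAMPLE text is an assumed
approximation of what "scontrol show jobs -dd" prints on a 19.05 cluster
(older releases may print GRES_IDX=gpu(IDX:0-1) instead of
GRES=gpu:2(IDX:0-1)), so check it against your own output first. The helper
names are hypothetical, not part of Slurm.

```python
import re
import subprocess

# ASSUMED sample of "scontrol show jobs -dd" output; real output has many
# more fields and may differ between Slurm releases.
SAMPLE = """\
JobId=1001 JobName=train
   JobState=RUNNING Reason=None Dependency=(null)
   Partition=gpu AllocNode:Sid=login1:1234
   TRES=cpu=4,mem=16G,node=1,billing=4,gres/gpu=2
     Nodes=gpu01 CPU_IDs=0-3 Mem=16384 GRES=gpu:2(IDX:0-1)
   TresPerNode=gpu:2

JobId=1002 JobName=infer
   JobState=RUNNING Reason=None Dependency=(null)
   Partition=gpu AllocNode:Sid=login1:5678
   TRES=cpu=2,mem=8G,node=1,billing=2,gres/gpu=1
     Nodes=gpu02 CPU_IDs=0-1 Mem=8192 GRES=gpu:1(IDX:3)
   TresPerJob=gpu:1
"""

# Matches e.g. "gpu(IDX:0-1)", "gpu:2(IDX:0-1)" or "gpu:tesla:2(IDX:0,2)".
IDX_RE = re.compile(r"(\w+)(?::[^(,]*)?\(IDX:([^)]*)\)")


def expand_indices(spec):
    """Turn an index spec such as "0-1,3" into [0, 1, 3]."""
    indices = []
    for part in spec.split(","):
        # Skip anything that is not "N" or "N-M" (e.g. "N/A").
        if not re.fullmatch(r"\d+(-\d+)?", part):
            continue
        low, _, high = part.partition("-")
        indices.extend(range(int(low), int(high or low) + 1))
    return indices


def parse_allocated_gpus(text):
    """Return {job_id: {node: [gpu indices]}} from the scontrol text."""
    jobs = {}
    job_id = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"JobId=(\d+)", line)
        if match:
            job_id = match.group(1)
            jobs[job_id] = {}
            continue
        # Per-node detail lines (printed with -d/-dd) start with "Nodes=".
        if job_id is None or not line.startswith("Nodes="):
            continue
        node = line.split()[0].split("=", 1)[1]
        gres = re.search(r"GRES(?:_IDX)?=(\S*)", line)
        if not gres:
            continue
        for name, spec in IDX_RE.findall(gres.group(1)):
            if name == "gpu":
                jobs[job_id].setdefault(node, []).extend(expand_indices(spec))
    return jobs


def read_scontrol():
    """Run the command from this thread; needs a working Slurm client."""
    result = subprocess.run(
        ["scontrol", "show", "jobs", "-dd"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

On a cluster, parse_allocated_gpus(read_scontrol()) would replace the
SAMPLE text. What was requested shows up on the TresPerNode= / TresPerJob=
lines in the assumed sample and could be picked up the same way.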

Hope this helps,
    Yair.

On Mon, Feb 24, 2020 at 11:56 PM Venable, Richard (NIH/NHLBI) [E] <
venabler at nhlbi.nih.gov> wrote:

> I’m seeing a problem with GPU usage reporting via squeue in the 19.05.3
> release.
>
> I’ve been using a custom script to track GPUs in use, and had been relying
> on the ‘%b’ field of squeue -o formatting (which now seems to be
> undocumented) to capture usage requested via --gres option of sbatch.
> Unfortunately, besides apparently being deprecated, ‘%b’ does not report
> usage requested via the new --gpus option.
>
> I’ve tried several squeue -O option fields, but only ‘tres-alloc’ seems to
> consistently report GPU usage, independent of which sbatch option was used
> for the request.  The ‘tres-per-node’ field only reports usage requested by
> --gres, while ‘tres-per-job’ only reports usage requested by the --gpus
> option.  Also, the -O formatting doesn’t guarantee a space between fields,
> so long job names or usernames run into the next field and break the field
> parsing of the output.
>
> Our users like to know which partition has the most free GPUs, and right
> now my script is broken with respect to usage requested via the --gpus
> option.
>
> If there is no other option, I can probably parse the ‘tres-alloc’ field
> (it has more info than I need), but I’m looking for alternatives, or any
> information that might indicate the ‘tres-*’ fields are more consistent in
> the newer (.4 or .5) SLURM releases.
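
Parsing the 'tres-alloc' field could look roughly like the sketch below.
It assumes the field is a comma-separated list such as
"cpu=4,mem=16G,node=1,billing=4,gres/gpu=2", and that squeue was run with
explicit field widths, e.g. squeue -h -O "partition:30,tres-alloc:120", so
the columns can be cut by position and never run together. The widths, the
sample lines and the function names are assumptions for illustration only.

```python
PART_WIDTH = 30  # ASSUMED width given to the partition field in -O


def parse_tres(field):
    """Split "cpu=4,mem=16G,gres/gpu=2" into {"cpu": "4", ...}."""
    tres = {}
    for item in field.strip().split(","):
        key, sep, value = item.partition("=")
        if sep:
            tres[key.strip()] = value.strip()
    return tres


def gpu_count(field):
    """Number of GPUs named in one tres-alloc string (0 if none)."""
    tres = parse_tres(field)
    # Prefer the generic total; typed entries (gres/gpu:<type>) are only
    # summed when no generic entry is present, to avoid double counting.
    if "gres/gpu" in tres:
        return int(tres["gres/gpu"])
    return sum(int(v) for k, v in tres.items() if k.startswith("gres/gpu:"))


def gpus_in_use(lines):
    """Sum allocated GPUs per partition from fixed-width squeue lines."""
    used = {}
    for line in lines:
        if not line.strip():
            continue
        partition = line[:PART_WIDTH].strip()
        used[partition] = used.get(partition, 0) + gpu_count(line[PART_WIDTH:])
    return used


# ASSUMED sample lines, padded the way a fixed-width -O field would be.
sample_lines = [
    f"{'gpu':<30}cpu=4,mem=16G,node=1,billing=4,gres/gpu=2",
    f"{'gpu':<30}cpu=2,mem=8G,node=1,billing=2,gres/gpu=1",
    f"{'batch':<30}cpu=8,mem=32G,node=1,billing=8",
]
print(gpus_in_use(sample_lines))
```

Free GPUs per partition would then be the configured total minus these
counts; the totals have to come from elsewhere (e.g. the node
configuration).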
>
> BTW, sreport does a bad job of reporting GPU usage as well, in that the
> GRES/GPU total % for root in the account listing on a given cluster is
> always less than the % allocated in the utilization listing, sometimes by a
> substantial amount.  The CPU usage is almost always the same in both
> sreport listings.
>
> --
> *Rick Venable*
> NIH/NHLBI/DIR/BBC
> Lab. of Membrane Biophysics MSC 5690
> Bldg. 12A Room 3053L
> Bethesda, MD  20892-5690   U.S.A.