[slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Tue Dec 14 07:34:51 UTC 2021


The latest pestat version now highlights a job's GRES/GPU field in red
if its value is (null).

We use this to highlight jobs on GPU nodes which didn't request any GPU 
resources, thereby possibly wasting resources.
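
To illustrate the idea, here is a minimal sketch in Python.  It is not the
actual pestat code; the function names is_gpu_node and format_job_gres are
made up for this example, and it assumes that a GPU node is one whose
GRES/node string has an entry starting with "gpu" and that a job without any
GRES shows up as "(null)" or as an empty string.

```python
# Hypothetical sketch only; names and matching rules are assumptions.
RED = "\033[31m"    # ANSI escape sequence: red foreground
RESET = "\033[0m"   # ANSI escape sequence: reset attributes


def is_gpu_node(node_gres):
    """Assume a GPU node has a comma-separated GRES entry starting with 'gpu'."""
    return any(entry.strip().startswith("gpu") for entry in node_gres.split(","))


def format_job_gres(job_gres, on_gpu_node, use_color=True):
    """Return the job's GRES field, wrapped in red if it is missing on a GPU node."""
    missing = on_gpu_node and job_gres in ("", "(null)")
    if missing and use_color:
        # Show an explicit (null) so that the highlighted field is never empty.
        return f"{RED}{job_gres or '(null)'}{RESET}"
    return job_gres
```

For a node reporting gpu:gtx1080ti:2(S:0-1), a job whose GRES is (null) would
come back wrapped in the red escape codes, while a job with gpu=2 would be
returned unchanged.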

Could you test whether this is useful and send me your feedback?

Thanks,
Ole

On 12/13/21 15:31, Loris Bennett wrote:
> Hi Ole,
> 
> The new version looks good to me.
> 
> Cheers,
> 
> Loris
> 
> Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> writes:
> 
>> Hi Loris,
>>
>> I fixed errors in the hostnamelength calculation and formatting.
>> Could you grab the latest pestat and test it?
>>
>> Thanks,
>> Ole
>>
>> On 12/13/21 13:56, Loris Bennett wrote:
>>> Hi Ole,
>>>
>>> Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> writes:
>>>
>>>> Hi Slurm users,
>>>>
>>>> I have updated the "pestat" tool for printing Slurm nodes status with 1 line per
>>>> node including job info.  The download page is
>>>> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat
>>>> (also listed in https://slurm.schedmd.com/download.html).
>>>>
>>>> Improvements:
>>>>
>>>> * The GRES/GPU output option "pestat -G" now prints the job gres/gpu information
>>>> as obtained from squeue's tres-alloc output option, which should contain the
>>>> most correct GRES/GPU information.
>>>>
>>>> If you have a cluster with GPUs, could you try out the latest version and send
>>>> me any feedback?
>>>>
>>>> Thanks to René Sitt for helpful suggestions and testing.
>>>>
>>>> The pestat tool can print a large variety of node and job information, and is
>>>> generally useful for monitoring nodes and jobs on Slurm clusters.  For command
>>>> options and examples please see the download page.  My own favorite usage is
>>>> "pestat -F".
>>>
>>> Thanks for the update - the GPU information is a good addition.
>>> However, the alignment of the columns with the headers seems a bit off:
>>>
>>>
>>> $ pestat -p gpu -G
>>> Print only nodes in partition gpu
>>> GRES (Generic Resource) is printed after each jobid
>>> Hostname       Partition     Node Num_CPU  CPUload  Memsize  Freemem  GRES/node              Joblist
>>>                       State Use/Tot  (15min)     (MB)     (MB)                         JobID(JobArrayID) User GRES/job ...
>>> g001             gpu      mix   1  32    0.06*    95200    89990  gpu:gtx1080ti:2(S:0-1) 8692106 joesnow gpu=2
>>> g002             gpu      mix   6  32    1.70*    95200    71692  gpu:gtx1080ti:2(S:0-1) 8692181(8536946_566) gailhail gpu=1 8692131(8536946_563) gailhail gpu=1
>>> g003             gpu      mix   1  32    0.06*    95200    87622  gpu:gtx1080ti:2(S:0-1) 8692111 joesnow gpu=2
>>> g004             gpu      mix   6  32    1.74*    95200    65647  gpu:gtx1080ti:2(S:0-1) 8692124(8536946_562) gailhail gpu=1 8692122(8536946_561) gailhail gpu=1
>>>
>>>
>>> It looks as if the column 'Partition' needs to be four spaces wider.
>>>
>>> Cheers,
>>>
>>> Loris
>>>

-- 
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: Ole.H.Nielsen at fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620


