[slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Tue Dec 14 07:34:51 UTC 2021
The latest pestat version now highlights a job's GRES GPU value in red
when it is (null).
We use this to spot jobs on GPU nodes that did not request any GPU
resources and may therefore be wasting them.
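
To illustrate the idea, here is a minimal Python sketch of the check. It is not pestat's actual code. It assumes that the allocated TRES of a job arrives as a comma-separated string such as "cpu=4,mem=16G,node=1,billing=4,gres/gpu=2", and the function names gpu_alloc and format_job_gres are hypothetical.

```python
# Sketch only: flag jobs whose allocated TRES contain no gres/gpu entry.
# The TRES string layout and both function names are assumptions.

RED = "\033[31m"    # ANSI escape sequence: red foreground
RESET = "\033[0m"   # ANSI escape sequence: reset attributes


def gpu_alloc(tres_alloc):
    """Return the gres/gpu value from a TRES string, or None if absent."""
    for item in tres_alloc.split(","):
        key, _, value = item.partition("=")
        key = key.strip()
        # Accept both "gres/gpu=2" and typed forms like "gres/gpu:gtx1080ti=2".
        if key == "gres/gpu" or key.startswith("gres/gpu:"):
            return value.strip()
    return None


def format_job_gres(tres_alloc):
    """Format the GPU allocation of a job, in red if no GPU was requested."""
    gpus = gpu_alloc(tres_alloc)
    if gpus is None:
        return f"{RED}gpu=(null){RESET}"
    return f"gpu={gpus}"


if __name__ == "__main__":
    print(format_job_gres("cpu=4,mem=16G,node=1,billing=4,gres/gpu=2"))
    print(format_job_gres("cpu=1,mem=4G,node=1,billing=1"))
```

In this sketch, a job without any gres/gpu entry is printed as a red "gpu=(null)", so it stands out in the job list of a GPU node.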
Could you test whether this is useful and send me your feedback?
Thanks,
Ole
On 12/13/21 15:31, Loris Bennett wrote:
> Hi Ole,
>
> The new version looks good to me.
>
> Cheers,
>
> Loris
>
> Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> writes:
>
>> Hi Loris,
>>
>> I fixed errors in the hostnamelength calculation and formatting.
>> Could you grab the latest pestat and test it?
>>
>> Thanks,
>> Ole
>>
>> On 12/13/21 13:56, Loris Bennett wrote:
>>> Hi Ole,
>>>
>>> Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> writes:
>>>
>>>> Hi Slurm users,
>>>>
>>>> I have updated the "pestat" tool for printing Slurm nodes status with 1 line per
>>>> node including job info. The download page is
>>>> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat
>>>> (also listed in https://slurm.schedmd.com/download.html).
>>>>
>>>> Improvements:
>>>>
>>>> * The GRES/GPU output option "pestat -G" now prints the job gres/gpu information
>>>> as obtained from squeue's tres-alloc output option, which should contain the
>>>> most correct GRES/GPU information.
>>>>
>>>> If you have a cluster with GPUs, could you try out the latest version and send
>>>> me any feedback?
>>>>
>>>> Thanks to René Sitt for helpful suggestions and testing.
>>>>
>>>> The pestat tool can print a large variety of node and job information, and is
>>>> generally useful for monitoring nodes and jobs on Slurm clusters. For command
>>>> options and examples please see the download page. My own favorite usage is
>>>> "pestat -F".
>>>
>>> Thanks for the update - the GPU information is a good addition.
>>> However, the alignment of the columns with the headers seems a bit off:
>>>
>>>
>>> $ pestat -p gpu -G
>>> Print only nodes in partition gpu
>>> GRES (Generic Resource) is printed after each jobid
>>> Hostname Partition Node Num_CPU CPUload Memsize Freemem GRES/node Joblist
>>> State Use/Tot (15min) (MB) (MB) JobID(JobArrayID) User GRES/job ...
>>> g001 gpu mix 1 32 0.06* 95200 89990 gpu:gtx1080ti:2(S:0-1) 8692106 joesnow gpu=2
>>> g002 gpu mix 6 32 1.70* 95200 71692 gpu:gtx1080ti:2(S:0-1) 8692181(8536946_566) gailhail gpu=1 8692131(8536946_563) gailhail gpu=1
>>> g003 gpu mix 1 32 0.06* 95200 87622 gpu:gtx1080ti:2(S:0-1) 8692111 joesnow gpu=2
>>> g004 gpu mix 6 32 1.74* 95200 65647 gpu:gtx1080ti:2(S:0-1) 8692124(8536946_562) gailhail gpu=1 8692122(8536946_561) gailhail gpu=1
>>>
>>>
>>> It looks as if the column 'Partition' needs to be four spaces wider.
>>>
>>> Cheers,
>>>
>>> Loris
>>>
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: Ole.H.Nielsen at fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620