[slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

Ryan Novosielski novosirj at rutgers.edu
Tue Dec 14 21:45:03 UTC 2021


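Did a git bisect and answered my own question: “yes.” The first bad commit is the "squeue -O" change, whose message notes that it may not work with Slurm 19.05 and older.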

[novosirj at amarel1 Slurm_tools]$ git bisect good
72cd05d78f1077142143f20c4293c8c367ffb5a7 is the first bad commit
commit 72cd05d78f1077142143f20c4293c8c367ffb5a7
Author: OleHolmNielsen <Ole.H.Nielsen at fysik.dtu.dk>
Date:   Fri Apr 23 15:11:37 2021 +0200

    Changes related to "squeue -O".  May not work with Slurm 19.05 and older.

:040000 040000 dee11077f72dd898dcadccf9d0dd2cfc438a8d1f 61880fe14a49a7a96167b89d21dede41f2751d86 M      pestat
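
For anyone wanting to repeat this kind of search, here is a minimal, self-contained sketch of an automated bisect using `git bisect run`. It does not touch the real Slurm_tools repository: the throwaway repo, the `tool.sh` script (a hypothetical stand-in for pestat), and the `alice` user in its output are all assumptions made up for illustration.

```shell
#!/bin/sh
# Demo of `git bisect run` on a throwaway repository.
# tool.sh is a hypothetical stand-in for pestat, not the real script.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

# Five commits; from the 4th one on, the output loses the job ID and user.
for i in 1 2 3 4 5; do
    if [ "$i" -ge 4 ]; then
        printf '#!/bin/sh\necho "node01 mix"\n' > tool.sh
    else
        printf '#!/bin/sh\necho "node01 mix 12345 alice"\n' > tool.sh
    fi
    echo "rev $i" >> history.txt
    git add tool.sh history.txt
    git commit -q -m "commit $i"
done

good=$(git rev-list --max-parents=0 HEAD)   # root commit, known good
expected_bad=$(git rev-parse HEAD~1)        # the commit we planted as bad

# Mark HEAD as bad and the root commit as good, then let git drive the search.
git bisect start HEAD "$good"

# Exit status 0 marks a commit good, 1 marks it bad.
git bisect run sh -c 'sh tool.sh | grep -q alice'

# After the run, refs/bisect/bad points at the first bad commit.
first_bad=$(git rev-parse refs/bisect/bad)
first_bad_subject=$(git log -1 --format=%s "$first_bad")
git bisect reset >/dev/null 2>&1

echo "first bad commit: $first_bad ($first_bad_subject)"
```

With a real checkout, the test command would instead run pestat against a node known to be busy and check for a job ID in the output; that part depends on the site and is not shown here.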

> On Dec 14, 2021, at 4:29 PM, Ryan Novosielski <novosirj at rutgers.edu> wrote:
> 
> Hi Ole,
> 
> Thanks again for your great tools!
> 
> Is something expected to have broken this script for older versions of Slurm? A version we have with a file time of 1/19/21 shows job IDs and users for a given node, but the version you released yesterday does not seem to. (We may have missed versions in between, so it may not be this release that did it.)
> 
> Older: 
> 
> [root at amarel1 pestat]# ./pestat -F -w slepner080
> Print only nodes that are flagged by * (RED nodes)
> Select only nodes in hostlist=slepner080
> Hostname       Partition     Node Num_CPU  CPUload  Memsize  Freemem  Joblist
>                            State Use/Tot              (MB)     (MB)  JobId User ...
> slepner080           main*      mix  22  24    1.07*   128000   116325  17036194 mt1044 17032319 as2654 17039145 vs670  
> 
> Current:
> 
> [root at amarel1 pestat]# ~novosirj/bin/pestat -F -w slepner080
> Print only nodes that are flagged by * (RED nodes)
> Select only nodes in hostlist=slepner080
> Hostname         Partition     Node Num_CPU  CPUload  Memsize  Freemem  Joblist
>                              State Use/Tot  (15min)     (MB)     (MB)  JobID User ...
> slepner080           main*     mix   22  24    1.07*   128000   116325   
> 
> You can see that the Joblist entries (JobID and User) are missing from the current output.
> 
> --
> #BlackLivesMatter
> ____
> || \\UTGERS,  	 |---------------------------*O*---------------------------
> ||_// the State	 |         Ryan Novosielski - novosirj at rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\    of NJ	 | Office of Advanced Research Computing - MSB C630, Newark
>     `'
> 
>> On Dec 13, 2021, at 7:09 AM, Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> wrote:
>> 
>> Hi Slurm users,
>> 
>> I have updated the "pestat" tool for printing Slurm nodes status with 1 line per node including job info.  The download page is https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat
>> (also listed in https://slurm.schedmd.com/download.html).
>> 
>> Improvements:
>> 
>> * The GRES/GPU output option "pestat -G" now prints the job gres/gpu information as obtained from squeue's tres-alloc output option, which should contain the most accurate GRES/GPU information.
>> 
>> If you have a cluster with GPUs, could you try out the latest version and send me any feedback?
>> 
>> Thanks to René Sitt for helpful suggestions and testing.
>> 
>> The pestat tool can print a large variety of node and job information, and is generally useful for monitoring nodes and jobs on Slurm clusters.  For command options and examples please see the download page.  My own favorite usage is "pestat -F".
>> 
>> Thanks,
>> Ole
>> 
>> -- 
>> Ole Holm Nielsen
>> PhD, Senior HPC Officer
>> Department of Physics, Technical University of Denmark
>> 
> 


