[slurm-users] Hints, Cheatsheets, etc

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Tue Jul 9 06:16:57 UTC 2019

Hi Edward,

Besides my Slurm Wiki page https://wiki.fysik.dtu.dk/niflheim/SLURM, I 
have written a number of tools which we use for monitoring our cluster, 
see https://github.com/OleHolmNielsen/Slurm_tools.  I recommend in 
particular these tools:

* pestat: Prints the status of the Slurm cluster nodes, with one line per 
node plus its job info.

* showuserjobs: Prints the current node status and batch job status, 
broken down by user ID.

Use the option "-p <partition>" to display partition data.

I also recommend this nice tool for displaying partition statistics:

* spart: A user-oriented partition info command for Slurm.

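For the first question quoted below ("how many CPUs are currently being used in the foobar partition"), stock sinfo can already report CPU counts by state through its "%C" format field, which prints allocated/idle/other/total. The following is a minimal Python sketch, not part of Slurm_tools or spart, that wraps that call. The function names are hypothetical, and it assumes sinfo is on the PATH.

```python
import subprocess
from typing import Dict


def parse_cpu_states(sinfo_output: str) -> Dict[str, int]:
    """Sum the 'allocated/idle/other/total' CPU counts printed by
    `sinfo -h -o "%C"`. If sinfo prints several lines, they are summed."""
    totals = {"allocated": 0, "idle": 0, "other": 0, "total": 0}
    for line in sinfo_output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        allocated, idle, other, total = (int(x) for x in line.split("/"))
        totals["allocated"] += allocated
        totals["idle"] += idle
        totals["other"] += other
        totals["total"] += total
    return totals


def partition_cpu_usage(partition: str) -> Dict[str, int]:
    """Hypothetical helper: run sinfo for one partition (no header line,
    CPU-state field only) and parse its output."""
    result = subprocess.run(
        ["sinfo", "-h", "-p", partition, "-o", "%C"],
        capture_output=True, text=True, check=True,
    )
    return parse_cpu_states(result.stdout)
```

On a cluster node, partition_cpu_usage("foobar")["allocated"] would then give the number of allocated CPUs in that partition. Parsing is kept separate from the subprocess call so it can be checked without a running Slurm controller.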

On 7/8/19 9:33 PM, Edward Ned Harvey (slurm) wrote:
> I am an experienced sysadmin, new to being a slurm admin, and I'm 
> encountering some difficulty:
> If you have a simple question such as "how many CPUs are currently 
> being used in the foobar partition?" or "give me an overview of the 
> waiting jobs and the reasons they're waiting," I don't yet have any 
> good, easy way to answer it. I can get the total number of CPUs in a 
> partition via "scontrol show partition foobar", I can get how many 
> CPUs are being used on a particular node via "scontrol show node 
> somenode", and I can get a (not easily parsable) list of nodes within 
> a partition via "sinfo". So all the information is available, but it 
> is very difficult to access because it would require some very 
> nontrivial parsing.
> I see projects like this: https://github.com/fasrc/slurm_showq and 
> https://github.com/fasrc/scalc which seem to be created exactly for this 
> purpose. They're trying to make information in slurm more easily accessible.
> So, is there a better way to manage a slurm cluster, are there better 
> tools, or better ways to use them? Any other suggestions for me from 
> experienced slurm admins? Like, a cheatsheet of common commands or 
> scripts like slurm_showq and scalc? Or is this just the normal state of 
> the world?
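For the second quoted question (an overview of waiting jobs and why they wait), squeue can print the pending reason directly with its "%r" format field, and "-t PD" restricts it to pending jobs. A minimal Python sketch, again hypothetical and not taken from any of the tools mentioned above, that tallies pending jobs by partition and reason. It assumes squeue is on the PATH and that "|" never appears in partition names or reason strings.

```python
import subprocess
from collections import Counter


def count_pending_reasons(squeue_output: str) -> Counter:
    """Count lines of 'partition|reason' (as printed by
    `squeue -h -t PD -o "%P|%r"`) per (partition, reason) pair."""
    counts: Counter = Counter()
    for line in squeue_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        partition, _, reason = line.partition("|")
        counts[(partition.strip(), reason.strip())] += 1
    return counts


def pending_reasons() -> Counter:
    """Hypothetical helper: query pending jobs (no header) and tally them."""
    result = subprocess.run(
        ["squeue", "-h", "-t", "PD", "-o", "%P|%r"],
        capture_output=True, text=True, check=True,
    )
    return count_pending_reasons(result.stdout)
```

Calling pending_reasons().most_common() would list the (partition, reason) pairs with the most pending jobs first.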
