[slurm-users] Hints, Cheatsheets, etc

mercan ahmet.mercan at uhem.itu.edu.tr
Mon Jul 8 20:31:47 UTC 2019


There is an official page that links to many third-party solutions 
you can use:


In my opinion, the best Slurm page for system administration is:


On that page you can find many of the links and much of the 
information you need. However, I don't think there is a generally 
accepted or official solution for monitoring a cluster. That is 
probably because Slurm is more a kind of HPC Lego set than a 
prebuilt toy.
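
As a rough illustration of the Lego approach, the two questions in the 
quoted message (CPUs in use in a partition, and why jobs are waiting) 
can be answered without heavy parsing by asking sinfo and squeue for 
exactly one field. This is only a sketch, not an official tool. It 
assumes that sinfo's %C field prints allocated/idle/other/total as 
plain integers on one line, and that squeue's %r field prints one 
reason per pending job. The function names are made up for this 
example.

```python
import subprocess
from collections import Counter


def parse_cpu_states(text):
    """Parse the first non-empty 'allocated/idle/other/total' line (sinfo %C)."""
    line = next(l for l in text.splitlines() if l.strip())
    allocated, idle, other, total = (int(x) for x in line.strip().split("/"))
    return {"allocated": allocated, "idle": idle, "other": other, "total": total}


def count_reasons(text):
    """Count pending jobs per reason, given one reason per line (squeue %r)."""
    return Counter(l.strip() for l in text.splitlines() if l.strip())


def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def report(partition):
    """Print CPU usage and pending-job reasons for one partition."""
    # -h drops the header line; -o selects the single field we want.
    cpus = parse_cpu_states(run(["sinfo", "-h", "-p", partition, "-o", "%C"]))
    print(f"{partition}: {cpus['allocated']} of {cpus['total']} CPUs allocated")
    reasons = count_reasons(
        run(["squeue", "-h", "-p", partition, "-t", "PENDING", "-o", "%r"])
    )
    for reason, jobs in reasons.most_common():
        print(f"{jobs:6d} pending: {reason}")
```

Calling report("foobar") on a machine with the Slurm client commands 
installed prints one line for the CPUs and one line per waiting 
reason.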


Ahmet M.

On 8.07.2019 22:33, Edward Ned Harvey (slurm) wrote:
> I am an experienced sysadmin, new to being a slurm admin, and I'm 
> encountering some difficulty:
> If you have a simple question such as "how many cpu's are currently 
> being used in the foobar partition," or "give me an overview of the 
> waiting jobs and what are the reasons they're waiting" I don't have 
> any good easy ways yet to answer these questions. I can get the total 
> number of cpu's in a partition via "scontrol show partition foobar" 
> and I can get how many cpus are being used on a particular node via 
> "scontrol show node somenode" and I can get a (not easily parsable) 
> list of nodes within a partition via "sinfo". So all the information 
> is available, but very difficult to access because it would require 
> some very nontrivial parsing.
> I see projects like this: https://github.com/fasrc/slurm_showq and 
> https://github.com/fasrc/scalc which 
> seem to be created exactly for this purpose. They're trying to make 
> information in slurm more easily accessible.
> So, is there a better way to manage a slurm cluster, are there better 
> tools, or better ways to use them? Any other suggestions for me from 
> experienced slurm admins? Like, a cheatsheet of common commands or 
> scripts like slurm_showq and scalc? Or is this just the normal state 
> of the world?
