[slurm-users] Hints, Cheatsheets, etc
ahmet.mercan at uhem.itu.edu.tr
Mon Jul 8 20:31:47 UTC 2019
There is an official page that links to many third-party
solutions you can use:
In my opinion, the best Slurm page for system administration is:
On that page you can find many of the links and much of the information
you need. However, I don't think there is a generally accepted or
official solution for monitoring your cluster. That is probably because
Slurm is more like HPC Lego than a prebuilt toy.
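As a small example of snapping the Lego pieces together, here is a rough sketch for the "how many CPUs are in use in partition foobar" question from the message quoted below. It relies on the sinfo "%C" format field, which prints CPU counts as allocated/idle/other/total. The function names are my own, and I am assuming the counts come back as plain integers; if sinfo prints more than one line, the sketch simply sums them.

```python
import subprocess


def parse_cpu_states(text):
    """Parse `sinfo -o %C` output lines of the form allocated/idle/other/total.

    Multiple lines are summed field by field (an assumption of this sketch).
    """
    totals = {"allocated": 0, "idle": 0, "other": 0, "total": 0}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        allocated, idle, other, total = (int(x) for x in line.split("/"))
        totals["allocated"] += allocated
        totals["idle"] += idle
        totals["other"] += other
        totals["total"] += total
    return totals


def partition_cpus(partition):
    """Run sinfo for one partition (no header) and return the CPU counts."""
    out = subprocess.run(
        ["sinfo", "-h", "-p", partition, "-o", "%C"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_cpu_states(out)
```

Calling partition_cpus("foobar") on a machine with the Slurm client tools installed should give a dictionary whose "allocated" entry answers the question. I have kept the parsing separate from the sinfo call so it can be checked without a cluster.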
On 8.07.2019 at 22:33, Edward Ned Harvey (slurm) wrote:
> I am an experienced sysadmin, new to being a slurm admin, and I'm
> encountering some difficulty:
> If you have a simple question such as "how many cpu's are currently
> being used in the foobar partition," or "give me an overview of the
> waiting jobs and what are the reasons they're waiting" I don't have
> any good easy ways yet to answer these questions. I can get the total
> number of cpu's in a partition via "scontrol show partition foobar"
> and I can get how many cpus are being used on a particular node via
> "scontrol show node somenode" and I can get a (not easily parsable)
> list of nodes within a partition via "sinfo". So all the information
> is available, but very difficult to access because it would require
> some very nontrivial parsing.
> I see projects like this: https://github.com/fasrc/slurm_showq and
> https://github.com/fasrc/scalc which
> seem to be created exactly for this purpose. They're trying to make
> information in slurm more easily accessible.
> So, is there a better way to manage a slurm cluster, are there better
> tools, or better ways to use them? Any other suggestions for me from
> experienced slurm admins? Like, a cheatsheet of common commands or
> scripts like slurm_showq and scalc? Or is this just the normal state
> of the world?
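For the "overview of the waiting jobs and why they are waiting" question, a similar rough sketch is possible with squeue: "-t PENDING" restricts the list to pending jobs and the "%r" format field prints each job's reason. The function names are again just illustrative, and this only counts reasons; it does not try to explain them.

```python
import subprocess
from collections import Counter


def count_pending_reasons(text):
    """Count how often each reason appears, given one reason per line."""
    return Counter(line.strip() for line in text.splitlines() if line.strip())


def pending_reasons(partition=None):
    """Run squeue for pending jobs (no header) and tally their reasons."""
    cmd = ["squeue", "-h", "-t", "PENDING", "-o", "%r"]
    if partition:
        cmd += ["-p", partition]  # optionally limit to one partition
    out = subprocess.run(
        cmd, check=True, capture_output=True, text=True
    ).stdout
    return count_pending_reasons(out)
```

Printing pending_reasons("foobar").most_common() would then list the reasons for that partition, most frequent first.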