[slurm-users] Hints, Cheatsheets, etc
Edward Ned Harvey (slurm)
slurm at nedharvey.com
Mon Jul 8 19:33:15 UTC 2019
I am an experienced sysadmin, new to being a slurm admin, and I'm encountering some difficulty:
For a simple question such as "how many CPUs are currently in use in the foobar partition?" or "give me an overview of the waiting jobs and the reasons they are waiting," I don't yet have a good, easy way to get the answer. I can get the total number of CPUs in a partition via "scontrol show partition foobar", the number of CPUs in use on a particular node via "scontrol show node somenode", and a (not easily parsable) list of the nodes in a partition via "sinfo". So all the information is available, but it is hard to get at, because combining it requires some very nontrivial parsing.
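To illustrate the kind of parsing involved, here is a minimal Python sketch, not a tested tool. It assumes that "sinfo -h -o '%P %C'" prints one or more lines per partition in the form "name allocated/idle/other/total" with plain integer counts, and that "squeue -h -t PENDING -o '%i|%P|%r'" prints one pending job per line with the reason in the last field. The function names (parse_cpu_states, count_pending_reasons, report) are made up for this example, and the output formats may differ between Slurm versions.

```python
import subprocess
from collections import Counter

STATES = ("allocated", "idle", "other", "total")


def parse_cpu_states(sinfo_output):
    """Turn lines like 'foobar* 12/4/0/16' into {partition: {state: count}}.

    Counts are summed if a partition shows up on more than one line.
    """
    usage = {}
    for line in sinfo_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        partition, cpus = fields
        partition = partition.rstrip("*")  # '*' marks the default partition
        counts = [int(n) for n in cpus.split("/")]
        entry = usage.setdefault(partition, dict.fromkeys(STATES, 0))
        for state, count in zip(STATES, counts):
            entry[state] += count
    return usage


def count_pending_reasons(squeue_output):
    """Count pending jobs per reason from 'jobid|partition|reason' lines."""
    reasons = Counter()
    for line in squeue_output.splitlines():
        parts = line.split("|", 2)
        if len(parts) == 3:
            reasons[parts[2].strip()] += 1
    return reasons


def run(cmd):
    """Run a command and return its stdout as text."""
    return subprocess.run(
        cmd, check=True, capture_output=True, text=True
    ).stdout


def report():
    """Print CPU usage per partition and a summary of pending reasons."""
    cpus = parse_cpu_states(run(["sinfo", "-h", "-o", "%P %C"]))
    for partition, c in sorted(cpus.items()):
        print(f"{partition}: {c['allocated']}/{c['total']} CPUs allocated")
    pending = run(["squeue", "-h", "-t", "PENDING", "-o", "%i|%P|%r"])
    for reason, n in count_pending_reasons(pending).most_common():
        print(f"{n:6d} pending: {reason}")
```

Calling report() on a host where sinfo and squeue are on the PATH would print both summaries. The parsing is kept in separate functions so it can be checked against captured output without a live cluster.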
I see projects like this: https://github.com/fasrc/slurm_showq and https://github.com/fasrc/scalc which seem to be created exactly for this purpose. They're trying to make information in slurm more easily accessible.
So, is there a better way to manage a slurm cluster, are there better tools, or better ways to use them? Any other suggestions for me from experienced slurm admins? Like, a cheatsheet of common commands or scripts like slurm_showq and scalc? Or is this just the normal state of the world?