[slurm-users] Hints, Cheatsheets, etc

mercan ahmet.mercan at uhem.itu.edu.tr
Mon Jul 8 20:31:47 UTC 2019


Hi,

There is an official page that links to many third-party
solutions you can use:

https://slurm.schedmd.com/download.html

In my opinion, the best Slurm page for system administration is:

https://wiki.fysik.dtu.dk/niflheim/SLURM

On this page you can find many of the links and much of the information
you need. However, I don't think there is a generally accepted or
official solution for monitoring your cluster. That is probably because
Slurm is more like an HPC Lego set than a prebuilt toy.
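The CPU question in the quoted message may not need any scontrol parsing at all. sinfo's "%C" format field prints a partition's CPU counts as "allocated/idle/other/total". The Python sketch below assumes Python 3.7+ and that `sinfo` is on the PATH. The helper names (`CpuCounts`, `parse_cpu_lines`, `partition_cpus`, `describe`) are hypothetical. The sketch sums every output line because sinfo can split a partition's summary across several lines, for example when a node-state field is also requested.

```python
import subprocess
from typing import NamedTuple


class CpuCounts(NamedTuple):
    allocated: int
    idle: int
    other: int
    total: int


def parse_cpu_lines(text: str) -> CpuCounts:
    """Sum sinfo "%C" lines, each formatted as allocated/idle/other/total."""
    totals = [0, 0, 0, 0]
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        for i, value in enumerate(line.split("/")):
            totals[i] += int(value)
    return CpuCounts(*totals)


def partition_cpus(partition: str) -> CpuCounts:
    """Ask sinfo for the CPU summary of a single partition."""
    # -p limits output to one partition, -h drops the header line,
    # -o "%C" prints only the allocated/idle/other/total CPU counts.
    result = subprocess.run(
        ["sinfo", "-p", partition, "-h", "-o", "%C"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_cpu_lines(result.stdout)


def describe(name: str, cpus: CpuCounts) -> str:
    """Render a one-line, human-readable summary."""
    return (f"{name}: {cpus.allocated}/{cpus.total} CPUs allocated "
            f"({cpus.idle} idle, {cpus.other} other)")
```

On a login node, `print(describe("foobar", partition_cpus("foobar")))` would answer "how many CPUs are in use in foobar" in one line. Keeping the parsing separate from the subprocess call makes it easy to test without a cluster.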

Regards,

Ahmet M.


On 8.07.2019 22:33, Edward Ned Harvey (slurm) wrote:
>
> I am an experienced sysadmin, new to being a slurm admin, and I'm 
> encountering some difficulty:
>
> If you have a simple question such as "how many CPUs are currently
> being used in the foobar partition?" or "give me an overview of the
> waiting jobs and the reasons they're waiting," I don't yet have any
> good, easy ways to answer it. I can get the total number of CPUs in a
> partition via "scontrol show partition foobar", I can get how many
> CPUs are being used on a particular node via "scontrol show node
> somenode", and I can get a (not easily parsable) list of the nodes in
> a partition via "sinfo". So all the information is available, but it
> is very difficult to access because it would require some very
> nontrivial parsing.
>
> I see projects like https://github.com/fasrc/slurm_showq and
> https://github.com/fasrc/scalc, which seem to have been created
> exactly for this purpose: they try to make the information in Slurm
> more easily accessible.
>
> So, is there a better way to manage a slurm cluster, are there better 
> tools, or better ways to use them? Any other suggestions for me from 
> experienced slurm admins? Like, a cheatsheet of common commands or 
> scripts like slurm_showq and scalc? Or is this just the normal state 
> of the world?
>
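For the second question in the quoted message, which jobs are waiting and why, squeue can report the pending reason directly through its "%r" format field. The Python sketch below assumes Python 3.7+ and that `squeue` is on the PATH. The "|" separator and the function names `parse_pending` and `pending_summary` are illustrative choices, not anything Slurm prescribes.

```python
import subprocess
from collections import Counter, defaultdict
from typing import Dict


def parse_pending(text: str) -> Dict[str, Counter]:
    """Count pending jobs per partition and per reason.

    Each input line is expected to look like "JOBID|PARTITION|REASON".
    """
    summary: Dict[str, Counter] = defaultdict(Counter)
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        _job_id, partition, reason = line.split("|", 2)
        summary[partition.strip()][reason.strip()] += 1
    return dict(summary)


def pending_summary() -> Dict[str, Counter]:
    """Run squeue for pending jobs only and summarize the output."""
    # -t PENDING selects waiting jobs, -h drops the header,
    # and %i / %P / %r are job id, partition and pending reason.
    result = subprocess.run(
        ["squeue", "-t", "PENDING", "-h", "-o", "%i|%P|%r"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_pending(result.stdout)
```

For example, `for part, reasons in pending_summary().items(): print(part, reasons.most_common())` lists each partition with its most common pending reasons (such as Priority or Resources) first.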


