[slurm-users] 4 sockets but "
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Fri Jul 23 07:28:32 UTC 2021
On 7/23/21 9:05 AM, Loris Bennett wrote:
> We use both Zabbix and pestat. Zabbix gives us general information on
> the state of the nodes and file systems, and we have added some Slurm
> metrics, such as number of jobs pending, amount of memory pending,
> number of GPUs pending, etc. This has been quite handy, although I find
> Zabbix a bit tricky to configure. This may be because (a) we are stuck
> on Version 3.4 due to the PHP dependency with CentOS 7 and (b) I only do
> stuff very irregularly with Zabbix and so always have to start somewhat
> from scratch.
I prefer simple tools, if possible :-) For monitoring Slurm compute
nodes, I'm fully satisfied with the LBNL Node Health Check tools. This
offers checks of disk space, memory, GPUs, InfiniBand and much more. See
For monitoring the Slurm queue and pending jobs, I use the "showuserjobs"
> pestat on the other hand gives us more information about what individual
> jobs on individual nodes are up to at a given point in time. I don't
> quite see how one could integrate pestat itself directly into Zabbix, as
> it is more geared to producing a report, but maybe Ole has ideas :-)
Sorry, no ideas because I'm not familiar with Zabbix.
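
For illustration only, here is a minimal sketch of how a "number of jobs pending" metric like the one Loris mentions could be collected for an external monitoring system. It is not how either site actually does it. The function names count_pending_by_user and fetch_pending are hypothetical, and the sketch assumes squeue is on the PATH and is called with its standard -h, -t and -o options.

```python
import subprocess
from collections import Counter


def count_pending_by_user(squeue_output: str) -> Counter:
    """Count pending job records per user from 'jobid|user' lines.

    Note: squeue prints a pending job array as a single record by
    default, so an array counts once here (an assumption of this sketch).
    """
    counts = Counter()
    for line in squeue_output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _job_id, user = line.split("|", 1)
        counts[user] += 1
    return counts


def fetch_pending() -> str:
    """Run squeue and return its raw output (hypothetical helper).

    -h suppresses the header, -t PENDING selects pending jobs,
    -o sets the output format (%i = job ID, %u = user name).
    """
    result = subprocess.run(
        ["squeue", "-h", "-t", "PENDING", "-o", "%i|%u"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling count_pending_by_user(fetch_pending()) on a login node would give per-user counts, and summing the values gives a single total that a monitoring agent could report. Keeping the parsing separate from the squeue call makes it possible to test the parser on canned text.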