[slurm-users] 4 sockets but "

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Fri Jul 23 07:28:32 UTC 2021


Hi Loris,

On 7/23/21 9:05 AM, Loris Bennett wrote:
> We use both Zabbix and pestat.  Zabbix gives us general information on
> the state of the nodes and file systems, and we have added some Slurm
> metrics, such as number of jobs pending, amount of memory pending,
> number of GPUs pending, etc.  This has been quite handy, although I find
> Zabbix a bit tricky to configure.  This may be because (a) we are stuck
> on Version 3.4 due to the PHP dependency with CentOS 7 and (b) I only do
> stuff very irregularly with Zabbix and so always have to start somewhat
> from scratch.

I prefer simple tools, if possible :-)  For monitoring Slurm compute
nodes, I'm fully satisfied with the LBNL Node Health Check tools, which
offer checks of disk space, memory, GPUs, InfiniBand and much more.  See
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#node-health-check

For monitoring the Slurm queue and pending jobs, I use the "showuserjobs" 
script from 
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserjobs
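
To illustrate the kind of per-user queue summary meant here, a minimal
sketch follows.  It is not the showuserjobs script itself, only a rough
approximation of the idea.  It assumes squeue is run as
squeue -h -o "%u|%T" (no header, user name and job state separated by
"|"); the function names count_jobs_by_user and read_squeue are
made up for this example.

```python
import subprocess
from collections import Counter


def count_jobs_by_user(squeue_text):
    """Count jobs per (user, state) from lines of the form 'user|STATE'."""
    counts = Counter()
    for line in squeue_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, _, state = line.partition("|")
        counts[(user, state)] += 1
    return counts


def read_squeue():
    """Run squeue with no header, printing only user name and job state.

    Assumes squeue is on PATH; '%u' is the user name and '%T' the job
    state in extended form (PENDING, RUNNING, ...).
    """
    result = subprocess.run(
        ["squeue", "-h", "-o", "%u|%T"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a cluster login node, count_jobs_by_user(read_squeue()) would return
a Counter keyed by (user, state), which can then be sorted and printed
or passed on to whatever monitoring system is in use.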

> pestat on the other hand gives us more information about what individual
> jobs on individual nodes are up to at a given point in time.  I don't
> quite see how one could integrate pestat itself directly into Zabbix, as
> it is more geared to producing a report, but maybe Ole has ideas :-)

Sorry, no ideas because I'm not familiar with Zabbix.

/Ole


