[slurm-users] 4 sockets but "

Loris Bennett loris.bennett at fu-berlin.de
Fri Jul 23 07:05:24 UTC 2021


Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> writes:

> Hi Diego,
>
> On 7/23/21 8:16 AM, Diego Zuccato wrote:
>>> The Configless Slurm (https://slurm.schedmd.com/configless_slurm.html) from
>>> 20.02 makes distribution of slurm.conf really simple.
>> Eager to see it in Debian :)
>
> IMHO, there ought to be a community effort to provide up-to-date Slurm packages
> for Debian (and Ubuntu), just like a colleague did for the EPEL repository for
> RHEL and derivatives ;-)  We run CentOS and can trivially build new RPMs from
> the Slurm source tar-balls.
>
>>> For monitoring the state of compute nodes and their jobs, I recommend
>>> "pestat" from
>>> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat
>>> I use "pestat -F" many times every day to see if any jobs are
>>> misbehaving.
>> I'll have a look. I'm also setting up Zabbix for more general monitoring
>> but I'm not really OK with it yet (for example I still can't understand how I
>> can exclude some metrics from a host that got 'em added by a template... When
>> I'll have enough time I'll find a way :) ). Maybe pestat can be added to the
>> Zabbix metrics...
>
> Did you check out what pestat can do (and maybe not do) for you?  If you have
> any suggestions for improving pestat, I'd be glad to see what I can do.

We use both Zabbix and pestat.  Zabbix gives us general information on
the state of the nodes and file systems, and we have added some Slurm
metrics, such as number of jobs pending, amount of memory pending,
number of GPUs pending, etc.  This has been quite handy, although I find
Zabbix a bit tricky to configure.  This may be because (a) we are stuck
on Version 3.4 due to the PHP dependency with CentOS 7 and (b) I only do
stuff very irregularly with Zabbix and so always have to start somewhat
from scratch.
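
To illustrate the kind of Slurm metric I mean, here is a minimal sketch
of how the number of pending jobs could be obtained.  It is not our
actual script: the function names are made up for this example, and it
assumes that "squeue" is in the PATH of the user running it.

```python
#!/usr/bin/env python3
"""Sketch: print the number of pending Slurm jobs as a single integer."""
import subprocess


def count_pending(squeue_output: str) -> int:
    """Count the non-empty lines of squeue output, one line per job."""
    return sum(1 for line in squeue_output.splitlines() if line.strip())


def pending_jobs() -> int:
    """Ask squeue for the IDs of all pending jobs and count them."""
    # -h: no header, -r: one line per array task,
    # -t PENDING: pending jobs only, -o %i: print just the job ID
    result = subprocess.run(
        ["squeue", "-h", "-r", "-t", "PENDING", "-o", "%i"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_pending(result.stdout)


if __name__ == "__main__":
    print(pending_jobs())
```

A script like this could probably be hooked into the Zabbix agent via a
UserParameter entry in zabbix_agentd.conf, with an item key and script
path of your own choosing.  Memory or GPUs pending would need a
different squeue format string and some summing, which I have left out.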

pestat, on the other hand, gives us more information about what individual
jobs on individual nodes are up to at a given point in time.  I don't
quite see how one could integrate pestat itself directly into Zabbix, as
it is more geared to producing a report, but maybe Ole has ideas :-)

Cheers,

Loris

-- 
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de


