[slurm-users] 4 sockets but "
Diego Zuccato
diego.zuccato at unibo.it
Fri Jul 23 06:16:22 UTC 2021
On 21/07/2021 20:27, Ole Holm Nielsen wrote:
Hi Ole.
>> What should I think?
> Did you distribute the new slurm.conf to all compute nodes after the
> change?
/etc/slurm/slurm.conf is a symlink to /home/conf/slurm.conf, and /home
is NFS-mounted on every node. No need to re-distribute it :)
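
A minimal sketch of how the symlink setup described above could be checked on a node. The function name conf_link_ok is hypothetical, and the sketch assumes a readlink that supports -f (as in GNU coreutils):

```shell
# Hypothetical helper: succeeds only if LINK is a symlink that resolves
# to the same file as TARGET.
conf_link_ok() {
    link=$1
    target=$2
    # The path itself must be a symlink, not a local copy of the file.
    [ -L "$link" ] || return 1
    # Both paths must resolve to the same canonical location.
    [ "$(readlink -f "$link")" = "$(readlink -f "$target")" ]
}

# Example call with the paths from this message (run on each node):
# conf_link_ok /etc/slurm/slurm.conf /home/conf/slurm.conf && echo "symlink OK"
```

This only confirms the link target. It does not check that the NFS mount is current or that the daemons have re-read the file.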
> Did you do "scontrol reconfig" for the slurmd daemons to pick
> up the changes?
Given the type of changes, I opted for "systemctl restart slurmctld" (and
restarted slurmd on the worker nodes). cssh and bash-completion make it
quite fast :)
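
For reference, the restart described above could be scripted roughly as follows. This is only a sketch: the function restart_slurm and the RUN dry-run hook are hypothetical, the node names are placeholders, and it assumes passwordless ssh with sufficient privileges on each worker (the original procedure used cssh interactively instead):

```shell
# RUN is a hypothetical dry-run hook: set RUN=echo to print the commands
# instead of executing them.
RUN=${RUN:-}

restart_slurm() {
    # Restart the controller on the local (head) node.
    $RUN systemctl restart slurmctld
    # Restart slurmd on every worker node passed as an argument.
    # The order simply mirrors the message; it is not claimed to be required.
    for node in "$@"; do
        $RUN ssh "$node" systemctl restart slurmd
    done
}

# Dry run with placeholder node names:
# RUN=echo; restart_slurm node1 node2
```

The dry-run mode makes it easy to review the exact commands before touching a production cluster.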
> This is standard procedure when making any changes to
> slurm.conf, read about "reconfigure" in the scontrol man-page.
Yup.
> The Configless Slurm (https://slurm.schedmd.com/configless_slurm.html)
> from 20.02 makes distribution of slurm.conf really simple.
Eager to see it in Debian :)
> For monitoring the state of compute nodes and their jobs, I recommend
> "pestat" from
> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat
> I use "pestat -F" many times every day to see if any jobs are misbehaving.
I'll have a look. I'm also setting up Zabbix for more general monitoring,
but I'm not really OK with it yet (for example, I still can't understand
how I can exclude some metrics from a host that got 'em added by a
template... When I have enough time I'll find a way :) ). Maybe
pestat can be added to the Zabbix metrics...
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786