[slurm-users] 4 sockets but "

Diego Zuccato diego.zuccato at unibo.it
Fri Jul 23 06:16:22 UTC 2021


On 21/07/2021 20:27, Ole Holm Nielsen wrote:

Hi Ole.

>> What should I think?
> Did you distribute the new slurm.conf to all compute nodes after the 
> change?
/etc/slurm/slurm.conf is a symlink to /home/conf/slurm.conf, and /home 
is NFS-mounted on every node. No need to re-distribute it :)
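
A quick way to double-check that setup on a node could look like the sketch below. The two paths are the ones mentioned above; the helper name check_conf_link is made up for illustration, and it assumes GNU readlink (for -f) is available.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the symlink resolves to the expected file.
# Assumes GNU coreutils readlink, which supports -f (canonicalize).
check_conf_link() {
    link=$1
    expected=$2
    target=$(readlink -f "$link") || return 1
    [ "$target" = "$expected" ]
}

# Paths taken from the message above; run this on any node.
if check_conf_link /etc/slurm/slurm.conf /home/conf/slurm.conf; then
    echo "slurm.conf points at the shared copy"
else
    echo "slurm.conf does NOT resolve to /home/conf/slurm.conf"
fi
```

This only checks the link target, not that the NFS mount itself is healthy.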

>  Did you do "scontrol reconfig" for the slurmd daemons to pick 
> up the changes?
Given the type of changes, I opted for "systemctl restart slurmctld" (and 
restart slurmd on the worker nodes). cssh and bash-completion make it 
quite fast :)
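
For anyone without cssh, the same restart sequence can be roughly sketched as a dry-run script. It assumes root SSH access to the workers; the function name restart_cmds and the node names node01/node02 are placeholders, not my real hosts.

```shell
#!/bin/sh
# Print the restart commands for the controller and for each worker node.
# Node names passed as arguments are placeholders for illustration.
restart_cmds() {
    echo "systemctl restart slurmctld"
    for node in "$@"; do
        echo "ssh $node systemctl restart slurmd"
    done
}

# Dry run: this only prints the commands; pipe the output to sh to execute them.
restart_cmds node01 node02
```

Printing first and executing second makes it easy to review the node list before anything is restarted.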

>  This is standard procedure when making any changes to 
> slurm.conf, read about "reconfigure" in the scontrol man-page.
Yup.

> The Configless Slurm (https://slurm.schedmd.com/configless_slurm.html) 
> from 20.02 makes distribution of slurm.conf really simple.
Eager to see it in Debian :)

> For monitoring the state of compute nodes and their jobs, I recommend 
> "pestat" from 
> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat
> I use "pestat -F" many times every day to see if any jobs are misbehaving.
I'll have a look. I'm also setting up Zabbix for more general monitoring, 
but I'm not really comfortable with it yet (for example, I still can't 
understand how to exclude some metrics from a host that got them added by 
a template... When I have enough time I'll find a way :) ). Maybe 
pestat can be added to the Zabbix metrics...

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786


