[slurm-users] "Low RealMem" after upgrade

Diego Zuccato diego.zuccato at unibo.it
Tue Oct 5 08:06:34 UTC 2021


On 05/10/2021 09:22, Ole Holm Nielsen wrote:

> What is a "frontend"?  Do you mean the slurmctld server?
Yes, sorry. "Frontend" is what we call the node(s) that users submit 
jobs from, where slurmctld and slurmdbd also run. We'll probably move 
slurmdbd and slurmctld to a dedicated VM in a future upgrade (mainly, I 
first have to be sure the VM doesn't need IB or access to the Gluster 
fs that's only available over IB).
Does sbatch give slurmctld just a path to the job script or the whole 
script?

>> worked with IDLE (RESUME gives "Invalid node state specified").
> So "scontrol update node=... state=idle" gives the node a correct idle 
> state, whereas "state=resume" doesn't?  Did you restart the slurmd on 
> the compute nodes?
Yes. Complete node reboots, actually. Multiple times. When desperate, 
try rebooting.
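
A minimal sketch of the two state changes discussed above, assuming 
Python 3.8+ on the admin node. It only builds the scontrol argument 
list and does not run it; the function name node_state_command and the 
node name "node01" are hypothetical. As far as I understand the scontrol 
man page, RESUME is not an actual node state but a request to return a 
DRAIN or DOWN node to service, which may explain "Invalid node state 
specified" when the node is in some other state.

```python
import shlex


def node_state_command(node, state):
    """Return the argv for an 'scontrol update' call that sets a node's state.

    'node' and 'state' are passed through as given; nothing is validated here.
    """
    return ["scontrol", "update", f"NodeName={node}", f"State={state.upper()}"]


if __name__ == "__main__":
    # "node01" is a placeholder node name, not one taken from this thread.
    for state in ("resume", "idle"):
        print(shlex.join(node_state_command("node01", state)))
```

The resulting list could be handed to subprocess.run() on a host where 
scontrol is installed and the caller has Slurm administrator rights.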

>> SLURM 20.11.4.
> You wrote that you use Slurm 21.08 from Debian 11.  How did 20.11 get 
> into the picture?
Good question. I copy-pasted 21.08 from a node after the upgrade, but 
now all nodes report 20.11.4. Really confused :-? Just to add to the 
confusion, packages.debian.org gives 20.11.7+really20.11.4-2 as the 
slurmctld version for bullseye. There's no mention of 21.08 anywhere, 
not even in sid (20.11.8). ARGH! Did I dream it? And if so, how could I 
copy and paste it?
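
For what it's worth, the odd-looking package version follows the Debian 
"+really" convention: the part after "+really" is the upstream version 
that is actually shipped, so 20.11.7+really20.11.4-2 contains Slurm 
20.11.4. A minimal sketch of extracting it, assuming the usual 
[epoch:]upstream[-revision] layout (the function name 
effective_upstream is hypothetical):

```python
def effective_upstream(debian_version):
    """Return the upstream version actually shipped by a Debian package.

    Assumes the layout [epoch:]upstream[-revision] and the '+really'
    convention, where the text after '+really' is the real upstream version.
    """
    rest = debian_version
    # Drop the epoch (everything up to the first colon), if present.
    if ":" in rest:
        rest = rest.split(":", 1)[1]
    # Drop the Debian revision (everything after the last hyphen), if present.
    if "-" in rest:
        rest = rest.rsplit("-", 1)[0]
    # Keep only what follows the last '+really' marker, if any.
    if "+really" in rest:
        rest = rest.rsplit("+really", 1)[1]
    return rest


if __name__ == "__main__":
    print(effective_upstream("20.11.7+really20.11.4-2"))
```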

> The slurmdbd and slurmctld servers must have versions >= that of
> slurmd, see some links in
> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
Yup. That's why I upgraded the whole cluster at once.
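
The rule quoted above can be sketched as a small check. This is only an 
illustration: the function names are hypothetical, and comparing just 
the major.minor release (e.g. 20.11 vs 21.08) while ignoring the micro 
version is an assumption of this sketch.

```python
def release(version):
    """Return the (major, minor) release of a Slurm version like '20.11.4'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def daemons_compatible(slurmdbd, slurmctld, slurmd_versions):
    """Check that slurmdbd and slurmctld are both >= every slurmd release."""
    return all(
        release(slurmdbd) >= release(v) and release(slurmctld) >= release(v)
        for v in slurmd_versions
    )


if __name__ == "__main__":
    # Whole cluster upgraded at once: every daemon reports the same version.
    print(daemons_compatible("20.11.4", "20.11.4", ["20.11.4", "20.11.4"]))
    # A compute node newer than the servers violates the rule.
    print(daemons_compatible("20.11.4", "20.11.4", ["21.08.0"]))
```

The version strings could be collected from "slurmd -V" on each node, 
which makes a stray mismatch like the one above easy to spot.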

Thanks for the help.

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786


