[slurm-users] "Low RealMem" after upgrade
Diego Zuccato
diego.zuccato at unibo.it
Tue Oct 5 08:06:34 UTC 2021
On 05/10/2021 09:22, Ole Holm Nielsen wrote:
> What is a "frontend"? Do you mean the slurmctld server?
Yes, sorry. "Frontend" is what we call the node(s) users submit jobs
from, where slurmctld and slurmdbd also run. We'll probably move
slurmdbd and slurmctld to a dedicated VM in a future upgrade (but first
I have to be sure they don't need IB or access to the gluster fs that's
only available over IB).
Does sbatch give slurmctld just a path to the job script or the whole
script?
>> worked with IDLE (RESUME gives "Invalid node state specified").
> So "scontrol update node=... state=idle" gives the node a correct idle
> state, whereas "state=resume" doesn't? Did you restart the slurmd on
> the compute nodes?
Yes. Complete node reboots, actually. Multiple times. When desperate,
try rebooting.
>> SLURM 20.11.4.
> You wrote that you use Slurm 21.08 from Debian 11. How did 20.11 get
> into the picture?
Good question. I copy-pasted 21.08 from a node after the upgrade, but
now all nodes report 20.11.4. Really confused :-? Just to add to the
confusion, packages.debian.org lists 20.11.7+really20.11.4-2 as the
slurmctld version for bullseye. No mention of 21.08 anywhere, not even
in sid (20.11.8). ARGH! Did I dream it? And if so, how could I have
copy-pasted it????
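
For what it's worth, the "+really" in that Debian version string is a
packaging convention: the upstream code actually shipped is the version
after the marker (20.11.4 here), and the prefix is only there so the
package sorts above an earlier upload. A rough Python sketch of pulling
out the effective upstream version; the helper name is hypothetical and
it only handles the common epoch/revision layout:

```python
import re


def effective_upstream(debian_version: str) -> str:
    """Return the upstream version a Debian version string really packages.

    Hypothetical helper: strips an optional epoch and the Debian revision,
    then honours the '+really' convention if it is present.
    """
    # Drop an epoch such as "1:" if there is one.
    version = debian_version.split(":", 1)[-1]
    # Drop the Debian revision (everything after the last hyphen).
    upstream = version.rsplit("-", 1)[0] if "-" in version else version
    # With "+really", the part after the marker is the actual upstream code.
    match = re.search(r"\+really(.+)$", upstream)
    return match.group(1) if match else upstream


print(effective_upstream("20.11.7+really20.11.4-2"))  # prints 20.11.4
```

So the bullseye package would be consistent with nodes reporting
20.11.4.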
> The slurmdbd and slurmctld servers must have versions
> >= that of slurmd, see some links in
> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
Yup. That's why I upgraded the whole cluster at once.
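
The ordering rule quoted above can be checked mechanically. A minimal
sketch, assuming plain dotted version strings (as collected by hand from
each host) and hypothetical function names:

```python
def parse_version(text: str) -> tuple:
    """Turn '20.11.4' into (20, 11, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def daemons_compatible(slurmdbd: str, slurmctld: str, slurmd: str) -> bool:
    """Check the quoted rule: slurmdbd and slurmctld must both be at
    least as new as slurmd."""
    node = parse_version(slurmd)
    return parse_version(slurmdbd) >= node and parse_version(slurmctld) >= node


# Whole cluster on the same release: fine.
print(daemons_compatible("20.11.4", "20.11.4", "20.11.4"))  # prints True
# A compute node newer than the servers: violates the rule.
print(daemons_compatible("20.11.4", "20.11.4", "21.08.0"))  # prints False
```

Comparing tuples of integers rather than raw strings avoids "21.08"
versus "20.11" being ordered lexically by accident.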
Thanks for the help.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786