[slurm-users] Cluster not booting after upgrade to debian jessie

Tue Jan 9 08:10:31 MST 2018

Hi Elisabetta,

On Tue, Jan 09, 2018 at 01:40:19PM +0100, Elisabetta Falivene wrote:
> The new kernel was installed during an upgrade from Debian 7 Wheezy to
> Debian 8 Jessie. The upgrade went fine on the 8 nodes of the cluster, but
> not on the master. By the way, kernel 3.16 works fine on the nodes.

You may have some special storage hardware on the front-end that the new
kernel does not recognize. I think you'll get better help on a
Debian-related mailing list such as debian-user [1].

> Stupid question: is it worth trying to make the new kernel work, in your
> opinion? In the worst case, would it be so bad to keep the 3.2 kernel on
> the master?

You need 3.16 with Jessie.

> > On 9 January 2018 at 13:16, Elisabetta Falivene <e.falivene at ilabroma.com>
> > wrote:
> >> Launching sinfo for the first time after the reboot:
> >>
> >> sinfo: error: If munged is up, restart with --num-threads=10
> >>
> >> sinfo: error: Munge encode failed: Failed to access
> >> "/var/run/munge/munge.socket.2": No such file or directory

Please check your munge installation:

Is munge installed?

dpkg -l munge

If not, install it:

apt-get install munge

Is the munge key in place?

ls -la /etc/munge/munge.key

If it is not, create one with create-munge-key and copy the same key to
all the nodes.
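A minimal sketch of the key check and the copy step follows. It assumes the usual recommendation that the key be owned by the munge user with no group/other access (mode 0400 or 0600), that stat is GNU coreutils (as on Debian), and that root can ssh to the nodes. The hostnames node01..node08 are hypothetical placeholders for your 8 compute nodes, and key_ok/push_key are helper names made up for this example.

```shell
#!/bin/sh
# Sketch only: check the munge key and copy it to the compute nodes.
# Assumption: the key must not be accessible by group/other (0400 or 0600).

# Report whether a key file exists and has a restrictive mode.
key_ok() {
    key=${1:-/etc/munge/munge.key}
    if [ ! -f "$key" ]; then
        echo "missing: $key"
        return 1
    fi
    mode=$(stat -c %a "$key")   # GNU stat: octal permission bits, e.g. 400
    case "$mode" in
        400|600) echo "ok: $key (mode $mode)" ;;
        *) echo "insecure mode $mode on $key"; return 1 ;;
    esac
}

# Hypothetical hostnames: replace with the real names of your 8 nodes.
NODES="node01 node02 node03 node04 node05 node06 node07 node08"

# Copy the master's key to every node, fix ownership and mode there,
# and restart munge so it uses the new key. Requires root ssh access.
push_key() {
    for node in $NODES; do
        scp -p /etc/munge/munge.key "root@$node:/etc/munge/munge.key" &&
        ssh "root@$node" 'chown munge:munge /etc/munge/munge.key &&
            chmod 0400 /etc/munge/munge.key && systemctl restart munge' ||
        echo "failed on $node"
    done
}
```

Run key_ok on the master first; call push_key only once it reports "ok", since every node must end up with exactly the same key.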

Is the munge daemon enabled?

systemctl is-enabled munge

If not, enable it with systemctl enable munge so that it starts at boot
(this does not start it right away).
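Since enabling and starting are separate operations, the two checks can be combined in a small idempotent helper. This is a sketch only: ensure_munge_service is a hypothetical name, and it uses just the standard systemctl subcommands is-enabled, enable, is-active and start.

```shell
#!/bin/sh
# Sketch: make sure munge starts at boot AND is running now.
# "systemctl enable" only arranges for munge to start at boot;
# it does not start the daemon, so a separate "start" is needed.
ensure_munge_service() {
    # Enable only if systemd does not already report the unit as enabled.
    if [ "$(systemctl is-enabled munge 2>/dev/null)" != "enabled" ]; then
        systemctl enable munge || return 1
    fi
    # Start only if the unit is not currently active.
    if ! systemctl is-active --quiet munge; then
        systemctl start munge || return 1
    fi
}
```

Running it twice is harmless: the second time both checks pass and nothing is changed.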

Is the munge daemon started?

systemctl status munge

If it is not running, try to start it with systemctl start munge and
check the error message it reports.
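Once munged is running, a quick end-to-end check is to look for its socket (the file sinfo could not access) and round-trip a credential with munge -n | unmunge, locally and through ssh to a node. The sketch below assumes the default socket path /var/run/munge/munge.socket.2. SOCKET_PATH, check_munge and check_node are names invented for this example, and the node check assumes ssh access to the node.

```shell
#!/bin/sh
# Sketch: confirm munged is listening and can encode/decode credentials.
# SOCKET_PATH is a variable of this sketch; the default below is the usual
# munged socket location. Override it if your installation differs.
SOCKET_PATH=${SOCKET_PATH:-/var/run/munge/munge.socket.2}

check_munge() {
    # A missing socket means munged is not running (the sinfo error above).
    if [ ! -S "$SOCKET_PATH" ]; then
        echo "no socket at $SOCKET_PATH: munged is not running"
        return 1
    fi
    # Encode an empty credential and decode it again on this host.
    if munge -n | unmunge >/dev/null; then
        echo "local round-trip ok"
    else
        echo "local round-trip failed"
        return 1
    fi
}

# Decode on another host a credential created here; this only succeeds
# if both hosts run munged with the same key.
check_node() {
    if munge -n | ssh "$1" unmunge >/dev/null; then
        echo "$1: ok"
    else
        echo "$1: failed"
        return 1
    fi
}
```

For example, run check_munge on the master, then check_node with each compute node's hostname; a failure only on the remote side usually points to a key mismatch.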

[1] https://lists.debian.org/debian-user/

Regards,
-- 
Gennaro Oliva