Hello,
I just deployed the latest SLURM and I am getting some odd issues restarting it. Has anyone seen this before, and how can I fix it?
Jan 30 17:22:09 slurmctrl01 systemd[1]: slurmctld.service: Main process exited, code=dumped, status=6/ABRT
Jan 30 17:22:09 slurmctrl01 systemd[1]: slurmctld.service: Failed with result 'core-dump'.
Jan 30 17:22:09 slurmctrl01 systemd[1]: Failed to start Slurm controller daemon.
Jan 30 17:22:09 slurmctrl01 systemd[1]: systemd-coredump@8-92126-0.service: Succeeded.
Output of journalctl -xe:
#1 0x00007f89550c2cb5 _wait (libslurmfull.so)
#2 0x00007f89550cc5cf _worker (libslurmfull.so)
#3 0x00007f8954ced1ca start_thread (libpthread.so.0)
#4 0x00007f8953fa28d3 __clone (libc.so.6)
Stack trace of thread 92119:
#0 0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f89550c2cb5 _wait (libslurmfull.so)
#2 0x00007f89550cc5cf _worker (libslurmfull.so)
#3 0x00007f8954ced1ca start_thread (libpthread.so.0)
#4 0x00007f8953fa28d3 __clone (libc.so.6)
Stack trace of thread 92112:
#0 0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f89550c2cb5 _wait (libslurmfull.so)
#2 0x00007f89550cc5cf _worker (libslurmfull.so)
#3 0x00007f8954ced1ca start_thread (libpthread.so.0)
#4 0x00007f8953fa28d3 __clone (libc.so.6)
Stack trace of thread 92122:
#0 0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f89550c2cb5 _wait (libslurmfull.so)
#2 0x00007f89550cc5cf _worker (libslurmfull.so)
#3 0x00007f8954ced1ca start_thread (libpthread.so.0)
#4 0x00007f8953fa28d3 __clone (libc.so.6)
Stack trace of thread 92111:
#0 0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f89550c2cb5 _wait (libslurmfull.so)
#2 0x00007f89550cc5cf _worker (libslurmfull.so)
#3 0x00007f8954ced1ca start_thread (libpthread.so.0)
#4 0x00007f8953fa28d3 __clone (libc.so.6)
Stack trace of thread 92115:
#0 0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f89550c2cb5 _wait (libslurmfull.so)
#2 0x00007f89550cc5cf _worker (libslurmfull.so)
#3 0x00007f8954ced1ca start_thread (libpthread.so.0)
#4 0x00007f8953fa28d3 __clone (libc.so.6)
Stack trace of thread 92123:
#0 0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f89550c2cb5 _wait (libslurmfull.so)
#2 0x00007f89550cc5cf _worker (libslurmfull.so)
#3 0x00007f8954ced1ca start_thread (libpthread.so.0)
#4 0x00007f8953fa28d3 __clone (libc.so.6)
Stack trace of thread 92104:
#0 0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f89550c2cb5 _wait (libslurmfull.so)
#2 0x00007f89550cc5cf _worker (libslurmfull.so)
#3 0x00007f8954ced1ca start_thread (libpthread.so.0)
#4 0x00007f8953fa28d3 __clone (libc.so.6)
-- Subject: Process 92054 (slurmctld) dumped core
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
--
-- Process 92054 (slurmctld) crashed and dumped core.
--
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.
Jan 30 17:22:09 slurmctrl01.internal.samsung systemd[1]: slurmctld.service: Main process exited, code=dumped, status=6/ABRT
Jan 30 17:22:09 slurmctrl01.internal.samsung systemd[1]: slurmctld.service: Failed with result 'core-dump'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The unit slurmctld.service has entered the 'failed' state with result 'core-dump'.
Jan 30 17:22:09 slurmctrl01.internal.samsung systemd[1]: Failed to start Slurm controller daemon.
-- Subject: Unit slurmctld.service has failed
-- Defined-By: systemd
Regards,
Vitorio