Hello,
I just deployed the latest SLURM, and slurmctld aborts with a core dump when I restart it. Has anyone seen this before, and how can I fix it?
Jan 30 17:22:09 slurmctrl01 systemd[1]: slurmctld.service: Main process exited, code=dumped, status=6/ABRT
Jan 30 17:22:09 slurmctrl01 systemd[1]: slurmctld.service: Failed with result 'core-dump'.
Jan 30 17:22:09 slurmctrl01 systemd[1]: Failed to start Slurm controller daemon.
Jan 30 17:22:09 slurmctrl01 systemd[1]: systemd-coredump@8-92126-0.service: Succeeded.

journalctl -xe
    #1 0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2 0x00007f89550cc5cf _worker (libslurmfull.so)
    #3 0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4 0x00007f8953fa28d3 __clone (libc.so.6)
Stack trace of thread 92119:
    #0 0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1 0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2 0x00007f89550cc5cf _worker (libslurmfull.so)
    #3 0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4 0x00007f8953fa28d3 __clone (libc.so.6)
Stack trace of thread 92112:
    #0 0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1 0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2 0x00007f89550cc5cf _worker (libslurmfull.so)
    #3 0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4 0x00007f8953fa28d3 __clone (libc.so.6)
Stack trace of thread 92122:
    #0 0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1 0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2 0x00007f89550cc5cf _worker (libslurmfull.so)
    #3 0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4 0x00007f8953fa28d3 __clone (libc.so.6)
Stack trace of thread 92111:
    #0 0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1 0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2 0x00007f89550cc5cf _worker (libslurmfull.so)
    #3 0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4 0x00007f8953fa28d3 __clone (libc.so.6)
Stack trace of thread 92115:
    #0 0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1 0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2 0x00007f89550cc5cf _worker (libslurmfull.so)
    #3 0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4 0x00007f8953fa28d3 __clone (libc.so.6)
Stack trace of thread 92123:
    #0 0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1 0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2 0x00007f89550cc5cf _worker (libslurmfull.so)
    #3 0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4 0x00007f8953fa28d3 __clone (libc.so.6)
Stack trace of thread 92104:
    #0 0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1 0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2 0x00007f89550cc5cf _worker (libslurmfull.so)
    #3 0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4 0x00007f8953fa28d3 __clone (libc.so.6)
-- Subject: Process 92054 (slurmctld) dumped core
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
--
-- Process 92054 (slurmctld) crashed and dumped core.
--
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.
Jan 30 17:22:09 slurmctrl01.internal.samsung systemd[1]: slurmctld.service: Main process exited, code=dumped, status=6/ABRT
Jan 30 17:22:09 slurmctrl01.internal.samsung systemd[1]: slurmctld.service: Failed with result 'core-dump'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The unit slurmctld.service has entered the 'failed' state with result 'core-dump'.
Jan 30 17:22:09 slurmctrl01.internal.samsung systemd[1]: Failed to start Slurm controller daemon.
-- Subject: Unit slurmctld.service has failed
-- Defined-By: systemd

Regards,
Vitorio