Hello,
I just deployed the latest SLURM and I am getting some odd issues restarting it. Has anyone seen this before, and how can I fix it?
Jan 30 17:22:09 slurmctrl01 systemd[1]: slurmctld.service: Main process exited, code=dumped, status=6/ABRT
Jan 30 17:22:09 slurmctrl01 systemd[1]: slurmctld.service: Failed with result 'core-dump'.
Jan 30 17:22:09 slurmctrl01 systemd[1]: Failed to start Slurm controller daemon.
Jan 30 17:22:09 slurmctrl01 systemd[1]: systemd-coredump@8-92126-0.service: Succeeded.

journalctl -xe

    #1  0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2  0x00007f89550cc5cf _worker (libslurmfull.so)
    #3  0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4  0x00007f8953fa28d3 __clone (libc.so.6)

Stack trace of thread 92119:
    #0  0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1  0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2  0x00007f89550cc5cf _worker (libslurmfull.so)
    #3  0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4  0x00007f8953fa28d3 __clone (libc.so.6)

Stack trace of thread 92112:
    #0  0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1  0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2  0x00007f89550cc5cf _worker (libslurmfull.so)
    #3  0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4  0x00007f8953fa28d3 __clone (libc.so.6)

Stack trace of thread 92122:
    #0  0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1  0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2  0x00007f89550cc5cf _worker (libslurmfull.so)
    #3  0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4  0x00007f8953fa28d3 __clone (libc.so.6)

Stack trace of thread 92111:
    #0  0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1  0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2  0x00007f89550cc5cf _worker (libslurmfull.so)
    #3  0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4  0x00007f8953fa28d3 __clone (libc.so.6)

Stack trace of thread 92115:
    #0  0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1  0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2  0x00007f89550cc5cf _worker (libslurmfull.so)
    #3  0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4  0x00007f8953fa28d3 __clone (libc.so.6)

Stack trace of thread 92123:
    #0  0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1  0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2  0x00007f89550cc5cf _worker (libslurmfull.so)
    #3  0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4  0x00007f8953fa28d3 __clone (libc.so.6)

Stack trace of thread 92104:
    #0  0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1  0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2  0x00007f89550cc5cf _worker (libslurmfull.so)
    #3  0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4  0x00007f8953fa28d3 __clone (libc.so.6)
-- Subject: Process 92054 (slurmctld) dumped core
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
--
-- Process 92054 (slurmctld) crashed and dumped core.
--
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.
Jan 30 17:22:09 slurmctrl01.internal.samsung systemd[1]: slurmctld.service: Main process exited, code=dumped, status=6/ABRT
Jan 30 17:22:09 slurmctrl01.internal.samsung systemd[1]: slurmctld.service: Failed with result 'core-dump'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The unit slurmctld.service has entered the 'failed' state with result 'core-dump'.
Jan 30 17:22:09 slurmctrl01.internal.samsung systemd[1]: Failed to start Slurm controller daemon.
-- Subject: Unit slurmctld.service has failed
-- Defined-By: systemd
Regards, Vitorio
Hi Vitorio,
Which version of RockyLinux? Did you install Slurm 24.11.1 (the latest version)? Is this a clean installation or an upgrade of an older Slurm?
Maybe you can find some useful information in this Wiki page: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/
IHTH, Ole
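A note on reading the trace above: every thread shown is parked in pthread_cond_wait under _worker, which looks like idle worker threads rather than the thread that raised SIGABRT (status=6/ABRT). The excerpt also starts mid-trace, so the aborting thread may simply have been cut off. Below is a minimal Python sketch for checking this against the full stack-trace text (for example, the output of `coredumpctl info`). The function names, the sample text, and the list of abort-related frame names are assumptions for illustration and are not part of Slurm or systemd.

```python
import re

# Matches the per-thread header and the individual frames as printed by systemd-coredump.
THREAD_RE = re.compile(r"Stack trace of thread (\d+):")
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-f]+\s+(\S+)\s+\(([^)]+)\)")

# Assumption: frame names that typically appear when glibc aborts a process.
ABORT_MARKERS = ("raise", "abort", "__assert_fail")


def parse_threads(text):
    """Return {thread_id: [function names, innermost first]}.

    Frames that appear before the first thread header (a truncated trace)
    are ignored because they cannot be attributed to a thread.
    """
    threads = {}
    current = None
    for line in text.splitlines():
        header = THREAD_RE.search(line)
        if header:
            current = header.group(1)
            threads[current] = []
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            threads[current].append(frame.group(2))
    return threads


def find_abort_threads(threads, markers=ABORT_MARKERS):
    """Return the IDs of threads whose stack contains an abort-related frame."""
    return [tid for tid, frames in threads.items()
            if any(name in markers for name in frames)]


# Sample copied from the post: a truncated leading trace plus two full threads.
SAMPLE = """\
    #1  0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2  0x00007f89550cc5cf _worker (libslurmfull.so)
Stack trace of thread 92119:
    #0  0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1  0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2  0x00007f89550cc5cf _worker (libslurmfull.so)
    #3  0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4  0x00007f8953fa28d3 __clone (libc.so.6)
Stack trace of thread 92112:
    #0  0x00007f8954cf347c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
    #1  0x00007f89550c2cb5 _wait (libslurmfull.so)
    #2  0x00007f89550cc5cf _worker (libslurmfull.so)
    #3  0x00007f8954ced1ca start_thread (libpthread.so.0)
    #4  0x00007f8953fa28d3 __clone (libc.so.6)
"""

if __name__ == "__main__":
    parsed = parse_threads(SAMPLE)
    suspects = find_abort_threads(parsed)
    print("threads parsed:", len(parsed))
    print("threads with abort frames:", suspects or "none in this excerpt")
```

If no thread with an abort-related frame turns up in the complete trace either, the slurmctld log itself is likely the better place to look for the message printed just before the abort.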
On 1/31/25 02:27, Vitorio Cargnini via slurm-users wrote:
Hello,
I just deployed the latest SLURM and I am getting some odd issues restarting it. Has anyone seen this before, and how can I fix it?