Hello
I am writing to report an issue with the slurmctld process on our RHEL 9 (Rocky Linux) system.
Twice in the past 5 days, the Slurmctld process has encountered an error that resulted in the service stopping. The error message displayed was "double free or corruption (out)". This error has caused significant disruption to our jobs, and we are concerned about its recurrence.
We have tried troubleshooting the issue, but we have not been able to identify the root cause of the problem. We would appreciate any assistance or guidance you can provide to help us resolve this issue.
Please let us know if you need any additional information or if there are any specific steps we should take to diagnose the problem further.
Thank you for your attention to this matter.
Best regards,
_________________________
Jul 09 22:12:01 admin slurmctld[711010]: double free or corruption (fasttop)
Jul 09 22:12:01 admin systemd[1]: slurmctld.service: Main process exited, code=killed, status=6/ABRT
Jul 09 22:12:01 admin systemd[1]: slurmctld.service: Failed with result 'signal'.
Jul 09 22:12:01 admin systemd[1]: slurmctld.service: Consumed 11min 26.451s CPU time.
.....
Jul 14 10:15:01 admin slurmctld[1633720]: double free or corruption (out)
Jul 14 10:15:02 admin systemd[1]: slurmctld.service: Main process exited, code=killed, status=6/ABRT
Jul 14 10:15:02 admin systemd[1]: slurmctld.service: Failed with result 'signal'.
Jul 14 10:15:02 admin systemd[1]: slurmctld.service: Consumed 7min 27.596s CPU time.
_________________________
slurmctld -V
slurm 22.05.9
________________________
cat /etc/slurm/slurm.conf |grep -v '#'
ClusterName=xxx
SlurmctldHost=admin
SlurmctldParameters=enable_configless
SlurmUser=slurm
AuthType=auth/munge
CryptoType=crypto/munge

SlurmctldPort=6817
StateSaveLocation=/var/spool/slurmctld
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmctldDebug=verbose
DebugFlags=NO_CONF_HASH

SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmdDebug=verbose

SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core,CR_LLN
DefMemPerCPU=1024
MaxMemPerCPU=4096
GresTypes=gpu

ProctrackType=proctrack/cgroup
JobAcctGatherType=jobacct_gather/cgroup
JobAcctGatherFrequency=15
JobCompType=jobcomp/none

TaskPlugin=task/cgroup
LaunchParameters=use_interactive_step

AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=admin
AccountingStoragePort=6819
AccountingStorageEnforce=associations
AccountingStorageTRES=gres/gpu

MailProg=/usr/bin/mailx
EnforcePartLimits=YES
MaxArraySize=200000
MaxJobCount=500000
MpiDefault=none
ReturnToService=2
SwitchType=switch/none
TmpFS=/tmpslurm/
UsePAM=1

InactiveLimit=0
KillWait=30
MessageTimeout=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0

PriorityType=priority/multifactor
PriorityFlags=FAIR_TREE,MAX_TRES
PriorityDecayHalfLife=1-0
PriorityWeightFairshare=10000

NodeName=xxx NodeHostname=xxx CPUs=4 Sockets=4 RealMemory=3500 TmpDisk=1 CoresPerSocket=1 ThreadsPerCore=1 State=DRAIN
NodeName=xxx NodeHostname=xxx CPUs=2 Sockets=2 RealMemory=1700 TmpDisk=1 CoresPerSocket=1 ThreadsPerCore=1 State=DRAIN
NodeName=xxx NodeHostname=xxx CPUs=4 Sockets=4 RealMemory=1700 TmpDisk=1 CoresPerSocket=1 ThreadsPerCore=1 State=DRAIN
NodeName=xxx NodeHostname=xxx CPUs=4 Sockets=4 RealMemory=3500 TmpDisk=1 CoresPerSocket=1 ThreadsPerCore=1 State=DRAIN

NodeName=r9nc-24-[1-12] NodeHostname=r9nc-24-[1-12] Sockets=2 CoresPerSocket=12 ThreadsPerCore=1 CPUs=24 RealMemory=180000 State=UNKNOWN
NodeName=r9nc-48-[1-4] NodeHostname=r9nc-48-[1-4] Sockets=2 CoresPerSocket=24 ThreadsPerCore=1 CPUs=48 RealMemory=480000 State=UNKNOWN
NodeName=r9ng-1080-[1-7] NodeHostname=r9ng-1080-[1-7] Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 CPUs=20 RealMemory=180000 State=UNKNOWN Gres=gpu:1080ti:4
NodeName=r9ng-1080-8 NodeHostname=r9ng-1080-8 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 CPUs=20 RealMemory=176687 State=UNKNOWN Gres=gpu:1080ti:1

PartitionName=24CPUNodes Nodes=r9nc-24-[1-12] State=UP MaxTime=UNLIMITED OverSubscribe=NO MaxMemPerCPU=7500 DefMemPerCPU=7500 TRESBillingWeights="CPU=1.0,Mem=0.125G" Default=YES
PartitionName=48CPUNodes Nodes=r9nc-48-[1-4] State=UP MaxTime=UNLIMITED OverSubscribe=NO MaxMemPerCPU=10000 DefMemPerCPU=8000 TRESBillingWeights="CPU=1.0,Mem=0.125G"
PartitionName=GPUNodes Nodes=r9ng-1080-[1-7] State=UP MaxTime=UNLIMITED OverSubscribe=NO MaxMemPerCPU=9000 DefMemPerCPU=9000
PartitionName=GPUNodes1080-dev Nodes=r9ng-1080-8 State=UP MaxTime=UNLIMITED OverSubscribe=NO MaxMemPerCPU=9000 DefMemPerCPU=9000 Hidden=Yes
_________________________
sinfo
PARTITION         AVAIL  TIMELIMIT  NODES  STATE  NODELIST
24CPUNodes*       up     infinite      12  idle   r9nc-24-[1-12]
48CPUNodes        up     infinite       2  idle   r9nc-48-[1-2]
GPUNodes          up     infinite       4  idle   r9ng-1080-[4-7]
GPUNodes1080-dev  up     infinite       1  idle   r9ng-1080-8
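The two aborts in the journal excerpts above can be pulled out of a longer log with a small filter. This is only a minimal sketch: it assumes the journal text arrives on standard input in the same format as the excerpts, and the function name `slurmctld_aborts` is made up for illustration.

```shell
# Hypothetical helper (not part of Slurm): print the timestamp and the glibc
# message for every slurmctld heap-corruption abort found on stdin.
slurmctld_aborts() {
    grep -E 'slurmctld\[[0-9]+\]: double free or corruption' |
        sed -E 's/^(.*) [^ ]+ slurmctld\[[0-9]+\]: (.*)$/\1  \2/'
}
```

It could be fed from the journal, for example with `journalctl -u slurmctld | slurmctld_aborts`, to see how often the crash recurs and which glibc variant of the message (`fasttop`, `out`) was reported each time.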
On 7/15/24 10:43, William VINCENT via slurm-users wrote:
> I am writing to report an issue with the Slurmctld process on our RHEL 9 (Rocky Linux).
> Twice in the past 5 days, the Slurmctld process has encountered an error that resulted in the service stopping. The error message displayed was "double free or corruption (out)". This error has caused significant disruption to our jobs, and we are concerned about its recurrence.
> We have tried troubleshooting the issue, but we have not been able to identify the root cause of the problem. We would appreciate any assistance or guidance you can provide to help us resolve this issue.
> Please let us know if you need any additional information or if there are any specific steps we should take to diagnose the problem further.
You're running Slurm 22.05.9 on RockyLinux 9 (is that Rocky 9.4 or what?). Such an old Slurm version probably hasn't been tested much on EL9 systems.
For security reasons you ought to upgrade to a recent Slurm version. Just search for "CVE" in https://github.com/SchedMD/slurm/blob/master/NEWS to find out about security holes in older versions.
You can upgrade by 2 major releases in a single step, so you can go to 23.11.8. Upgrading Slurm is fairly easy, and I've collected various pieces of advice in the Wiki page https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrading-slur...
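The two-release rule stated above can be checked mechanically before planning an upgrade. The following is a sketch under assumptions: the list of major releases is hard-coded here and only covers 21.08 through 24.05, and the function names are invented for this example.

```shell
# Ordered list of Slurm major releases (assumption: hard-coded, covers only
# the range relevant to this thread).
SLURM_RELEASES="21.08 22.05 23.02 23.11 24.05"

# Print the 1-based position of a major release in the list above.
release_index() {
    i=0
    for r in $SLURM_RELEASES; do
        i=$((i + 1))
        if [ "$r" = "$1" ]; then
            echo "$i"
            return 0
        fi
    done
    return 1
}

# Succeed if going from major release $1 to $2 skips at most one release,
# i.e. the target is no more than two major releases ahead.
can_upgrade_directly() {
    from=$(release_index "$1") || return 2
    to=$(release_index "$2") || return 2
    step=$((to - from))
    [ "$step" -ge 0 ] && [ "$step" -le 2 ]
}
```

With this list, `can_upgrade_directly 22.05 23.11` succeeds, while `can_upgrade_directly 22.05 24.05` fails, which matches the advice to stop at 23.11.8 first.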
Hopefully a newer Slurm version is going to solve your issue.
I hope this helps, Ole
Thank you for your response, I hadn't considered that version 22 could be the problem.
I am aware that we are not up to date, but we use the EPEL repo for our RPM packages. Originally, we did not want to install .rpm directly because our policy is to apply security updates every night via the repositories, but unfortunately, in this case, it does not work. I think it is because only one person is responsible for maintaining the packages for RHEL.
I have already reported the security issue, but at the moment it does not seem possible to update: https://bugzilla.redhat.com/show_bug.cgi?id=2280545
It appears from another ticket that the compilation fails for version 24: https://bugzilla.redhat.com/show_bug.cgi?id=2259935
If the compilation fails, will the RPM package work on RHEL 9?
On 7/15/24 11:35, William V via slurm-users wrote:
> Thank you for your response, I hadn't considered that version 22 could be the problem.
> I am aware that we are not up to date, but we use the EPEL repo for our RPM packages. Originally, we did not want to install .rpm directly because our policy is to apply security updates every night via the repositories, but unfortunately, in this case, it does not work. I think it is because only one person is responsible for maintaining the packages for RHEL.
You should *NOT* use Slurm packages from the EPEL repository!! The Slurm documentation recommends excluding those packages, see https://slurm.schedmd.com/upgrades.html#epel_repository
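One way to apply such an exclusion is sketched below. It assumes the exclusion is expressed as an `exclude=slurm*` line in the `[epel]` section of the repo file; the helper name `exclude_slurm_from_epel` is hypothetical, and the linked documentation page remains the authority on the exact recommended setting.

```shell
# Hypothetical helper: add "exclude=slurm*" right after the [epel] section
# header of a repo file, unless a slurm exclude is already present.
exclude_slurm_from_epel() {
    repo_file=$1
    if grep -q '^exclude=.*slurm' "$repo_file"; then
        return 0
    fi
    awk '{ print } /^\[epel\]$/ { print "exclude=slurm*" }' "$repo_file" \
        > "$repo_file.new" && mv "$repo_file.new" "$repo_file"
}
```

On a typical EL system this would be run as root against `/etc/yum.repos.d/epel.repo`. With the exclusion in place, nightly updates should no longer pull Slurm packages from EPEL over locally built ones.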
> I have already reported the security issue, but at the moment it does not seem possible to update: https://bugzilla.redhat.com/show_bug.cgi?id=2280545
Red Hat doesn't provide support for Slurm; if necessary, you should contact SchedMD to obtain Slurm support.
> It appears from another ticket that the compilation fails for version 24: https://bugzilla.redhat.com/show_bug.cgi?id=2259935
I think this ticket only reports problems regarding older Slurm releases?
> If the compilation fails, will the RPM package work on RHEL 9?
You should build your own Slurm RPM packages, and compilation failure would indicate a bug somewhere!
Just as a test, I've now built RPM packages of the currently supported Slurm releases 23.11.8 and 24.05.1 on a RockyLinux 9.4 system. The RPMs have built without any issues or compilation errors at all! I haven't tested these RPMs on our production cluster which runs EL8 :-)
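For readers who have not built Slurm RPMs before, the build step itself is short. The wrapper below is a sketch, assuming the release tarball has already been downloaded and all build prerequisites are installed; the function name `build_slurm_rpms` is made up, while `rpmbuild -ta` is the standard way to build from a tarball that ships its own spec file.

```shell
# Hypothetical wrapper: build RPMs from a Slurm release tarball.
# Refuses to run if the tarball is missing.
build_slurm_rpms() {
    tarball=$1
    if [ ! -f "$tarball" ]; then
        echo "tarball not found: $tarball" >&2
        return 1
    fi
    rpmbuild -ta "$tarball"
}
```

A call such as `build_slurm_rpms slurm-23.11.8.tar.bz2` would, with rpmbuild's default settings, leave the packages under `~/rpmbuild/RPMS/`.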
I recommend that you consult the Slurm documentation page[1] and my Wiki page for Slurm installation: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/ Remember to install all prerequisite packages before building Slurm, as explained in the Wiki!
Best regards, Ole
Wow, thank you so much for all this information and the installation wiki. I have a lot of work to do to change the infrastructure; I hope it will go smoothly.
How can I propose modifications to the wiki? For example, for RHEL 9 it is missing 'dnf install dbus-devel', which is needed to compile with "cgroup v2" support.
On 16-07-2024 16:20, William V via slurm-users wrote:
> How can I propose modifications to the wiki? For example, for RHEL9, it is missing 'dnf install dbus-devel' for compil with "cgroup v2" .
On my RockyLinux 9.4 system there was no requirement for the dbus-devel RPM package (it isn't installed) when I built the Slurm RPMs. How did you experience this requirement?
/Ole
I can confirm on a freshly-installed RockyLinux 9.4 system, the dbus-devel package was not installed by default. The Development Tools
# dnf repoquery --groupmember dbus-devel
Last metadata expiration check: 2:04:16 ago on Tue 16 Jul 2024 12:02:50 PM EDT.
dbus-devel-1:1.12.20-8.el9.i686
dbus-devel-1:1.12.20-8.el9.x86_64
@platform-devel

# dnf group list
Last metadata expiration check: 2:03:23 ago on Tue 16 Jul 2024 12:02:50 PM EDT.
Available Environment Groups:
   Minimal Install
   Workstation
   Custom Operating System
   Virtualization Host
Installed Environment Groups:
   Server with GUI
   Server
Installed Groups:
   Legacy UNIX Compatibility
   Console Internet Tools
   Container Management
   Development Tools
   Headless Management
   RPM Development Tools
   System Tools
Available Groups:
   .NET Development
   Graphical Administration Tools
   Network Servers
   Scientific Support
   Security Tools
   Smart Card Support
So the package was _not_ present in any of the groups that got installed, and "Platform Development" isn't in the group list in the first place.
On Jul 16, 2024, at 13:50, Ole Holm Nielsen via slurm-users wrote:
> [...]
I had exactly this problem: https://www.reddit.com/r/SLURM/comments/152ef0c/problems_installing_slurm/
Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: Couldn't find the specified plugin name for cgroup/v2 looking at all files
Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: cannot find cgroup plugin for cgroup/v2
Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: cannot create cgroup context for cgroup/v2
Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: Unable to initialize cgroup plugin
Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: slurmd initialization failed
So I installed dbus-devel on the machine where I compile the packages, recompiled, and reinstalled the packages on the affected machine. It works now.
Another thing: to install devel packages on Rocky Linux (I don't know about other RHEL variants), you need to use the command "dnf install xxx --enablerepo=devel". Ex:
dnf install rpm-build gcc python3 openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel munge munge-libs munge-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel gtk2-devel libibmad libibumad perl-Switch perl-ExtUtils-MakeMaker xorg-x11-xauth --enablerepo=devel
dnf install http-parser-devel json-c-devel libjwt-devel freeipmi-devel libssh2-devel man2html munge munge-libs munge-devel mariadb-server mariadb-devel --enablerepo=devel
On 7/17/24 08:43, William V via slurm-users wrote:
> I had exactly this problem : https://www.reddit.com/r/SLURM/comments/152ef0c/problems_installing_slurm/
> Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: Couldn't find the specified plugin name for cgroup/v2 looking at all files
> Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: cannot find cgroup plugin for cgroup/v2
> Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: cannot create cgroup context for cgroup/v2
> Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: Unable to initialize cgroup plugin
> Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: slurmd initialization failed
> So on the machine where I compile the packages, I installed dbus-devel, I recompiled and then reinstalled on the machine and it works now.
Thanks a lot for this observation! Now I see (thanks, Bas!) in https://slurm.schedmd.com/quickstart_admin.html#prereqs that:
cgroup Task Constraining: The task/cgroup plugin will be built if the hwloc development library is present. cgroup/v2 support also requires the bpf and dbus development libraries.
Therefore one *must* install the following packages for cgroup/v2 support:
$ dnf install hwloc-devel libbpf dbus-devel
I've now added these RPM prerequisites in the Wiki page https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#install-prereq...
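After rebuilding with these prerequisites, it may be worth confirming that the cgroup/v2 plugin was actually produced before restarting slurmd. The check below is a sketch under two assumptions that are not confirmed in this thread: that the plugin file is named `cgroup_v2.so`, and that plugins live in `/usr/lib64/slurm` on EL systems. The function name is hypothetical.

```shell
# Hypothetical check: succeed if the given Slurm plugin directory contains
# the cgroup/v2 plugin. File name and default directory are assumptions.
has_cgroup_v2_plugin() {
    plugin_dir=${1:-/usr/lib64/slurm}
    [ -f "$plugin_dir/cgroup_v2.so" ]
}
```

If `has_cgroup_v2_plugin` fails on a node configured for cgroup/v2, the "cannot find cgroup plugin for cgroup/v2" errors quoted earlier in the thread are the likely result.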
> Another thing, to install devel packages on Rocky Linux (I don't know about other RHEL), you need to use the command: "dnf install xxx --enablerepo=devel". Ex:
> dnf install rpm-build gcc python3 openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel munge munge-libs munge-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel gtk2-devel libibmad libibumad perl-Switch perl-ExtUtils-MakeMaker xorg-x11-xauth --enablerepo=devel;
> dnf install http-parser-devel json-c-devel libjwt-devel freeipmi-devel libssh2-devel man2html munge munge-libs munge-devel mariadb-server mariadb-devel --enablerepo=devel;
That's an interesting issue which we should explore to determine the best practices!
In my tests on a Rocky 9.4 system I didn't need to use "--enablerepo=devel", even though I see there exists a repo file /etc/yum.repos.d/rocky-devel.repo containing, however, a warning message:
name=Rocky Linux $releasever - Devel WARNING! FOR BUILDROOT ONLY DO NOT LEAVE ENABLED
When I try to install a devel RPM package, it comes from the appstream repo (defined in /etc/yum.repos.d/rocky.repo) instead:
$ dnf install dbus-devel
Last metadata expiration check: 2:09:03 ago on Wed 17 Jul 2024 07:40:07 AM CEST.
Dependencies resolved.
================================================================================
 Package          Architecture    Version              Repository       Size
================================================================================
Installing:
 dbus-devel       x86_64          1:1.12.20-8.el9      appstream        33 k
On AlmaLinux 9.4 the appstream repo is defined in /etc/yum.repos.d/almalinux-appstream.repo so it's readily available.
Could you perhaps examine your system's appstream repo in more detail and find out why you need --enablerepo=devel?
Thanks, Ole
Hello,

Log when I try to install without the devel repo:

dnf install rpm-build gcc python3 openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel munge munge-libs munge-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel gtk2-devel libibmad libibumad perl-Switch perl-ExtUtils-MakeMaker xorg-x11-xauth http-parser-devel json-c-devel libjwt-devel freeipmi-devel libssh2-devel man2html munge munge-libs munge-devel mariadb-server mariadb-devel
Last metadata expiration check: 3:49:07 ago on Wed 17 Jul 2024 12:27:52 PM CEST.
Package gcc-11.4.1-3.el9.x86_64 is already installed.
Package python3-3.9.18-3.el9_4.1.x86_64 is already installed.
Package openssl-1:3.0.7-27.el9.x86_64 is already installed.
Package munge-0.5.13-13.el9.x86_64 is already installed.
Package munge-libs-0.5.13-13.el9.x86_64 is already installed.
No match for argument: munge-devel
No match for argument: lua-devel
No match for argument: rrdtool-devel
Package infiniband-diags-48.0-1.el9.x86_64 is already installed.
Package libibumad-48.0-1.el9.x86_64 is already installed.
No match for argument: perl-Switch
Package xorg-x11-xauth-1:1.1-10.el9.x86_64 is already installed.
No match for argument: http-parser-devel
No match for argument: json-c-devel
No match for argument: freeipmi-devel
Package mariadb-server-3:10.5.22-1.el9_2.x86_64 is already installed.
All matches were filtered out by modular filtering for argument: mariadb-devel
Error: Unable to find a match: munge-devel lua-devel rrdtool-devel perl-Switch http-parser-devel json-c-devel freeipmi-devel mariadb-devel

After install:

dnf info json-c-devel
Last metadata expiration check: 2:53:58 ago on Wed 17 Jul 2024 01:23:45 PM CEST.
Installed Packages
Name         : json-c-devel
Version      : 0.14
Release      : 11.el9
Architecture : x86_64
Size         : 128 k
Source       : json-c-0.14-11.el9.src.rpm
Repository   : @System
From repo    : devel
Summary      : Development files for json-c
URL          : https://github.com/json-c/json-c
License      : MIT
Description  : This package contains libraries and header files for
             : developing applications that use json-c.

Default Rocky repo:

[appstream]
name=Rocky Linux $releasever - AppStream
mirrorlist=https://mirrors.rockylinux.org/mirrorlist?arch=$basearch&repo=AppStream-...
#baseurl=http://dl.rockylinux.org/$contentdir/$releasever/AppStream/$basearch/os/
gpgcheck=1
enabled=1
countme=1
metadata_expire=6h
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-Rocky-9
Log when I install with the devel repo enabled:

dnf install rpm-build gcc python3 openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel munge munge-libs munge-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel gtk2-devel libibmad libibumad perl-Switch perl-ExtUtils-MakeMaker xorg-x11-xauth http-parser-devel json-c-devel libjwt-devel freeipmi-devel libssh2-devel man2html munge munge-libs munge-devel mariadb-server mariadb-devel --enablerepo=devel
Last metadata expiration check: 1:00:58 ago on Wed 17 Jul 2024 03:15:50 PM CEST.
Package gcc-11.4.1-3.el9.x86_64 is already installed.
Package python3-3.9.18-3.el9_4.1.x86_64 is already installed.
Package openssl-1:3.0.7-27.el9.x86_64 is already installed.
Package munge-0.5.13-13.el9.x86_64 is already installed.
Package munge-libs-0.5.13-13.el9.x86_64 is already installed.
Package infiniband-diags-48.0-1.el9.x86_64 is already installed.
Package libibumad-48.0-1.el9.x86_64 is already installed.
Package xorg-x11-xauth-1:1.1-10.el9.x86_64 is already installed.
Package mariadb-server-3:10.5.22-1.el9_2.x86_64 is already installed.
Dependencies resolved.
=================================================================================================================
 Package Architecture Version Repository Size
=================================================================================================================
Installing:
 freeipmi-devel x86_64 1.6.14-2.el9 devel 234 k
 gtk2-devel x86_64 2.24.33-8.el9 appstream 2.7 M
 http-parser-devel x86_64 2.9.4-6.el9 devel 14 k
 hwloc x86_64 2.4.1-5.el9 baseos 189 k
 hwloc-devel x86_64 2.4.1-5.el9 appstream 251 k
 json-c-devel x86_64 0.14-11.el9 devel 45 k
 libjwt-devel x86_64 1.12.1-11.el9 epel 15 k
 libssh2-devel x86_64 1.11.0-1.el9 epel 55 k
 lua x86_64 5.4.4-4.el9 appstream 187 k
 lua-devel x86_64 5.4.4-4.el9 devel 21 k
 man2html x86_64 1.6-29.g.el9 epel 30 k
 mariadb-devel x86_64 3:10.5.22-1.el9_2 devel 1.0 M
 munge-devel x86_64 0.5.13-13.el9 devel 23 k
 ncurses-devel x86_64 6.2-10.20210508.el9 appstream 516 k
 numactl x86_64 2.0.16-3.el9 baseos 67 k
 numactl-devel x86_64 2.0.16-3.el9 appstream 21 k
 openssl-devel x86_64 1:3.0.7-27.el9 appstream 3.0 M
 pam-devel x86_64 1.5.1-19.el9 appstream 140 k
 perl-ExtUtils-MakeMaker noarch 2:7.60-3.el9 appstream 289 k
 perl-Switch noarch 2.17-23.el9 devel 26 k
 readline-devel x86_64 8.1-4.el9 appstream 194 k
 rpm-build x86_64 4.16.1.3-29.el9 appstream 59 k
 rrdtool-devel x86_64 1.7.2-21.el9 devel 19 k
Installing dependencies:
 annobin x86_64 12.31-2.el9 appstream 1.0 M
 apr x86_64 1.7.0-12.el9_3 appstream 122 k
 apr-util x86_64 1.6.1-23.el9 appstream 94 k
 apr-util-bdb x86_64 1.6.1-23.el9 appstream 12 k
 atk-devel x86_64 2.36.0-5.el9 appstream 173 k
 brotli x86_64 1.0.9-6.el9 appstream 312 k
 brotli-devel x86_64 1.0.9-6.el9 appstream 31 k
 bzip2 x86_64 1.0.8-8.el9 baseos 52 k
 bzip2-devel x86_64 1.0.8-8.el9 appstream 214 k
 cairo-devel x86_64 1.17.4-7.el9 appstream 190 k
 debugedit x86_64 5.0-5.el9 appstream 76 k
 dwz x86_64 0.14-3.el9 appstream 127 k
 ed x86_64 1.14.2-12.el9 baseos 74 k
 efi-srpm-macros noarch 6-2.el9_0 appstream 22 k
 elfutils x86_64 0.190-2.el9 baseos 543 k
 fontconfig-devel x86_64 2.14.0-2.el9_1 appstream 127 k
 fonts-srpm-macros noarch 1:2.0.5-7.el9.1 appstream 27 k
 freetype-devel x86_64 2.10.4-9.el9 appstream 1.1 M
 fribidi-devel x86_64 1.0.10-6.el9.2 appstream 25 k
 gcc-plugin-annobin x86_64 11.4.1-3.el9 appstream 46 k
 gdb-minimal x86_64 10.2-13.el9 appstream 3.5 M
 gdk-pixbuf2-devel x86_64 2.42.6-4.el9_4 appstream 63 k
 ghc-srpm-macros noarch 1.5.0-6.el9 appstream 7.8 k
 glib2-devel x86_64 2.68.4-14.el9 appstream 470 k
 go-srpm-macros noarch 3.2.0-3.el9 appstream 26 k
 graphite2-devel x86_64 1.3.14-9.el9 appstream 21 k
 harfbuzz-devel x86_64 2.7.4-10.el9 appstream 305 k
 http-parser x86_64 2.9.4-6.el9 appstream 37 k
 httpd x86_64 2.4.57-8.el9 appstream 45 k
 httpd-core x86_64 2.4.57-8.el9 appstream 1.4 M
 httpd-filesystem noarch 2.4.57-8.el9 appstream 12 k
 httpd-tools x86_64 2.4.57-8.el9 appstream 80 k
 info x86_64 6.7-15.el9 baseos 224 k
 kernel-srpm-macros noarch 1.0-13.el9 appstream 15 k
 libX11-devel x86_64 1.7.0-9.el9 appstream 939 k
 libXau-devel x86_64 1.0.9-8.el9 appstream 13 k
 libXcomposite-devel x86_64 0.4.5-7.el9 appstream 16 k
 libXcursor-devel x86_64 1.2.0-7.el9 appstream 22 k
 libXext-devel x86_64 1.3.4-8.el9 appstream 72 k
 libXfixes-devel x86_64 5.0.3-16.el9 appstream 12 k
 libXft-devel x86_64 2.3.3-8.el9 appstream 18 k
 libXi-devel x86_64 1.7.10-8.el9 appstream 99 k
 libXinerama-devel x86_64 1.1.4-10.el9 appstream 13 k
 libXrandr-devel x86_64 1.5.2-8.el9 appstream 19 k
 libXrender-devel x86_64 0.9.10-16.el9 appstream 16 k
 libblkid-devel x86_64 2.37.4-18.el9 appstream 17 k
 libdatrie-devel x86_64 0.2.13-4.el9 appstream 132 k
 libffi-devel x86_64 3.4.2-8.el9 appstream 28 k
 libicu-devel x86_64 67.1-9.el9 appstream 830 k
 libmount-devel x86_64 2.37.4-18.el9 appstream 18 k
 libpng-devel x86_64 2:1.6.37-12.el9 appstream 290 k
 libselinux-devel x86_64 3.6-1.el9 appstream 113 k
 libsepol-devel x86_64 3.6-1.el9 appstream 39 k
 libssh2 x86_64 1.11.0-1.el9 epel 132 k
 libthai-devel x86_64 0.1.28-8.el9 appstream 117 k
 libtiff-devel x86_64 4.4.0-12.el9 appstream 514 k
 libxcb-devel x86_64 1.13.1-9.el9 appstream 1.0 M
 libxml2-devel x86_64 2.9.13-6.el9_4 appstream 827 k
 lua-rpm-macros noarch 1-6.el9 appstream 9.0 k
 lua-srpm-macros noarch 1-6.el9 appstream 8.5 k
 mailcap noarch 2.1.49-5.el9 baseos 32 k
 man2html-core x86_64 1.6-29.g.el9 epel 58 k
 mariadb-connector-c-devel x86_64 3.2.6-1.el9_0 appstream 55 k
 ncurses-c++-libs x86_64 6.2-10.20210508.el9 appstream 36 k
 ocaml-srpm-macros noarch 6-6.el9 appstream 7.8 k
 openblas-srpm-macros noarch 2-11.el9 appstream 7.3 k
 pango-devel x86_64 1.48.7-3.el9 appstream 140 k
 patch x86_64 2.7.6-16.el9 appstream 127 k
 pcre-cpp x86_64 8.44-3.el9.3 appstream 26 k
 pcre-devel x86_64 8.44-3.el9.3 appstream 470 k
 pcre-utf16 x86_64 8.44-3.el9.3 appstream 184 k
 pcre-utf32 x86_64 8.44-3.el9.3 appstream 175 k
 pcre2-devel x86_64 10.40-5.el9 appstream 471 k
 pcre2-utf32 x86_64 10.40-5.el9 appstream 202 k
 perl-AutoSplit noarch 5.74-481.el9 appstream 20 k
 perl-Benchmark noarch 1.23-481.el9 appstream 25 k
 perl-CPAN-Meta-YAML noarch 0.018-461.el9 appstream 26 k
 perl-Devel-PPPort x86_64 3.62-4.el9 appstream 211 k
 perl-ExtUtils-Command noarch 2:7.60-3.el9 appstream 14 k
 perl-ExtUtils-Constant noarch 0.25-481.el9 appstream 45 k
 perl-ExtUtils-Install noarch 2.20-4.el9 appstream 44 k
 perl-ExtUtils-Manifest noarch 1:1.73-4.el9 appstream 34 k
 perl-ExtUtils-ParseXS noarch 1:3.40-460.el9 appstream 182 k
 perl-File-Compare noarch 1.100.600-481.el9 appstream 12 k
 perl-Filter x86_64 2:1.60-4.el9 appstream 81 k
 perl-I18N-Langinfo x86_64 0.19-481.el9 appstream 21 k
 perl-JSON-PP noarch 1:4.06-4.el9 appstream 65 k
 perl-Test-Harness noarch 1:3.42-461.el9 appstream 267 k
 perl-Text-Balanced noarch 2.04-4.el9 appstream 48 k
 perl-deprecate noarch 0.04-481.el9 appstream 13 k
 perl-locale noarch 1.09-481.el9 appstream 12 k
 perl-srpm-macros noarch 1-41.el9 appstream 8.2 k
 perl-version x86_64 7:0.99.28-4.el9 appstream 62 k
 pixman-devel x86_64 0.40.0-6.el9_3 appstream 16 k
 pyproject-srpm-macros noarch 1.12.0-1.el9 appstream 13 k
 python-rpm-macros noarch 3.9-53.el9 appstream 15 k
 python-srpm-macros noarch 3.9-53.el9 appstream 17 k
 python3-packaging noarch 20.9-5.el9 appstream 69 k
 python3-rpm-generators noarch 12-9.el9 appstream 27 k
 python3-rpm-macros noarch 3.9-53.el9 appstream 10 k
 qt5-srpm-macros noarch 5.15.9-1.el9 appstream 7.9 k
 rdma-core-devel x86_64 48.0-1.el9 appstream 373 k
 redhat-rpm-config noarch 207-1.el9 appstream 66 k
 rocky-logos-httpd noarch 90.15-2.el9 appstream 24 k
 rust-srpm-macros noarch 17-4.el9 appstream 9.3 k
 sysprof-capture-devel x86_64 3.40.1-3.el9 appstream 59 k
 systemtap-sdt-devel x86_64 5.0-4.el9 appstream 74 k
 xorg-x11-proto-devel noarch 2022.2-1.el9 appstream 263 k
 xz-devel x86_64 5.2.5-8.el9_0 appstream 52 k
 zip x86_64 3.0-35.el9 baseos 263 k
 zlib-devel x86_64 1.2.11-40.el9 appstream 44 k
Installing weak dependencies:
 apr-util-openssl x86_64 1.6.1-23.el9 appstream 14 k
 mariadb-connector-c-doc noarch 3.2.6-1.el9_0 devel 98 k
 mod_http2 x86_64 2.0.26-2.el9_4 appstream 162 k
 mod_lua x86_64 2.4.57-8.el9 appstream 59 k
 perl-CPAN-Meta noarch 2.150010-460.el9 appstream 176 k
 perl-CPAN-Meta-Requirements noarch 2.140-461.el9 appstream 31 k
 perl-Encode-Locale noarch 1.05-21.el9 appstream 19 k
 perl-Time-HiRes x86_64 4:1.9764-462.el9 appstream 57 k
 perl-devel x86_64 4:5.32.1-481.el9 appstream 659 k
 perl-doc noarch 5.32.1-481.el9 appstream 4.5 M

Transaction Summary
=================================================================================================================
Install 144 Packages
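To see at a glance which packages only the devel repo supplied, the transaction table can be filtered on its repository column. This is a sketch, assuming the dnf output is available with one package row per line (name, architecture, version, repository, size, unit); the function name `packages_from_repo` is made up for illustration.

```shell
# Hypothetical helper: read dnf transaction table rows from stdin and print
# the names of the packages that come from the repository given as $1.
packages_from_repo() {
    awk -v repo="$1" 'NF == 6 && $4 == repo { print $1 }'
}
```

Applied to the transaction above with `devel` as the argument, the rows that match are freeipmi-devel, http-parser-devel, json-c-devel, lua-devel, mariadb-devel, munge-devel, perl-Switch, rrdtool-devel and the weak dependency mariadb-connector-c-doc. The first eight are exactly the packages reported as "Unable to find a match" in the run without the devel repo.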
Hi William,
Maybe you need to enable the CodeReady Linux Builder (CRB) repository for AlmaLinux/RockyLinux 9? Look for CRB in https://wiki.rockylinux.org/rocky/repo/
The command for EL9 is: dnf config-manager --set-enabled crb
For EL8 enable this instead: dnf config-manager --set-enabled powertools
I believe the packages you list below as supplied by "devel" will be installed automatically once you have enabled CRB on EL9 or PowerTools on EL8. Can you verify this?
I should add the above information to my Wiki page.
Best regards, Ole
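Whether CRB is actually enabled after running the command above can be verified from the repository list. The filter below is a sketch that assumes the text on standard input has the repository id in the first column, as `dnf repolist --enabled` prints it; the function name `repo_is_enabled` is hypothetical.

```shell
# Hypothetical helper: succeed if the repo id given as $1 appears in the
# first column of a repository listing read from stdin.
repo_is_enabled() {
    awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}
```

For example, `dnf repolist --enabled | repo_is_enabled crb && echo "CRB is enabled"`. Once that succeeds, repeating the install without `--enablerepo=devel` would show whether CRB supplies the packages that previously came from the devel repo.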
On 7/17/24 16:21, William V via slurm-users wrote:
Hello Log when i want install without devel dnf install rpm-build gcc python3 openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel munge munge-libs munge-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel gtk2-devel libibmad libibumad perl-Switch perl-ExtUtils-MakeMaker xorg-x11-xauth http-parser-devel json-c-devel libjwt-devel freeipmi-devel libssh2-devel man2html munge munge-libs munge-devel mariadb-server mariadb-devel Last metadata expiration check: 3:49:07 ago on Wed 17 Jul 2024 12:27:52 PM CEST. Package gcc-11.4.1-3.el9.x86_64 is already installed. Package python3-3.9.18-3.el9_4.1.x86_64 is already installed. Package openssl-1:3.0.7-27.el9.x86_64 is already installed. Package munge-0.5.13-13.el9.x86_64 is already installed. Package munge-libs-0.5.13-13.el9.x86_64 is already installed. No match for argument: munge-devel No match for argument: lua-devel No match for argument: rrdtool-devel Package infiniband-diags-48.0-1.el9.x86_64 is already installed. Package libibumad-48.0-1.el9.x86_64 is already installed. No match for argument: perl-Switch Package xorg-x11-xauth-1:1.1-10.el9.x86_64 is already installed. No match for argument: http-parser-devel No match for argument: json-c-devel No match for argument: freeipmi-devel Package mariadb-server-3:10.5.22-1.el9_2.x86_64 is already installed. All matches were filtered out by modular filtering for argument: mariadb-devel Error: Unable to find a match: munge-devel lua-devel rrdtool-devel perl-Switch http-parser-devel json-c-devel freeipmi-devel mariadb-devel
after install : dnf info json-c-devel Last metadata expiration check: 2:53:58 ago on Wed 17 Jul 2024 01:23:45 PM CEST. Installed Packages Name : json-c-devel Version : 0.14 Release : 11.el9 Architecture : x86_64 Size : 128 k Source : json-c-0.14-11.el9.src.rpm Repository : @System
From repo : devel
Summary : Development files for json-c URL : https://github.com/json-c/json-c License : MIT Description : This package contains libraries and header files for : developing applications that use json-c.
default repo rocky : [appstream] name=Rocky Linux $releasever - AppStream mirrorlist=https://mirrors.rockylinux.org/mirrorlist?arch=$basearch&repo=AppStream-... #baseurl=http://dl.rockylinux.org/$contentdir/$releasever/AppStream/$basearch/os/ gpgcheck=1 enabled=1 countme=1 metadata_expire=6h gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-Rocky-9
Log when i want install with devel dnf install rpm-build gcc python3 openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel munge munge-libs munge-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel gtk2-devel libibmad libibumad perl-Switch perl-ExtUtils-MakeMaker xorg-x11-xauth http-parser-devel json-c-devel libjwt-devel freeipmi-devel libssh2-devel man2html munge munge-libs munge-devel mariadb-server mariadb-devel --enablerepo=devel Last metadata expiration check: 1:00:58 ago on Wed 17 Jul 2024 03:15:50 PM CEST. Package gcc-11.4.1-3.el9.x86_64 is already installed. Package python3-3.9.18-3.el9_4.1.x86_64 is already installed. Package openssl-1:3.0.7-27.el9.x86_64 is already installed. Package munge-0.5.13-13.el9.x86_64 is already installed. Package munge-libs-0.5.13-13.el9.x86_64 is already installed. Package infiniband-diags-48.0-1.el9.x86_64 is already installed. Package libibumad-48.0-1.el9.x86_64 is already installed. Package xorg-x11-xauth-1:1.1-10.el9.x86_64 is already installed. Package mariadb-server-3:10.5.22-1.el9_2.x86_64 is already installed. Dependencies resolved. 
================================================================================
 Package Architecture Version Repository Size
================================================================================
Installing:
 freeipmi-devel x86_64 1.6.14-2.el9 devel 234 k
 gtk2-devel x86_64 2.24.33-8.el9 appstream 2.7 M
 http-parser-devel x86_64 2.9.4-6.el9 devel 14 k
 hwloc x86_64 2.4.1-5.el9 baseos 189 k
 hwloc-devel x86_64 2.4.1-5.el9 appstream 251 k
 json-c-devel x86_64 0.14-11.el9 devel 45 k
 libjwt-devel x86_64 1.12.1-11.el9 epel 15 k
 libssh2-devel x86_64 1.11.0-1.el9 epel 55 k
 lua x86_64 5.4.4-4.el9 appstream 187 k
 lua-devel x86_64 5.4.4-4.el9 devel 21 k
 man2html x86_64 1.6-29.g.el9 epel 30 k
 mariadb-devel x86_64 3:10.5.22-1.el9_2 devel 1.0 M
 munge-devel x86_64 0.5.13-13.el9 devel 23 k
 ncurses-devel x86_64 6.2-10.20210508.el9 appstream 516 k
 numactl x86_64 2.0.16-3.el9 baseos 67 k
 numactl-devel x86_64 2.0.16-3.el9 appstream 21 k
 openssl-devel x86_64 1:3.0.7-27.el9 appstream 3.0 M
 pam-devel x86_64 1.5.1-19.el9 appstream 140 k
 perl-ExtUtils-MakeMaker noarch 2:7.60-3.el9 appstream 289 k
 perl-Switch noarch 2.17-23.el9 devel 26 k
 readline-devel x86_64 8.1-4.el9 appstream 194 k
 rpm-build x86_64 4.16.1.3-29.el9 appstream 59 k
 rrdtool-devel x86_64 1.7.2-21.el9 devel 19 k
Installing dependencies:
 annobin x86_64 12.31-2.el9 appstream 1.0 M
 apr x86_64 1.7.0-12.el9_3 appstream 122 k
 apr-util x86_64 1.6.1-23.el9 appstream 94 k
 apr-util-bdb x86_64 1.6.1-23.el9 appstream 12 k
 atk-devel x86_64 2.36.0-5.el9 appstream 173 k
 brotli x86_64 1.0.9-6.el9 appstream 312 k
 brotli-devel x86_64 1.0.9-6.el9 appstream 31 k
 bzip2 x86_64 1.0.8-8.el9 baseos 52 k
 bzip2-devel x86_64 1.0.8-8.el9 appstream 214 k
 cairo-devel x86_64 1.17.4-7.el9 appstream 190 k
 debugedit x86_64 5.0-5.el9 appstream 76 k
 dwz x86_64 0.14-3.el9 appstream 127 k
 ed x86_64 1.14.2-12.el9 baseos 74 k
 efi-srpm-macros noarch 6-2.el9_0 appstream 22 k
 elfutils x86_64 0.190-2.el9 baseos 543 k
 fontconfig-devel x86_64 2.14.0-2.el9_1 appstream 127 k
 fonts-srpm-macros noarch 1:2.0.5-7.el9.1 appstream 27 k
 freetype-devel x86_64 2.10.4-9.el9 appstream 1.1 M
 fribidi-devel x86_64 1.0.10-6.el9.2 appstream 25 k
 gcc-plugin-annobin x86_64 11.4.1-3.el9 appstream 46 k
 gdb-minimal x86_64 10.2-13.el9 appstream 3.5 M
 gdk-pixbuf2-devel x86_64 2.42.6-4.el9_4 appstream 63 k
 ghc-srpm-macros noarch 1.5.0-6.el9 appstream 7.8 k
 glib2-devel x86_64 2.68.4-14.el9 appstream 470 k
 go-srpm-macros noarch 3.2.0-3.el9 appstream 26 k
 graphite2-devel x86_64 1.3.14-9.el9 appstream 21 k
 harfbuzz-devel x86_64 2.7.4-10.el9 appstream 305 k
 http-parser x86_64 2.9.4-6.el9 appstream 37 k
 httpd x86_64 2.4.57-8.el9 appstream 45 k
 httpd-core x86_64 2.4.57-8.el9 appstream 1.4 M
 httpd-filesystem noarch 2.4.57-8.el9 appstream 12 k
 httpd-tools x86_64 2.4.57-8.el9 appstream 80 k
 info x86_64 6.7-15.el9 baseos 224 k
 kernel-srpm-macros noarch 1.0-13.el9 appstream 15 k
 libX11-devel x86_64 1.7.0-9.el9 appstream 939 k
 libXau-devel x86_64 1.0.9-8.el9 appstream 13 k
 libXcomposite-devel x86_64 0.4.5-7.el9 appstream 16 k
 libXcursor-devel x86_64 1.2.0-7.el9 appstream 22 k
 libXext-devel x86_64 1.3.4-8.el9 appstream 72 k
 libXfixes-devel x86_64 5.0.3-16.el9 appstream 12 k
 libXft-devel x86_64 2.3.3-8.el9 appstream 18 k
 libXi-devel x86_64 1.7.10-8.el9 appstream 99 k
 libXinerama-devel x86_64 1.1.4-10.el9 appstream 13 k
 libXrandr-devel x86_64 1.5.2-8.el9 appstream 19 k
 libXrender-devel x86_64 0.9.10-16.el9 appstream 16 k
 libblkid-devel x86_64 2.37.4-18.el9 appstream 17 k
 libdatrie-devel x86_64 0.2.13-4.el9 appstream 132 k
 libffi-devel x86_64 3.4.2-8.el9 appstream 28 k
 libicu-devel x86_64 67.1-9.el9 appstream 830 k
 libmount-devel x86_64 2.37.4-18.el9 appstream 18 k
 libpng-devel x86_64 2:1.6.37-12.el9 appstream 290 k
 libselinux-devel x86_64 3.6-1.el9 appstream 113 k
 libsepol-devel x86_64 3.6-1.el9 appstream 39 k
 libssh2 x86_64 1.11.0-1.el9 epel 132 k
 libthai-devel x86_64 0.1.28-8.el9 appstream 117 k
 libtiff-devel x86_64 4.4.0-12.el9 appstream 514 k
 libxcb-devel x86_64 1.13.1-9.el9 appstream 1.0 M
 libxml2-devel x86_64 2.9.13-6.el9_4 appstream 827 k
 lua-rpm-macros noarch 1-6.el9 appstream 9.0 k
 lua-srpm-macros noarch 1-6.el9 appstream 8.5 k
 mailcap noarch 2.1.49-5.el9 baseos 32 k
 man2html-core x86_64 1.6-29.g.el9 epel 58 k
 mariadb-connector-c-devel x86_64 3.2.6-1.el9_0 appstream 55 k
 ncurses-c++-libs x86_64 6.2-10.20210508.el9 appstream 36 k
 ocaml-srpm-macros noarch 6-6.el9 appstream 7.8 k
 openblas-srpm-macros noarch 2-11.el9 appstream 7.3 k
 pango-devel x86_64 1.48.7-3.el9 appstream 140 k
 patch x86_64 2.7.6-16.el9 appstream 127 k
 pcre-cpp x86_64 8.44-3.el9.3 appstream 26 k
 pcre-devel x86_64 8.44-3.el9.3 appstream 470 k
 pcre-utf16 x86_64 8.44-3.el9.3 appstream 184 k
 pcre-utf32 x86_64 8.44-3.el9.3 appstream 175 k
 pcre2-devel x86_64 10.40-5.el9 appstream 471 k
 pcre2-utf32 x86_64 10.40-5.el9 appstream 202 k
 perl-AutoSplit noarch 5.74-481.el9 appstream 20 k
 perl-Benchmark noarch 1.23-481.el9 appstream 25 k
 perl-CPAN-Meta-YAML noarch 0.018-461.el9 appstream 26 k
 perl-Devel-PPPort x86_64 3.62-4.el9 appstream 211 k
 perl-ExtUtils-Command noarch 2:7.60-3.el9 appstream 14 k
 perl-ExtUtils-Constant noarch 0.25-481.el9 appstream 45 k
 perl-ExtUtils-Install noarch 2.20-4.el9 appstream 44 k
 perl-ExtUtils-Manifest noarch 1:1.73-4.el9 appstream 34 k
 perl-ExtUtils-ParseXS noarch 1:3.40-460.el9 appstream 182 k
 perl-File-Compare noarch 1.100.600-481.el9 appstream 12 k
 perl-Filter x86_64 2:1.60-4.el9 appstream 81 k
 perl-I18N-Langinfo x86_64 0.19-481.el9 appstream 21 k
 perl-JSON-PP noarch 1:4.06-4.el9 appstream 65 k
 perl-Test-Harness noarch 1:3.42-461.el9 appstream 267 k
 perl-Text-Balanced noarch 2.04-4.el9 appstream 48 k
 perl-deprecate noarch 0.04-481.el9 appstream 13 k
 perl-locale noarch 1.09-481.el9 appstream 12 k
 perl-srpm-macros noarch 1-41.el9 appstream 8.2 k
 perl-version x86_64 7:0.99.28-4.el9 appstream 62 k
 pixman-devel x86_64 0.40.0-6.el9_3 appstream 16 k
 pyproject-srpm-macros noarch 1.12.0-1.el9 appstream 13 k
 python-rpm-macros noarch 3.9-53.el9 appstream 15 k
 python-srpm-macros noarch 3.9-53.el9 appstream 17 k
 python3-packaging noarch 20.9-5.el9 appstream 69 k
 python3-rpm-generators noarch 12-9.el9 appstream 27 k
 python3-rpm-macros noarch 3.9-53.el9 appstream 10 k
 qt5-srpm-macros noarch 5.15.9-1.el9 appstream 7.9 k
 rdma-core-devel x86_64 48.0-1.el9 appstream 373 k
 redhat-rpm-config noarch 207-1.el9 appstream 66 k
 rocky-logos-httpd noarch 90.15-2.el9 appstream 24 k
 rust-srpm-macros noarch 17-4.el9 appstream 9.3 k
 sysprof-capture-devel x86_64 3.40.1-3.el9 appstream 59 k
 systemtap-sdt-devel x86_64 5.0-4.el9 appstream 74 k
 xorg-x11-proto-devel noarch 2022.2-1.el9 appstream 263 k
 xz-devel x86_64 5.2.5-8.el9_0 appstream 52 k
 zip x86_64 3.0-35.el9 baseos 263 k
 zlib-devel x86_64 1.2.11-40.el9 appstream 44 k
Installing weak dependencies:
 apr-util-openssl x86_64 1.6.1-23.el9 appstream 14 k
 mariadb-connector-c-doc noarch 3.2.6-1.el9_0 devel 98 k
 mod_http2 x86_64 2.0.26-2.el9_4 appstream 162 k
 mod_lua x86_64 2.4.57-8.el9 appstream 59 k
 perl-CPAN-Meta noarch 2.150010-460.el9 appstream 176 k
 perl-CPAN-Meta-Requirements noarch 2.140-461.el9 appstream 31 k
 perl-Encode-Locale noarch 1.05-21.el9 appstream 19 k
 perl-Time-HiRes x86_64 4:1.9764-462.el9 appstream 57 k
 perl-devel x86_64 4:5.32.1-481.el9 appstream 659 k
 perl-doc noarch 5.32.1-481.el9 appstream 4.5 M
Transaction Summary
Install 144 Packages
On 18-07-2024 08:15, William V via slurm-users wrote:
Yes! That works with the crb repo.
Thanks for the test, I'm glad the package installation works as expected now!
I've corrected the repository documentation in the Wiki page at https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#install-prereq...
Best regards, Ole
Hello,
Thanks again for your documentation, I deployed 24.05.2 last week. But this weekend slurmctld crashed with only the following in the logs:
"Aug 25 15:33:02 slurmadmin slurmctld[79950]: free(): invalid next size (fast)"
Also, I regularly get the messages below in my logs, even though these two machines are VMs in the same subnet, and slurmadmin is the same machine that runs both slurmctld and slurmd, so it cannot lose the connection to itself. Meanwhile, my compute nodes are never disconnected.
/var/log/slurm/slurmctld.log:[2024-08-25T14:12:02.009] agent/is_node_resp: node:slurmadmin RPC:REQUEST_PING : Communication connection failure
/var/log/slurm/slurmctld.log:[2024-08-25T14:12:02.009] agent/is_node_resp: node:vmjupyter RPC:REQUEST_PING : Communication connection failure
/var/log/slurm/slurmctld.log:[2024-08-25T14:12:02.009] agent/is_node_resp: node:vmdev RPC:REQUEST_PING : Communication connection failure
Should I open a new topic for this?
Thank you in advance.
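One way to see whether the REQUEST_PING failures quoted above cluster on particular nodes is to count them per node. This is only a minimal sketch: it assumes the message wording is exactly as in the quoted lines and that the log lives at /var/log/slurm/slurmctld.log as shown there. `ping_failures_by_node` is a hypothetical helper name, not a Slurm tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): read slurmctld log text on stdin
# and print how many REQUEST_PING connection failures each node had,
# most affected node first.
ping_failures_by_node() {
    # Keep only the ping-failure lines, in the wording quoted in this thread.
    grep 'RPC:REQUEST_PING : Communication connection failure' |
        # Reduce each line to the node name that follows "node:".
        sed -E 's/.*node:([^ ]+) RPC:REQUEST_PING.*/\1/' |
        # Count occurrences per node and sort by count, descending.
        sort | uniq -c | sort -rn
}

# Example use on the controller (path taken from the quoted log lines):
#   ping_failures_by_node < /var/log/slurm/slurmctld.log
```

If the counts are similar for every node and the timestamps coincide, that would point at the controller side rather than at an individual node, but that is an interpretation to verify, not something the log lines alone prove.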
Hello, this weekend I had the same error, even though I am on 24.x now.
Oct 04 19:41:02 systemd[1]: slurmctld.service: Failed with result 'core-dump'.
Oct 04 19:41:02 systemd[1]: slurmctld.service: Main process exited, code=dumped, status=6/ABRT
Oct 04 19:41:02 slurmctld[1981869]: double free or corruption (fasttop)
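The glibc abort messages quoted in this thread ("double free or corruption (...)" and "free(): invalid next size (fast)") can be pulled out of the journal together with their timestamps, which makes it easier to compare crash times across occurrences. This is a minimal sketch, assuming the unit is named slurmctld as in the systemd lines above. `slurmctld_aborts` is a hypothetical helper name, and the pattern covers only the two message forms seen here.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): read journal-style text on stdin
# and keep only the lines where slurmctld itself printed a glibc
# heap-corruption abort message.
slurmctld_aborts() {
    # Matches "slurmctld[PID]: double free or corruption ..." and
    # "slurmctld[PID]: free(): invalid ..." with or without a hostname field.
    grep -E 'slurmctld\[[0-9]+\]: (double free or corruption|free\(\): invalid)'
}

# Example use on the controller (-u and --no-pager are standard journalctl options):
#   journalctl -u slurmctld --no-pager | slurmctld_aborts
```

Because the Oct 04 entry reports result 'core-dump', a dump may have been kept by systemd-coredump. If it was, `coredumpctl list slurmctld` should show it, and a backtrace taken from that dump is likely more useful for a bug report than the one-line abort message.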