Slurm versions 24.11.2 and 24.05.6 are now available
We are pleased to announce the availability of Slurm versions 24.11.2 and 24.05.6.

24.11.2 fixes a variety of minor to major bugs. Fixed regressions include loading non-default QOS on pending jobs from pre-24.11 state, pending jobs displaying QOS=(null) when not explicitly requesting a QOS, running jobs that requested multiple partitions potentially having an incorrect partition when slurmctld is restarted, and burst_buffer.lua failing if slurm.conf is in a non-standard location. This release also fixes a few crashes in slurmctld: crashing when a job that can preempt requests --test-only, crashing when the scheduler evaluates a job on nodes with suspended jobs, and crashing due to a long-standing bug causing a job record without job_resrcs.

24.05.6 fixes sattach with auth/slurm, a slurmrestd crash when using data_parser/v0.0.40, a slurmctld crash when using job suspension, a performance regression for RPCs with large amounts of data, and some other moderate severity bugs.

Downloads are available at https://www.schedmd.com/downloads.php .

--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support
* Changes in Slurm 24.11.2
==========================
 -- Fix segfault when submitting --test-only jobs that can preempt.
 -- Fix regression introduced in 23.11 that prevented the following flags
    from being added to a reservation on an update: DAILY, HOURLY, WEEKLY,
    WEEKDAY, and WEEKEND.
 -- Fix crash and issues evaluating job's suitability for running in nodes
    with already suspended job(s) there.
 -- Slurmctld will ensure that healthy nodes are not reported as
    UnavailableNodes in job reason codes.
 -- Fix handling of jobs submitted to a current reservation with flags
    OVERLAP,FLEX or OVERLAP,ANY_NODES when it overlaps nodes with a future
    maintenance reservation. When a job submission had a time limit that
    overlapped with the future maintenance reservation, it was rejected.
    Now the job is accepted but stays pending with the reason
    "ReqNodeNotAvail, Reserved for maintenance".
 -- pam_slurm_adopt - avoid errors when explicitly setting some arguments
    to the default value.
 -- Fix qos preemption with PreemptMode=SUSPEND
 -- slurmdbd - When changing a user's name update lineage at the same time.
 -- Fix regression in 24.11 in which burst_buffer.lua does not inherit the
    SLURM_CONF environment variable from slurmctld and fails to run if
    slurm.conf is in a non-standard location.
 -- Fix memory leak in slurmctld if select/linear and the
    PreemptParameters=reclaim_licenses options are both set in slurm.conf.
    Regression in 24.11.1.
 -- Fix running jobs, that requested multiple partitions, from potentially
    being set to the wrong partition on restart.
 -- switch/hpe_slingshot - Fix compatibility with newer cxi drivers,
    specifically when specifying disable_rdzv_get.
 -- Add ABORT_ON_FATAL environment variable to capture a backtrace from any
    fatal() message.
 -- Fix printing invalid address in rate limiting log statement.
 -- sched/backfill - Fix node state PLANNED not being cleared from fully
    allocated nodes during a backfill cycle.
 -- select/cons_tres - Fix future planning of jobs with bf_licenses.
 -- Prevent redundant "on_data returned rc: Rate limit exceeded, please
    retry momentarily" error message from being printed in slurmctld logs.
 -- Fix loading non-default QOS on pending jobs from pre-24.11 state.
 -- Fix pending jobs displaying QOS=(null) when not explicitly requesting a
    QOS.
 -- Fix segfault issue from job record with no job_resrcs
 -- Fix failing sacctmgr delete/modify/show account operations with where
    clauses.
 -- Fix regression in 24.11 in which Slurm daemons started catching several
    SIGTSTP, SIGTTIN and SIGUSR1 signals and ignored them, while before
    they were not ignoring them. This also caused slurmctld to not being
    able to shutdown after a SIGTSTP because slurmscriptd caught the signal
    and stopped while slurmctld ignored it. Unify and fix these situations
    and get back to the previous behavior for these signals.
 -- Document that SIGQUIT is no longer ignored by slurmctld, slurmdbd, and
    slurmd in 24.11. As of 24.11.0rc1, SIGQUIT is identical to SIGINT and
    SIGTERM for these daemons, but this change was not documented.
 -- Fix not considering nodes marked for reboot without ASAP in the
    scheduler.
 -- Remove the boot^ state on unexpected node reboot after return to
    service.
 -- Do not allow new jobs to start on a node which is being rebooted with
    the flag nextstate=resume.
 -- Prevent lower priority job running after cancelling an ASAP reboot.
 -- Fix srun jobs starting on nextstate=resume rebooting nodes.
* Changes in Slurm 24.05.6
==========================
 -- data_parser/v0.0.40 - Prevent a segfault in the slurmrestd when dumping
    data with v0.0.40+complex data parser.
 -- Fix sattach when using auth/slurm.
 -- scrun - Add support '--all' argument for kill subcommand.
 -- Fix performance regression while packing larger RPCs.
 -- Fix crash and issues evaluating job's suitability for running in nodes
    with already suspended job(s) there.
 -- Fixed a job requeuing issue that merged job entries into the same SLUID
    when all nodes in a job failed simultaneously.
 -- switch/hpe_slingshot - Fix compatibility with newer cxi drivers,
    specifically when specifying disable_rdzv_get.
 -- Add ABORT_ON_FATAL environment variable to capture a backtrace from any
    fatal() message.
On Tuesday, 25 February 2025 22:10:02 CET Marshall Garey via slurm-users wrote:
We are pleased to announce the availability of Slurm versions 24.11.2 and 24.05.6.
On the download page the wrong md5sum is displayed for slurm-24.11.2.tar.bz2

regards
Markus Köberl

--
Markus Koeberl
Graz University of Technology
Signal Processing and Speech Communication Laboratory
E-mail: markus.koeberl@tugraz.at
Thank you Markus,

I fixed the error and figured out how that happened, so it shouldn't happen that way again!

Thanks again,
--Tim

--
Tim McMullan
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support

On Wed, Feb 26, 2025 at 4:13 AM Markus Köberl via slurm-users <slurm-users@lists.schedmd.com> wrote:
On the download page the wrong md5sum is displayed for slurm-24.11.2.tar.bz2

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-leave@lists.schedmd.com
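Since the checksum shown on the download page was wrong for a while, it may help to compare a downloaded tarball against the published value locally. The following is a minimal sketch, assuming GNU `md5sum` is available; the function name `verify_md5` is made up for illustration and is not part of Slurm.

```shell
# verify_md5 FILE EXPECTED_MD5
# Succeeds when the MD5 digest of FILE matches EXPECTED_MD5 (case-insensitive).
# The body runs in a subshell "( ... )" so its variables stay local.
verify_md5() (
    file="$1"
    # Normalise the expected digest to lower case, as md5sum prints it.
    expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual="$(md5sum -- "$file" 2>/dev/null | awk '{print $1}')"
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got ${actual:-nothing})" >&2
        exit 1
    fi
)
```

Usage would look like `verify_md5 slurm-24.11.2.tar.bz2 <md5 from the download page>`, with the digest copied from the page by hand.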
I am trying to add slurmdbd to my first attempt of slurmctld.

I have mariadb 10.11 running and permissions set.

MariaDB [(none)]> CREATE DATABASE slurm_acct_db;
Query OK, 1 row affected (0.000 sec)

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| slurm_acct_db      |
+--------------------+

Following the setup at,
https://slurm.schedmd.com/accounting.html#mysql-configuration

When I try to start slurmdbd it fails.

[root@vuwunicoslurmd3 ~]# systemctl status slurmdbd
○ slurmdbd.service - Slurm DBD accounting daemon
     Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; disabled; preset: disabled)
     Active: inactive (dead)
[root@vuwunicoslurmd3 ~]# systemctl enable --now slurmdbd
Created symlink /etc/systemd/system/multi-user.target.wants/slurmdbd.service → /usr/lib/systemd/system/slurmdbd.service.
[root@vuwunicoslurmd3 ~]# systemctl status slurmdbd
○ slurmdbd.service - Slurm DBD accounting daemon
     Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; enabled; preset: disabled)
     Active: inactive (dead)
  Condition: start condition failed at Tue 2025-03-04 00:54:38 UTC; 1s ago
             └─ ConditionPathExists=/etc/slurm/slurmdbd.conf was not met

Mar 04 00:54:38 vuwunicoslurmd3.ods.vuw.ac.nz systemd[1]: Slurm DBD accounting daemon was skipped because of an unmet co>
[root@vuwunicoslurmd3 ~]#

So there seems to be a hole in the guide. Some config is needed?

regards

Steven
Hello,

yes, you need to configure the SlurmDBD daemon:
https://slurm.schedmd.com/slurmdbd.html
https://slurm.schedmd.com/slurmdbd.conf.html

Accounting setup (enforcing limits for example) requires the database, but some additional steps are also required to get the whole system working.

Systemd service is not starting because the configuration file is missing:
ConditionPathExists=/etc/slurm/slurmdbd.conf

Kind regards,
--
Kamil Wilczek
[https://keys.openpgp.org/]
[D415917E84B8DA5A60E853B6E676ED061316B69B]
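To make the missing piece concrete, here is a minimal sketch of creating a starter slurmdbd.conf. The parameter names are taken from the slurmdbd.conf man page linked above, but every value is a placeholder or an assumption (auth/munge, localhost, the log and PID paths, the password); only the database name `slurm_acct_db` comes from this thread. The helper name `write_slurmdbd_conf` is hypothetical.

```shell
# write_slurmdbd_conf PATH
# Writes a placeholder slurmdbd.conf to PATH, readable by its owner only.
# Runs in a subshell so the umask change does not leak into the caller.
write_slurmdbd_conf() (
    conf="$1"
    umask 077
    cat > "$conf" <<'EOF'
# Starter slurmdbd.conf: all values are placeholders, adjust for your site.
AuthType=auth/munge
DbdHost=localhost
SlurmUser=slurm
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=CHANGE_ME
StorageLoc=slurm_acct_db
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
EOF
    # The file holds the database password, so keep it private.
    chmod 600 "$conf"
)
```

On the real host this would be run as root with `/etc/slurm/slurmdbd.conf` as the path, followed by `chown slurm:slurm /etc/slurm/slurmdbd.conf`, assuming the daemon runs as user slurm.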
On Tue, 2025-03-04 at 01:03:00 +0000, Slurm users wrote:
[root@vuwunicoslurmd3 ~]# systemctl status slurmdbd
○ slurmdbd.service - Slurm DBD accounting daemon
     Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; disabled; preset: disabled)
     Active: inactive (dead)
[root@vuwunicoslurmd3 ~]# systemctl enable --now slurmdbd
Created symlink /etc/systemd/system/multi-user.target.wants/slurmdbd.service → /usr/lib/systemd/system/slurmdbd.service.
[root@vuwunicoslurmd3 ~]# systemctl status slurmdbd
○ slurmdbd.service - Slurm DBD accounting daemon
     Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; enabled; preset: disabled)
     Active: inactive (dead)
  Condition: start condition failed at Tue 2025-03-04 00:54:38 UTC; 1s ago
             └─ ConditionPathExists=/etc/slurm/slurmdbd.conf was not met
TIL about the "--now" option to "systemctl enable"... thanks for this one! ;) Although I admit I prefer a step-by-step approach (and I'd only enable a unit if it's been successfully started once, to avoid complaints at reboot)...

You wrote that you configured MySQL but didn't mention SlurmDBD config. Does the file that is being complained about exist (on that machine)?
So there seems to be a hole in the guide. Some config is needed?
To be honest, I've been following Ole's detailed setup instructions since Adam and Eve - not the ones directly from the horse's mouth. Whatever, I'd first try to track down that ConditionPathExists issue...

Best,
Steffen

--
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~
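The step-by-step approach described above (start first, enable only once the unit is really running) could be sketched as follows. The explicit `is-active` check matters here because, as far as I know, a unit skipped over an unmet condition such as ConditionPathExists does not make `systemctl start` fail. The function name and the `SYSTEMCTL` override variable are hypothetical; the override only exists so the logic can be dry-run with a stand-in command.

```shell
# start_then_enable UNIT
# Starts UNIT, confirms it is active, and only then enables it at boot.
# SYSTEMCTL (hypothetical knob) may name a replacement command for dry runs.
start_then_enable() (
    unit="$1"
    sc="${SYSTEMCTL:-systemctl}"
    # Step 1: try to start the unit.
    "$sc" start "$unit" || exit 1
    # Step 2: a skipped unit stays inactive, so verify the state explicitly.
    "$sc" is-active --quiet "$unit" || exit 1
    # Step 3: enable for subsequent boots only after a successful start.
    "$sc" enable "$unit"
)
```

On the machine in question this would be `start_then_enable slurmdbd`, run as root. If the start step fails, running `slurmdbd -D -vvv` in the foreground is a common way to see why.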
On 3/4/25 09:43, Steffen Grunewald via slurm-users wrote:
To be honest, I've been following Ole's detailed setup instructions since Adam and Eve - not the ones directly from the horse's mouth. Whatever, I'd first try to track down that ConditionPathExists issue...
The Systemd error message "ConditionPathExists=/etc/slurm/slurmdbd.conf was not met" is a critical error! Check that the file exists and is owned by the user slurm and group slurm, for example:

$ ls -l /etc/slurm/slurmdbd.conf
-rw-------. 1 slurm slurm 504 Feb 28 2023 /etc/slurm/slurmdbd.conf

Make sure that you configured slurmdbd.conf correctly, see this Wiki page:
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_database/#slurmdbd-configura...

IHTH,
Ole
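The check described above can also be scripted. This is a minimal sketch, assuming GNU `stat`; the function name `check_slurmdbd_conf` is made up for illustration, and the expected owner `slurm:slurm` and mode 600 simply mirror the `ls -l` example rather than anything Slurm enforces through this script.

```shell
# check_slurmdbd_conf [PATH] [OWNER:GROUP]
# Verifies that the config file exists, has the expected owner and group,
# and is mode 600. Defaults follow the example in the message above.
check_slurmdbd_conf() (
    conf="${1:-/etc/slurm/slurmdbd.conf}"
    want_owner="${2:-slurm:slurm}"
    if [ ! -f "$conf" ]; then
        echo "missing: $conf" >&2
        exit 1
    fi
    owner="$(stat -c '%U:%G' "$conf")"   # e.g. slurm:slurm
    mode="$(stat -c '%a' "$conf")"       # e.g. 600
    [ "$owner" = "$want_owner" ] || { echo "owner is $owner, expected $want_owner" >&2; exit 1; }
    [ "$mode" = "600" ] || { echo "mode is $mode, expected 600" >&2; exit 1; }
    echo "ok: $conf"
)
```

Calling `check_slurmdbd_conf` with no arguments checks /etc/slurm/slurmdbd.conf against slurm:slurm; both can be overridden for sites that use a different path or service account.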
participants (7)
- Kamil Wilczek
- Markus Köberl
- Marshall Garey
- Ole Holm Nielsen
- Steffen Grunewald
- Steven Jones
- Tim McMullan