Slurm major releases are moving to a six-month release cycle. This
change starts with the upcoming Slurm 24.05 release in May 2024. Slurm
24.11 will follow in November 2024. Major releases then continue every
May and November in 2025 and beyond.
There are two main goals of this change:
- Faster delivery of newer features and functionality for customers.
- "Predictable" release timing, especially for those sites that would
  prefer to upgrade during an annual system maintenance window.
SchedMD will be adjusting how backwards compatibility is handled within
Slurm itself, and how SchedMD's support services handle older releases.
For the 24.05 release, Slurm will still only support upgrading from (and
mixed-version operations with) the prior two releases (23.11, 23.02).
Starting with 24.11, Slurm will support upgrades from the prior
three releases (24.05, 23.11, 23.02).
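
The compatibility windows above can be sketched in a few lines of
Python. This is only an illustration, not part of Slurm: the function
names are made up for this example, the release list contains only the
versions named in this announcement plus the May/November cadence it
describes, and carrying the three-release window past 24.11 is an
assumption.

```python
def major_releases(through_year=26):
    """Releases named in the announcement, extended on the May/November cadence.

    `through_year` is a two-digit year; this helper is hypothetical.
    """
    releases = ["23.02", "23.11", "24.05", "24.11"]
    for year in range(25, through_year + 1):
        releases += [f"{year}.05", f"{year}.11"]
    return releases


def upgrade_sources(target, releases):
    """Return the prior releases that `target` can be upgraded from, newest first.

    Two prior releases through 24.05, three from 24.11 onward (the latter
    is assumed to continue for later releases). Only releases present in
    `releases` are returned, so results for the oldest entries are truncated.
    """
    idx = releases.index(target)  # raises ValueError for an unknown release
    # Zero-padded "YY.MM" strings compare correctly as plain strings.
    window = 3 if target >= "24.11" else 2
    return releases[max(0, idx - window):idx][::-1]


if __name__ == "__main__":
    known = major_releases()
    print(upgrade_sources("24.05", known))  # ['23.11', '23.02']
    print(upgrade_sources("24.11", known))  # ['24.05', '23.11', '23.02']
```
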
SchedMD's Slurm Support has been built around an 18-month cycle. This
18-month cycle has traditionally covered the current stable release,
plus one prior major release. With the increase in release frequency,
this support window will now cover the current stable release, plus
two prior major releases.
The blog post version of this announcement includes a table that
outlines the updated support lifecycle:
https://www.schedmd.com/slurm-releases-move-to-a-six-month-cycle/
- Tim
--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
We are pleased to announce the availability of Slurm version 23.11.5.
The 23.11.5 release includes some important fixes related to newer
features as well as some database fixes. The most noteworthy fixes
include fixing the sattach command (which only worked for root and
SlurmUser after 23.11.0) and fixing an issue while constructing the new
lineage database entries. This last change will also perform a query
during the upgrade from any prior 23.11 version to fix existing databases.
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
-Tim
> * Changes in Slurm 23.11.5
> ==========================
> -- Fix Debian package build on systems that are not able to query the systemd
> package.
> -- data_parser/v0.0.40 - Emit a warning instead of an error if a disabled
> parser is invoked.
> -- slurmrestd - Improve handling when content plugins rely on parsers
> that haven't been loaded.
> -- Fix old pending jobs dying (Slurm version 21.08.x and older) when upgrading
> Slurm due to "Invalid message version" errors.
> -- Have client commands sleep for progressively longer periods when backed off
> by the RPC rate limiting system.
> -- slurmctld - Ensure agent queue is flushed correctly at shutdown time.
> -- slurmdbd - correct lineage construction during assoc table conversion for
> partition based associations.
> -- Add new RPCs and API call for faster querying of job states from slurmctld.
> -- slurmrestd - Add endpoint '/slurm/{data_parser}/jobs/state'.
> -- squeue - Add `--only-job-state` argument to use faster query of job states.
> -- Make a job requesting --no-requeue, or JobRequeue=0 in the slurm.conf,
> supersede RequeueExit[Hold].
> -- Add sackd man page to the Debian package.
> -- Fix issues with tasks when a job was shrunk more than once.
> -- Fix reservation update validation that resulted in reject of correct
> updates of reservation when the reservation was running jobs.
> -- Fix possible segfault when the backup slurmctld is asserting control.
> -- Fix regression introduced in 23.02.4 where slurmctld was not properly
> tracking the total GRES selected for exclusive multi-node jobs, potentially
> and incorrectly bypassing limits.
> -- Fix tracking of jobs typeless GRES count when multiple typed GRES with the
> same name are also present in the job allocation. Otherwise, the job could
> bypass limits configured for the typeless GRES.
> -- Fix tracking of jobs typeless GRES count when request specification has a
> typeless GRES name first and then typed GRES of different names (i.e.
> --gres=gpu:1,tmpfs:foo:2,tmpfs:bar:7). Otherwise, the job could bypass
> limits configured for the generic of the typed one (tmpfs in the example).
> -- Fix batch step not having SLURM_CLUSTER_NAME filled in.
> -- slurmstepd - Avoid error during `--container` job cleanup about
> RunTimeQuery never being configured. Results in cleanup where job steps not
> fully started.
> -- Fix nodes not being rebooted when using salloc/sbatch/srun "--reboot" flag.
> -- Send scrun.lua in configless mode.
> -- Fix rejecting an interactive job whose extra constraint request cannot
> immediately be satisfied.
> -- Fix regression in 23.11.0 when parsing LogTimeFormat=iso8601_ms that
> prevented milliseconds from being printed.
> -- Fix issue where you could have a gpu allocated as well as a shard on that
> gpu allocated at the same time.
> -- Fix slurmctld crashes when using extra constraints with job arrays.
> -- sackd/slurmrestd/scrun - Avoid memory leak on new unix socket connection.
> -- The failed node field is filled when a node fails but does not time out.
> -- slurmrestd - Remove requiring job script field and job component script
> fields to both be populated in the `POST /slurm/v0.0.40/job/submit`
> endpoint as there can only be one batch step script for a job.
> -- slurmrestd - When job script is provided in '.jobs[].script' and '.script'
> fields, the '.script' field's value will be used in the
> `POST /slurm/v0.0.40/job/submit` endpoint.
> -- slurmrestd - Reject HetJob submission missing or empty batch script for
> first Het component in the `POST /slurm/v0.0.40/job/submit` endpoint.
> -- slurmrestd - Reject job when empty batch script submitted to the
> `POST /slurm/v0.0.40/job/submit` endpoint.
> -- Fix pam_slurm and pam_slurm_adopt when using auth/slurm.
> -- slurmrestd - Add 'cores_per_socket' field to
> `POST /slurm/v0.0.40/job/submit` endpoint.
> -- Fix srun and other Slurm commands running within a "configless" salloc when
> salloc itself fetched the config.
> -- Enforce binding with shared gres selection if requested.
> -- Fix job allocation failures when the requested tres type or name ends in
> "gres" or "license".
> -- accounting_storage/mysql - Fix lineage string construction when adding a
> user association with a partition.
> -- Fix sattach command.
> -- Fix ReconfigFlags. Due to how reconfig was changed in 23.11, they will also
> be used to influence the slurmctld startup.
> -- Fix starting slurmd in configless mode if MUNGE support was disabled.
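
As an illustration of the new job state endpoint listed above, here is a
minimal Python sketch that builds a request for
'/slurm/{data_parser}/jobs/state'. It assumes slurmrestd is reachable
over HTTP at a base URL you supply and is configured for JWT
authentication using the X-SLURM-USER-NAME and X-SLURM-USER-TOKEN
headers. The function names and the example host are hypothetical, and
the response layout is not described in this announcement, so the
sketch only parses it as JSON.

```python
import json
import urllib.request


def job_state_url(base_url, data_parser="v0.0.40"):
    """Build the URL for the job state endpoint added in 23.11.5."""
    return f"{base_url.rstrip('/')}/slurm/{data_parser}/jobs/state"


def build_request(base_url, user, token, data_parser="v0.0.40"):
    """Prepare a GET request; JWT header authentication is an assumption."""
    return urllib.request.Request(
        job_state_url(base_url, data_parser),
        headers={"X-SLURM-USER-NAME": user, "X-SLURM-USER-TOKEN": token},
        method="GET",
    )


def fetch_job_states(base_url, user, token):
    """Send the request and return the parsed JSON body as-is."""
    with urllib.request.urlopen(build_request(base_url, user, token)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical host and port; nothing is sent until fetch_job_states() runs.
    print(job_state_url("http://slurmrestd.example.com:6820"))
```

From the command line, the same faster query is exposed through
`squeue --only-job-state`.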
--
Tim McMullan
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support