We are pleased to announce the availability of Slurm versions 24.05.3
and 23.11.10.
Version 24.05.3 fixes a potential database problem when deleting a qos.
This bug only existed in 24.05.
Both versions have fixes for jobs potentially being stuck when using
cloud nodes when some nodes are powered down, a regression in 23.11.9
and 24.05.2 that caused sattach to crash, and some other minor issues.
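To check whether an installation carries the sattach regression, the affected and fixed versions named above can be encoded in a small lookup. This is only a sketch: the helper names (parse_version, upgrade_target) are hypothetical and not part of Slurm, and it handles plain MAJOR.MINOR.MICRO strings only (not release candidates such as 24.05.0rc1).

```python
# Versions named in this announcement as carrying the sattach regression.
SATTACH_REGRESSION_VERSIONS = {(23, 11, 9), (24, 5, 2)}

# First fixed release for each release series, per this announcement.
FIXED_IN = {(23, 11): (23, 11, 10), (24, 5): (24, 5, 3)}


def parse_version(text):
    """Parse 'MAJOR.MINOR.MICRO' into a tuple of ints, e.g. '24.05.2' -> (24, 5, 2)."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.MICRO, got {text!r}")
    return tuple(int(p) for p in parts)


def upgrade_target(text):
    """Return the fixed version string if `text` has the regression, else None."""
    version = parse_version(text)
    if version not in SATTACH_REGRESSION_VERSIONS:
        return None
    major, minor, micro = FIXED_IN[version[:2]]
    return f"{major}.{minor:02d}.{micro}"


if __name__ == "__main__":
    for installed in ("23.11.9", "24.05.2", "24.05.3"):
        print(installed, "->", upgrade_target(installed))
```

The version string would typically come from the output of a client command such as `sinfo --version`, with the leading "slurm " stripped before parsing.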
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support
> * Changes in Slurm 24.05.3
> ==========================
> -- data_parser/v0.0.40 - Added field descriptions.
> -- slurmrestd - Avoid creating new slurmdbd connection per request to
> '* /slurm/slurmctld/*/*' endpoints.
> -- Fix compilation issue with switch/hpe_slingshot plugin.
> -- Fix gres per task allocation with threads-per-core.
> -- data_parser/v0.0.41 - Added field descriptions.
> -- slurmrestd - Change back generated OpenAPI schema for
> `DELETE /slurm/v0.0.40/jobs/` to RequestBody instead of using parameters
> for request. slurmrestd will continue to accept endpoint requests via
> RequestBody or HTTP query.
> -- topology/tree - Fix issues with switch distance optimization.
> -- Fix potential segfault of secondary slurmctld when falling back to the
> primary when running with a JobComp plugin.
> -- Enable --json/--yaml=v0.0.39 options on client commands to dump data using
> data_parser/v0.0.39 instead of outputting nothing.
> -- switch/hpe_slingshot - Fix issue that could result in a 0 length state file.
> -- Fix unnecessary message protocol downgrade for unregistered nodes.
> -- Fix unnecessarily packing alias addrs when terminating jobs with a mix of
> non-cloud/dynamic nodes and powered down cloud/dynamic nodes.
> -- accounting_storage/mysql - Fix issue when deleting a qos that could remove
> too many commas from the qos and/or delta_qos fields of the assoc table.
> -- slurmctld - Fix memory leak when using RestrictedCoresPerGPU.
> -- Fix allowing access to reservations without MaxStartDelay set.
> -- Fix regression introduced in 24.05.0rc1 breaking srun --send-libs parsing.
> -- Fix slurmd vsize memory leak when using job submission/allocation commands
> that implicitly or explicitly use --get-user-env.
> -- slurmd - Fix node going into invalid state when using CPUSpecList and
> setting CPUs to the # of cores on a multithreaded node.
> -- Fix reboot asap nodes being considered in backfill after a restart.
> -- Fix --clusters/-M queries for clusters outside of a federation when
> fed_display is configured.
> -- Fix scontrol allowing updating job with bad cpus-per-task value.
> -- sattach - Fix regression from 24.05.2 security fix leading to crash.
> -- mpi/pmix - Fix assertion when built under --enable-debug.
> * Changes in Slurm 23.11.10
> ===========================
> -- switch/hpe_slingshot - Fix issue that could result in a 0 length state file.
> -- Fix unnecessary message protocol downgrade for unregistered nodes.
> -- Fix unnecessarily packing alias addrs when terminating jobs with a mix of
> non-cloud/dynamic nodes and powered down cloud/dynamic nodes.
> -- Fix allowing access to reservations without MaxStartDelay set.
> -- Fix scontrol allowing updating job with bad cpus-per-task value.
> -- sattach - Fix regression from 23.11.9 security fix leading to crash.
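The accounting_storage/mysql entry in the 24.05.3 list describes a class of bug that is easy to hit when ids are stored as a comma-delimited string. The sketch below illustrates that class of bug only; it is not Slurm's code or SQL. The field layout (a leading comma before each id, as in ",1,2,3") and both function names are assumptions made for the example.

```python
def remove_id_naive(field, qos_id):
    """Buggy illustration: deleting ',<id>,' also drops the comma
    that should keep the neighbouring ids apart."""
    return field.replace(f",{qos_id},", "")


def remove_id_safe(field, qos_id):
    """Split on commas, drop the exact id, and re-join.

    Assumes (hypothetically) that each id is preceded by a comma,
    so a non-empty result keeps its leading comma.
    """
    ids = [part for part in field.split(",") if part and part != str(qos_id)]
    return "," + ",".join(ids) if ids else ""


if __name__ == "__main__":
    field = ",1,2,3"
    print(repr(remove_id_naive(field, 2)))  # ids 1 and 3 run together
    print(repr(remove_id_safe(field, 2)))   # ids 1 and 3 stay separate
```

Matching whole list elements rather than substrings also keeps id 1 from matching inside id 11.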