We are pleased to announce the availability of Slurm version 24.05.5.
This release fixes a few potential crashes, several stepmgr bugs,
sstat and sattach compatibility with steps running on newer slurmd
versions, and some other minor bugs.
Downloads are available at https://www.schedmd.com/downloads.php .
--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support
> * Changes in Slurm 24.05.5
> ==========================
> -- Fix issue signaling cron jobs resulting in unintended requeues.
> -- Fix slurmctld memory leak in implementation of HealthCheckNodeState=CYCLE.
> -- job_container/tmpfs - Fix SLURM_CONF env variable not being properly set.
> -- sched/backfill - Fix job's time_limit being overwritten by time_min for job
> arrays in some situations.
> -- RoutePart - Fix segfault from incorrect memory allocation when a node
> doesn't exist in any partition.
> -- slurmctld - Fix crash when a job is evaluated for a reservation after
> removal of a dynamic node.
> -- gpu/nvml - Fall back to loading libnvidia-ml.so.1 if loading
> libnvidia-ml.so fails.
> -- slurmrestd - Fix non-required object fields being populated as '{}'
> instead of 'null' in JSON/YAML, which caused compiled OpenAPI clients to
> reject the response to 'GET /slurm/v0.0.40/jobs' due to a validation
> failure of '.jobs[].job_resources'.
> -- Fix sstat/sattach protocol errors for steps on newer-version slurmds
> (regressions since 20.11.0rc1 and 16.05.1rc1 respectively).
> -- slurmd - Avoid a crash when starting slurmd version 24.05 with
> SlurmdSpoolDir files that have been upgraded to a newer major version of
> Slurm. Log warnings instead.
> -- Fix race condition in stepmgr step completion handling.
> -- Fix slurmctld segfault with stepmgr and MpiParams when running a job array.
> -- Fix requeued jobs keeping their priority until the decay thread runs.
> -- slurmctld - Fix crash and possible split-brain issue if the
> backup controller handles an scontrol reconfigure while in control
> before the primary resumes operation.
> -- Fix stepmgr not getting dynamic node addresses from the controller.
> -- stepmgr - Avoid "Unexpected missing socket" errors.
> -- Fix `scontrol show steps` with dynamic stepmgr.
> -- Support IPv6 in configless mode.