[slurm-users] Slurm version 22.05.3 is now available

Thu Aug 11 20:59:01 UTC 2022

We are pleased to announce the availability of Slurm version 22.05.3.

This release includes a number of low- to moderate-severity fixes made
since the last maintenance release in June.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 22.05.3
> ==========================
>  -- job_container/tmpfs - clean up containers even when the .ns file isn't
>     mounted anymore.
>  -- Ignore the bf_licenses option if using sched/builtin.
>  -- Do not clear the job's requested QOS (qos_id) when ineligible due to QOS.
>  -- Emit error and add fail-safe when job's qos_id changes unexpectedly.
>  -- Fix timeout value in log.
>  -- openapi/v0.0.38 - fix setting of DefaultTime when dumping a partition.
>  -- openapi/dbv0.0.38 - correct parsing of association QOS field.
>  -- Fix LaunchParameters=mpir_use_nodeaddr.
>  -- Fix various edge cases where accrue limits could be exceeded or cause
>     underflow error messages.
>  -- Fix issue where a job requesting --ntasks and --nodes could be wrongly
>     rejected when spanning heterogeneous nodes.
>  -- openapi/v0.0.38 - detect when partition PreemptMode is disabled.
>  -- openapi/v0.0.38 - add QOS flag to handle partition PreemptMode=within.
>  -- Add total_cpus and total_nodes values to the partition list in
>     the job_submit/lua plugin.
>  -- openapi/dbv0.0.38 - reject and error on invalid flag values in well defined
>     flag fields.
>  -- openapi/dbv0.0.38 - correct QOS preempt_mode flag requests being silently
>     ignored.
>  -- accounting_storage/mysql - allow QOS preempt_mode flag updates when GANG
>     mode is requested.
>  -- openapi/dbv0.0.38 - correct QOS flag modification requests being silently
>     ignored.
>  -- sacct/sinfo/squeue - use openapi/[db]v0.0.38 for --json and --yaml modes.
>  -- Improve error messages when using configless and fetching the config fails.
>  -- Fix segfault when reboot_from_controller is configured and scontrol reboot
>     is used.
>  -- Fix regression which prevented a cons_tres GPU job from being submitted
>     to a cons_tres cluster from a non-cons_tres cluster.
>  -- openapi/dbv0.0.38 - correct association QOS list parsing for updates.
>  -- Fix rollup incorrectly divvying up unused reservation time between
>     associations.
>  -- slurmrestd - add SLURMRESTD_SECURITY=disable_unshare_files environment
>     variable.
>  -- Update rsmi detection to handle new default library location.
>  -- Fix header inclusion from slurmstepd manager code leading to multiple
>     definition errors when linking --without-shared-libslurm.
>  -- slurm.spec - explicitly disable Link Time Optimization (LTO) to avoid
>     linking errors on systems where LTO-related RPM macros are enabled by
>     default and the binutils version has a bug.
>  -- Fix issue in the api/step_io message writing logic leading to incorrect
>     behavior in API consuming clients like srun or sattach, including a segfault
>     when freeing IO buffers holding traffic from the tasks to the client.
>  -- openapi/dbv0.0.38 - avoid job queries getting rejected when cluster is not
>     provided by client.
>  -- openapi/dbv0.0.38 - accept job state filter as verbose names instead of
>     only numeric state ids.
>  -- Fix regression in 22.05.0rc1: if slurmd shuts down while a prolog is
>     running, the job is cancelled and the node is drained.
>  -- Wait up to PrologEpilogTimeout before shutting down slurmd to allow prolog
>     and epilog scripts to complete or time out. Previously, slurmd waited 120
>     seconds before timing out and killing prolog and epilog scripts.
>  -- GPU - Fix checking frequencies to check them all and not skip the last one.
>  -- GPU - Fix logic to set frequencies properly when handling multiple GPUs.
>  -- cgroup/v2 - Fix typo in error message.
>  -- cgroup/v2 - More robust pattern search for events.
>  -- Fix slurm_spank_job_[prolog|epilog] failures being masked if a Prolog or
>     Epilog script is defined (regression in 22.05.0rc1).
>  -- When a job requested nodes and can't immediately start, only report to
>     the user (squeue/scontrol et al) if nodes are down in the requested list.
>  -- openapi/dbv0.0.38 - Fix qos list/preempt not being parsed correctly.
>  -- Fix dynamic nodes registrations mapping previously assigned nodes.
>  -- Remove unnecessary limit on count of 'shared' gres.
>  -- Fix shared gres on CLOUD nodes not properly initializing.
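
For sites that want to confirm a given installation already carries the
fixes listed above, a minimal sketch of a version comparison follows. It
assumes the version text has the form "slurm 22.05.3" (as printed by
commands such as "sinfo --version"); the names FIXED_IN,
parse_slurm_version, and has_fixes are hypothetical helpers for
illustration and are not part of Slurm.

```python
import re

# The maintenance release announced in this message.
FIXED_IN = (22, 5, 3)


def parse_slurm_version(text):
    """Return (major, minor, micro) from text such as 'slurm 22.05.3'.

    Any suffix after the third number (for example 'rc1') is ignored,
    so '22.05.0rc1' parses as (22, 5, 0).
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no Slurm version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def has_fixes(version_text, minimum=FIXED_IN):
    """Return True if the reported version is at least `minimum`."""
    return parse_slurm_version(version_text) >= minimum
```

Comparing integer tuples rather than raw strings avoids ordering
mistakes with zero-padded components such as "05".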