[slurm-users] Slurm version 22.05.3 is now available
Tim Wickberg
tim at schedmd.com
Thu Aug 11 20:59:01 UTC 2022
We are pleased to announce the availability of Slurm version 22.05.3.
This release includes a number of low to moderate severity fixes made
since the last maintenance release in June.
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
- Tim
--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
> * Changes in Slurm 22.05.3
> ==========================
> -- job_container/tmpfs - clean up containers even when the .ns file isn't
> mounted anymore.
> -- Ignore the bf_licenses option if using sched/builtin.
> -- Do not clear the job's requested QOS (qos_id) when ineligible due to QOS.
> -- Emit error and add fail-safe when job's qos_id changes unexpectedly.
> -- Fix timeout value in log.
> -- openapi/v0.0.38 - fix setting of DefaultTime when dumping a partition.
> -- openapi/dbv0.0.38 - correct parsing of association QOS field.
> -- Fix LaunchParameters=mpir_use_nodeaddr.
> -- Fix various edge cases where accrue limits could be exceeded or cause
> underflow error messages.
> -- Fix issue where a job requesting --ntasks and --nodes could be wrongly
> rejected when spanning heterogeneous nodes.
> -- openapi/v0.0.38 - detect when partition PreemptMode is disabled.
> -- openapi/v0.0.38 - add QOS flag to handle partition PreemptMode=within.
> -- Add total_cpus and total_nodes values to the partition list in
> the job_submit/lua plugin.
> -- openapi/dbv0.0.38 - reject and error on invalid flag values in well defined
> flag fields.
> -- openapi/dbv0.0.38 - correct QOS preempt_mode flag requests being silently
> ignored.
> -- accounting_storage/mysql - allow QOS preempt_mode flag updates when GANG
> mode is requested.
> -- openapi/dbv0.0.38 - correct QOS flag modification requests being silently
> ignored.
> -- sacct/sinfo/squeue - use openapi/[db]v0.0.38 for --json and --yaml modes.
> -- Improve error messages when using configless and fetching the config fails.
> -- Fix segfault when reboot_from_controller is configured and scontrol reboot
> is used.
> -- Fix regression which prevented a cons_tres gpu job from being submitted to
> a cons_tres cluster from a non-cons_tres cluster.
> -- openapi/dbv0.0.38 - correct association QOS list parsing for updates.
> -- Fix rollup incorrectly divvying up unused reservation time between
> associations.
> -- slurmrestd - add SLURMRESTD_SECURITY=disable_unshare_files environment
> variable.
> -- Update rsmi detection to handle new default library location.
> -- Fix header inclusion from slurmstepd manager code leading to multiple
> definition errors when linking --without-shared-libslurm.
> -- slurm.spec - explicitly disable Link Time Optimization (LTO) to avoid
> linking errors on systems where LTO-related RPM macros are enabled by
> default and the binutils version has a bug.
> -- Fix issue in the api/step_io message writing logic leading to incorrect
> behavior in API consuming clients like srun or sattach, including a segfault
> when freeing IO buffers holding traffic from the tasks to the client.
> -- openapi/dbv0.0.38 - avoid job queries getting rejected when cluster is not
> provided by client.
> -- openapi/dbv0.0.38 - accept job state filter as verbose names instead of
> only numeric state ids.
> -- Fix regression in 22.05.0rc1: if slurmd shuts down while a prolog is
> running, the job is cancelled and the node is drained.
> -- Wait up to PrologEpilogTimeout before shutting down slurmd to allow prolog
> and epilog scripts to complete or time out. Previously, slurmd waited 120
> seconds before timing out and killing prolog and epilog scripts.
> -- GPU - Fix checking frequencies to check them all and not skip the last one.
> -- GPU - Fix logic to set frequencies properly when handling multiple GPUs.
> -- cgroup/v2 - Fix typo in error message.
> -- cgroup/v2 - More robust pattern search for events.
> -- Fix slurm_spank_job_[prolog|epilog] failures being masked if a Prolog or
> Epilog script is defined (regression in 22.05.0rc1).
> -- When a job requested nodes and can't immediately start, only report to
> the user (squeue/scontrol et al) if nodes are down in the requested list.
> -- openapi/dbv0.0.38 - Fix qos list/preempt not being parsed correctly.
> -- Fix dynamic node registrations mapping previously assigned nodes.
> -- Remove unnecessary limit on count of 'shared' gres.
> -- Fix shared gres on CLOUD nodes not properly initializing.
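
As a rough illustration of the openapi/dbv0.0.38 entry above about accepting
job state filters as verbose names as well as numeric state ids, the sketch
below shows one way a client-side helper might normalize either form. It is
not the slurmrestd implementation. The names JOB_STATE_IDS and
normalize_job_state are hypothetical, and the numeric values are assumed to
match the base job states in slurm.h, so check them against your installed
headers before relying on them.

```python
# Illustrative sketch only; not taken from the Slurm source.
# Assumption: ids follow the base values of `enum job_states` in slurm.h.
JOB_STATE_IDS = {
    "PENDING": 0,
    "RUNNING": 1,
    "SUSPENDED": 2,
    "COMPLETED": 3,
    "CANCELLED": 4,
    "FAILED": 5,
    "TIMEOUT": 6,
    "NODE_FAIL": 7,
    "PREEMPTED": 8,
    "BOOT_FAIL": 9,
    "DEADLINE": 10,
    "OUT_OF_MEMORY": 11,
}


def normalize_job_state(value):
    """Return a numeric state id for a verbose name or a numeric id.

    Hypothetical helper: accepts "RUNNING", "running", "1", or 1.
    Raises ValueError for anything it does not recognize.
    """
    text = str(value).strip()
    if text.isdigit():
        state_id = int(text)
        if state_id not in JOB_STATE_IDS.values():
            raise ValueError(f"unknown job state id: {state_id}")
        return state_id
    try:
        return JOB_STATE_IDS[text.upper()]
    except KeyError:
        raise ValueError(f"unknown job state: {value!r}") from None
```

Rejecting unknown values instead of silently dropping them mirrors the intent
of the changelog entries above about erroring on invalid flag values.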