[slurm-announce] Slurm version 22.05.4 is now available

Tim Wickberg tim at schedmd.com
Thu Sep 29 20:58:58 UTC 2022


We are pleased to announce the availability of Slurm version 22.05.4.

This release includes fixes for two potential crashes in the backfill scheduler,
alongside a number of other moderate-severity issues.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 22.05.4
> ==========================
>  -- Fix return code from salloc when the job is revoked prior to executing user
>     command.
>  -- Fix minor memory leak when dealing with gres with multiple files.
>  -- Fix printing for no_consume gres in scontrol show job.
>  -- sinfo - Fix truncation of very large values when outputting memory.
>  -- Fix multi-node step launch failure when nodes in the controller aren't in
>     natural order. This can happen with inconsistent node naming (such as
>     node15 and node052) or with dynamic nodes which can register in any order.
>  -- job_container/tmpfs - Prevent reading the plugin config multiple times per
>     step.
>  -- Fix erroneous attempt at gres binding for gres without cores defined.
>  -- Fix build to work with '--without-shared-libslurm' configure flag.
>  -- Fix power_save mode when repeatedly configuring too fast.
>  -- Fix sacct -I option.
>  -- Prevent jobs from being scheduled on future nodes.
>  -- Fix memory leak in slurmd happening on reconfigure when CPUSpecList used.
>  -- Fix sacctmgr show event [min|max]cpus.
>  -- Fix regression in 22.05.0rc1 where a prolog or epilog that redirected stdout
>     to a file could get erroneously killed, resulting in job launch failure
>     (for the prolog) and the node being drained.
>  -- cgroup/v1 - Cache in a static variable whether the system has swap, to
>     avoid potentially redundant checks.
>  -- cgroup/v1 - Add check for swap when running OOM check after task
>     termination.
>  -- job_submit/lua - add --prefer support.
>  -- cgroup/v1 - fix issue where sibling steps could incorrectly be accounted as
>     OOM when step memory limit was the same as the job allocation. Detect OOM
>     events via memory.oom_control oom_kill when exposed by the kernel instead of
>     subscribing notifications with eventfd.
>  -- Fix accounting of oom_kill events in cgroup/v2 and task/cgroup.
>  -- Fix segfault when slurmd reports less than configured gres with links after
>     a slurmctld restart.
>  -- Fix TRES counts after node is deleted using scontrol.
>  -- sched/backfill - properly handle multi-reservation HetJobs.
>  -- sched/backfill - don't try to start HetJobs after system state change.
>  -- openapi/v0.0.38 - add submission of job->prefer value.
>  -- slurmdbd - become SlurmUser at the same point in logic as slurmctld to match
>     plugins initialization behavior. This avoids a fatal error when starting
>     slurmdbd as root and root cannot start the auth or accounting_storage
>     plugins (for example, if root cannot read the jwt key).
>  -- Fix memory leak when attempting to update a job's features with invalid
>     features.
>  -- Fix occasional slurmctld crash or hang in backfill due to invalid pointers.
>  -- Fix segfault on Cray machines if cgroup cpuset is used in cgroup/v1.
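
As a rough illustration of the node-ordering entry above (node15 and node052), the sketch below shows how a plain string sort and a numeric-aware sort can disagree on such names. It is not Slurm code: the announcement does not spell out Slurm's ordering rules, and the helper natural_key and the extra name node7 are hypothetical, chosen only to make the mismatch visible.

```python
import re

def natural_key(name):
    """Hypothetical helper: split a name into text and integer chunks.

    Each chunk is tagged so that integers are only ever compared with
    integers and strings with strings.
    """
    chunks = re.split(r"(\d+)", name)
    return [(1, int(c)) if c.isdigit() else (0, c) for c in chunks]

# node15 and node052 come from the changelog entry; node7 is an assumed extra.
nodes = ["node15", "node052", "node7"]

lexical = sorted(nodes)                   # character-by-character comparison
natural = sorted(nodes, key=natural_key)  # numeric suffix compared as a number

print("lexical:", lexical)
print("natural:", natural)
```

With mixed zero-padding the two orderings differ, which is the kind of inconsistency the entry describes; whether Slurm's internal comparison matches either of these is an assumption not confirmed by the announcement.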


