[slurm-announce] Slurm version 22.05.4 is now available
Tim Wickberg
tim at schedmd.com
Thu Sep 29 20:58:58 UTC 2022
We are pleased to announce the availability of Slurm version 22.05.4.
This includes fixes to two potential crashes in the backfill scheduler,
alongside a number of other moderate severity issues.
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
- Tim
--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
> * Changes in Slurm 22.05.4
> ==========================
> -- Fix return code from salloc when the job is revoked prior to executing user
> command.
> -- Fix minor memory leak when dealing with gres with multiple files.
> -- Fix printing for no_consume gres in scontrol show job.
> -- sinfo - Fix truncation of very large values when outputting memory.
> -- Fix multi-node step launch failure when nodes in the controller aren't in
> natural order. This can happen with inconsistent node naming (such as
> node15 and node052) or with dynamic nodes which can register in any order.
> -- job_container/tmpfs - Prevent reading the plugin config multiple times per
> step.
> -- Fix erroneous attempt at gres binding for gres without cores defined.
> -- Fix build to work with '--without-shared-libslurm' configure flag.
> -- Fix power_save mode when repeatedly configuring too fast.
> -- Fix sacct -I option.
> -- Prevent jobs from being scheduled on future nodes.
> -- Fix memory leak in slurmd happening on reconfigure when CPUSpecList used.
> -- Fix sacctmgr show event [min|max]cpus.
> -- Fix regression in 22.05.0rc1 where a prolog or epilog that redirected stdout
> to a file could get erroneously killed, resulting in job launch failure
> (for the prolog) and the node being drained.
> -- cgroup/v1 - Cache in a static variable whether the system has swap, to
> remove potentially redundant checks.
> -- cgroup/v1 - Add check for swap when running OOM check after task
> termination.
> -- job_submit/lua - Add --prefer support.
> -- cgroup/v1 - fix issue where sibling steps could incorrectly be accounted as
> OOM when step memory limit was the same as the job allocation. Detect OOM
> events via memory.oom_control oom_kill when exposed by the kernel instead of
> subscribing to notifications with eventfd.
> -- Fix accounting of oom_kill events in cgroup/v2 and task/cgroup.
> -- Fix segfault when slurmd reports less than configured gres with links after
> a slurmctld restart.
> -- Fix TRES counts after node is deleted using scontrol.
> -- sched/backfill - properly handle multi-reservation HetJobs.
> -- sched/backfill - don't try to start HetJobs after system state change.
> -- openapi/v0.0.38 - add submission of job->prefer value.
> -- slurmdbd - become SlurmUser at the same point in logic as slurmctld to match
> plugins initialization behavior. This avoids a fatal error when starting
> slurmdbd as root and root cannot start the auth or accounting_storage
> plugins (for example, if root cannot read the jwt key).
> -- Fix memory leak when attempting to update a job's features with invalid
> features.
> -- Fix occasional slurmctld crash or hang in backfill due to invalid pointers.
> -- Fix segfault on Cray machines if cgroup cpuset is used in cgroup/v1.
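
The cgroup/v1 entry above mentions detecting OOM events through the oom_kill
counter in memory.oom_control rather than through eventfd notifications. As a
rough illustration only, and not Slurm's actual implementation (which is
written in C), the sketch below shows how such a counter could be read. The
function names and the sample file contents are hypothetical; the only
assumption taken from the changelog is that the kernel may or may not expose
an oom_kill field in that file.

```python
from typing import Dict, Optional


def parse_oom_control(text: str) -> Dict[str, int]:
    """Parse 'key value' lines as found in a cgroup v1 memory.oom_control file."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name <integer>" lines; ignore anything else.
        if len(parts) == 2 and parts[1].isdigit():
            fields[parts[0]] = int(parts[1])
    return fields


def oom_kill_count(text: str) -> Optional[int]:
    """Return the oom_kill counter, or None when the kernel does not expose it."""
    return parse_oom_control(text).get("oom_kill")


# Hypothetical file contents, for illustration only.
WITH_COUNTER = "oom_kill_disable 0\nunder_oom 0\noom_kill 2\n"
WITHOUT_COUNTER = "oom_kill_disable 0\nunder_oom 0\n"

print(oom_kill_count(WITH_COUNTER))
print(oom_kill_count(WITHOUT_COUNTER))
```

A caller could treat a None result as the signal to fall back to another
detection method, which mirrors the "when exposed by the kernel" wording in
the changelog entry.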