[slurm-announce] Slurm version 21.08.2 is now available

Tue Oct 5 22:56:39 UTC 2021

We are pleased to announce the availability of Slurm version 21.08.2.

There is one significant change included in this maintenance release: the 
removal of support for the long-misunderstood TaskAffinity=yes option in 
cgroup.conf. Please consider using "TaskPlugin=task/cgroup,task/affinity" 
in slurm.conf as an alternative.

Unfortunately, a number of issues were identified where the processor 
affinity settings from this now-unsupported approach would be calculated 
incorrectly, leading to potential performance issues.

SchedMD had previously planned to remove this support in the upcoming 
22.05 release, but a number of issues reported after the cgroup code 
refactoring have led us to remove it now, rather than try to correct 
issues with what has not been a recommended configuration for some time.
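
For sites that want to check whether an existing cgroup.conf still sets 
the removed option before upgrading, a minimal sketch follows. It is not 
part of Slurm: the function name find_task_affinity is hypothetical, and 
the sketch assumes that keys and values are case-insensitive, that 
whitespace around "=" is tolerated, and that "#" starts a comment.

```python
import re

# Matches an uncommented "TaskAffinity=yes" setting. Case-insensitive
# matching and optional whitespace around "=" are assumptions of this
# sketch, not a statement about Slurm's own parser.
_TASK_AFFINITY = re.compile(r"^\s*TaskAffinity\s*=\s*yes\b", re.IGNORECASE)


def find_task_affinity(conf_text):
    """Return 1-based line numbers in cgroup.conf text that set TaskAffinity=yes."""
    hits = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        # Drop anything after a '#' so commented-out settings are ignored.
        setting = line.split("#", 1)[0]
        if _TASK_AFFINITY.match(setting):
            hits.append(lineno)
    return hits


if __name__ == "__main__":
    # Illustrative cgroup.conf contents, not taken from a real site.
    sample = "CgroupAutomount=yes\nConstrainCores=yes\nTaskAffinity=yes\n"
    print(find_task_affinity(sample))  # prints [3]
```

Any line numbers reported point at settings to drop from cgroup.conf; 
the script only locates them and does not modify either configuration 
file.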

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 21.08.2
> ==========================
>  -- slurmctld - fix how the max number of cores on a node in a partition are
>     calculated when the partition contains multi-socket nodes. This in turn
>     corrects certain jobs node count estimations displayed client-side.
>  -- job_submit/cray_aries - fix "craynetwork" GRES specification after changes
>     introduced in 21.08.0rc1 that made TRES always have a type prefix.
>  -- Ignore nonsensical check in the slurmd for [Pro|Epi]logSlurmctld.
>  -- Fix writing to stderr/syslog when systemd runs slurmctld in the foreground.
>  -- Fix locking around log level setting routines.
>  -- Fix issue with updating job started with node range.
>  -- Fix issue with nodes not clearing state in the database when the slurmctld
>     is started with clean-start.
>  -- Fix hetjob components > 1 timing out due to InactiveLimit.
>  -- Fix sprio printing -nan for normalized association priority if
>     PriorityWeightAssoc was not defined.
>  -- Disallow FirstJobId=0.
>  -- Preserve job start info in the database for a requeued job that hadn't
>     registered the first time in the database yet.
>  -- Only send one message on prolog failure from the slurmd.
>  -- Remove support for TaskAffinity=yes in cgroup.conf.
>  -- accounting_storage/mysql - fix issue where querying jobs via sacct
>     --whole-hetjob=yes or slurmrestd (which automatically includes this flag)
>     could in some cases return more records than expected.
>  -- Fix issue for preemption of job array task that makes afterok dependency
>     fail. Additionally, send emails when requeueing happens due to preemption.
>  -- Fix sending requeue mail type.
>  -- Properly resize a job's GRES bitmaps and counts when resizing the job.
>  -- Fix node being able to transition to CLOUD state from non-cloud state.
>  -- Fix regression introduced in 21.08.0rc1 which broke a step's ability to
>     inherit GRES from the job when the step didn't request GRES but the job did.
>  -- Fix errors in logic when picking nodes based on bracketed anded constraints.
>     This also enforces the requirement to have a count when using such
>     constraints.
>  -- Handle job resize better in the database.
>  -- Exclude currently running, resized jobs from the runaway jobs list.
>  -- Make it possible to shrink a job more than once.