[slurm-users] Slurm version 21.08.3 is now available

Tim Wickberg tim at schedmd.com
Tue Nov 2 20:50:29 UTC 2021

We are pleased to announce the availability of Slurm version 21.08.3.

This release includes a number of fixes made since the last release a month
ago. Among them is one critical fix that prevents a communication issue
between slurmctld and slurmdbd at sites that have started using the new
AccountingStoreFlags=job_script functionality.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 21.08.3
> ==========================
>  -- Return error to sacctmgr when running 'sacctmgr archive load' and the load
>     fails due to an invalid or corrupted file.
>  -- slurmctld/gres_ctld - fix deallocation of typed GRES without device.
>  -- scrontab - fix capturing the cronspec request in the job script.
>  -- openapi/dbv0.0.37 - Add missing method POST for /associations/.
>  -- If ALTER TABLE was already run, continue with database upgrade.
>  -- slurmstepd - Gracefully handle RunTimeQuery returning no output.
>  -- srun - automatically handle issues with races to listen() on an ephemeral
>     socket, and suppress otherwise needless error messages.
>  -- Schedule sooner after Epilog completion with SchedulerParameters=defer.
>  -- Improve performance for AccountingStoreFlags=job_env.
>  -- Expose missing SLURMD_NODENAME and SLURM_NODEID to TaskEpilog environment.
>  -- Bring slurm_completion.sh up to date with changes to commands.
>  -- Fix issue where burst buffer stage-in could only start for one job in a job
>     array per scheduling cycle instead of bb_array_stage_cnt jobs per scheduling
>     cycle.
>  -- Fix checking if the dependency is the same job for array jobs.
>  -- Fix checking for circular dependencies with job arrays.
>  -- Restore dependent job pointers on slurmctld startup to avoid race.
>  -- openapi/v0.0.37 - Allow strings for JobIds instead of only numerical JobIds
>     for GET, DELETE, and POST job methods.
>  -- openapi/dbv0.0.36 - Gracefully handle missing associations.
>  -- openapi/dbv0.0.36 - Avoid restricting job association lookups to only
>     default associations.
>  -- openapi/dbv0.0.37 - Gracefully handle missing associations.
>  -- openapi/dbv0.0.37 - Avoid restricting job association lookups to only
>     default associations.
>  -- Fix error in GPU frequency validation logic.
>  -- Fix regression in 21.08.1 that broke federated jobs.
>  -- Correctly handle requested GRES when used in job arrays.
>  -- Fix error in pmix logic dealing with an incorrect buffer size.
>  -- Fix handling of no_consume GRES, and add it to the job's allocated TRES.
>  -- Fix issue with typed GRES without Files= (bitmap).
>  -- Fix job_submit/lua support for 'gres', which is now stored as a 'tres'
>     when requesting jobs and therefore needs a 'gres' prefix.
>  -- Fix regression where MPS would not deallocate from the node properly.
>  -- Fix --gpu-bind=verbose to work correctly.
>  -- Do not deny --constraint with special operators "[]()|*" when no changeable
>     features are requested, but continue to deny --constraint with special
>     operators when changeable features are requested.
>  -- openapi/v0.0.{35,36,37} - prevent merging the slurmrestd environment
>     alongside a new job submission.
>  -- openapi/dbv0.0.36 - Correct tree position of dbv0.0.36_job_step.
>  -- openapi/dbv0.0.37 - Correct tree position of dbv0.0.37_job_step.
>  -- openapi/v0.0.37 - enable job priority field for job submissions and updates.
>  -- openapi/v0.0.37 - node state queries now include the MIXED state instead
>     of only ALLOCATED.
>  -- mpi/pmix - avoid job hanging until the time limit on PMIx agent failures.
>  -- Correct inverted logic where reduced version matching was applied to
>     non-SPANK plugins when it should only have applied to SPANK plugins.
>  -- Fix issues where prologs would run in serial without PrologFlags=serial.
>  -- Make sure an incoming job is initially considered for magnetic reservations.
>  -- PMIx v1.1.4 and below are no longer supported.
>  -- Add comment to service files about disabling logging through journald.
>  -- Add SLURM_NODE_ALIASES env to RPC Prolog (PrologFlags=alloc) environment.
>  -- Limit max_script_size to 512 MB.
>  -- Fix shutdown of slurmdbd plugin to correctly notice when the agent thread
>     finishes.
>  -- slurmdbd - fix an issue where larger batch script files sent to SlurmDBD
>     with AccountingStoreFlags=job_script could lead to accounting data loss:
>     the resulting RPC can exceed internal limits and will not be sent, which
>     prevents any further communication with SlurmDBD.
>     This issue is indicated by "error: Invalid msg_size" in your log files.
>  -- Fix compile issue with --without-shared-libslurm.
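Sites that enabled AccountingStoreFlags=job_script on an earlier 21.08 release
may want to check whether they hit the slurmdbd issue described above. The
following is a minimal Python sketch, assuming plain-text log files; the only
detail taken from the release notes is the "error: Invalid msg_size" string.
The function name find_invalid_msg_size and the command-line usage are
hypothetical, and which daemon's logs to scan is left to the site.

```python
import sys
from typing import Iterable, List, Tuple

# Marker string quoted in the 21.08.3 NEWS entry for the slurmdbd issue.
MARKER = "error: Invalid msg_size"


def find_invalid_msg_size(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for every line containing MARKER.

    Line numbers are 1-based, matching what `grep -n` would print.
    (Hypothetical helper, not part of Slurm.)
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if MARKER in line:
            hits.append((lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: pass one or more log file paths as arguments.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            for lineno, line in find_invalid_msg_size(f):
                print(f"{path}:{lineno}: {line}")
```

A plain substring match is used rather than a regex because the marker is a
fixed string; an equivalent shell check would be a simple grep for the same text.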
