[slurm-users] Slurm version 21.08.3 is now available
Tim Wickberg
tim at schedmd.com
Tue Nov 2 20:50:29 UTC 2021
We are pleased to announce the availability of Slurm version 21.08.3.
This release includes a number of fixes since the last release a month ago,
among them one critical fix to prevent a communication issue between
slurmctld and slurmdbd for sites that have started using the new
AccountingStoreFlags=job_script functionality.
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
- Tim
--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
> * Changes in Slurm 21.08.3
> ==========================
> -- Return error to sacctmgr when running 'sacctmgr archive load' and the load
> fails due to an invalid or corrupted file.
> -- slurmctld/gres_ctld - fix deallocation of typed GRES without device.
> -- scrontab - fix capturing the cronspec request in the job script.
> -- openapi/dbv0.0.37 - Add missing method POST for /associations/.
> -- If ALTER TABLE was already run, continue with database upgrade.
> -- slurmstepd - Gracefully handle RunTimeQuery returning no output.
> -- srun - automatically handle issues with races to listen() on an ephemeral
> socket, and suppress otherwise needless error messages.
> -- Schedule sooner after Epilog completion with SchedulerParameters=defer.
> -- Improve performance for AccountingStoreFlags=job_env.
> -- Expose missing SLURMD_NODENAME and SLURM_NODEID to TaskEpilog environment.
> -- Bring slurm_completion.sh up to date with changes to commands.
> -- Fix issue where burst buffer stage-in could only start for one job in a job
> array per scheduling cycle instead of bb_array_stage_cnt jobs per scheduling
> cycle.
> -- Fix checking if the dependency is the same job for array jobs.
> -- Fix checking for circular dependencies with job arrays.
> -- Restore dependent job pointers on slurmctld startup to avoid race.
> -- openapi/v0.0.37 - Allow strings for JobIds instead of only numerical JobIds
> for GET, DELETE, and POST job methods.
> -- openapi/dbv0.0.36 - Gracefully handle missing associations.
> -- openapi/dbv0.0.36 - Avoid restricting job association lookups to only
> default associations.
> -- openapi/dbv0.0.37 - Gracefully handle missing associations.
> -- openapi/dbv0.0.37 - Avoid restricting job association lookups to only
> default associations.
> -- Fix error in GPU frequency validation logic.
> -- Fix regression in 21.08.1 that broke federated jobs.
> -- Correctly handle requested GRES when used in job arrays.
> -- Fix error in pmix logic dealing with the incorrect size of buffer.
> -- Fix handling of no_consume GRES, add it to the job's allocated TRES.
> -- Fix issue with typed GRES without Files= (bitmap).
> -- Fix job_submit/lua support for 'gres', which is now stored as a 'tres'
> when requesting jobs and so needs a 'gres' prefix.
> -- Fix regression where MPS would not deallocate from the node properly.
> -- Fix --gpu-bind=verbose to work correctly.
> -- Do not deny --constraint with special operators "[]()|*" when no changeable
> features are requested, but continue to deny --constraint with special
> operators when changeable features are requested.
> -- openapi/v0.0.{35,36,37} - prevent merging the slurmrestd environment
> alongside a new job submission.
> -- openapi/dbv0.0.36 - Correct tree position of dbv0.0.36_job_step.
> -- openapi/dbv0.0.37 - Correct tree position of dbv0.0.37_job_step.
> -- openapi/v0.0.37 - enable job priority field for job submissions and updates.
> -- openapi/v0.0.37 - request node states query includes MIXED state instead of
> only allocated.
> -- mpi/pmix - avoid job hanging until the time limit on PMIx agent failures.
> -- Correct inverted logic where reduced version matching applied to non-SPANK
> plugins when it should have applied only to SPANK plugins.
> -- Fix issues where prologs would run in serial without PrologFlags=serial.
> -- Make sure a job coming in is initially considered for magnetic reservations.
> -- PMIx v1.1.4 and below are no longer supported.
> -- Add comment to service files about disabling logging through journald.
> -- Add SLURM_NODE_ALIASES env to RPC Prolog (PrologFlags=alloc) environment.
> -- Limit max_script_size to 512 MB.
> -- Fix shutdown of slurmdbd plugin to correctly notice when the agent thread
> finishes.
> -- slurmdbd - fix issue with larger batch script files being sent to SlurmDBD
> with AccountingStoreFlags=job_script that can lead to accounting data loss
> as the resulting RPC generated can exceed internal limits and won't be
> sent, preventing further communication with SlurmDBD.
> This issue is indicated by "error: Invalid msg_size" in your log files.
> -- Fix compile issue with --without-shared-libslurm.
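
The changelog notes that the slurmdbd communication issue is indicated by
"error: Invalid msg_size" in the log files. A minimal sketch of how a site
might check a log for that string is shown below. The helper names
(find_invalid_msg_size, scan_log) are hypothetical and not part of Slurm, and
the log file location depends on the site's own configuration; only the
marker string itself is taken from the announcement.

```python
# Sketch only: scan log text for the marker string quoted in the changelog.
# The function names here are illustrative and are not part of Slurm.

MARKER = "error: Invalid msg_size"


def find_invalid_msg_size(lines):
    """Return (line_number, text) pairs for lines containing MARKER."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if MARKER in line:
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan one log file; `path` is whatever log location the site uses."""
    # errors="replace" keeps the scan going if the log holds stray bytes.
    with open(path, errors="replace") as handle:
        return find_invalid_msg_size(handle)
```

Calling scan_log() with the path to a site's slurmctld log returns the
matching lines with their line numbers; an empty list means the marker was
not found in that file.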