[slurm-users] Slurm version 19.05.3 is now available
Tim Wickberg
tim at schedmd.com
Thu Oct 3 19:43:47 UTC 2019
Slurm version 19.05.3 is now available, and includes a series of fixes
since 19.05.2 was released nearly two months ago.
Downloads are available at https://www.schedmd.com/downloads.php .
Release notes follow below.
- Tim
--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
> * Changes in Slurm 19.05.3
> ==========================
> -- Fix missing check from conversion of cray -> cray_aries.
> -- Improve job state reason string when required nodes are not available by
> not including those that don't belong to the job partition.
> -- Set a more appropriate ESLURM_RESERVATION_MAINT job state reason for jobs
> requesting feature(s) when the required nodes are in a maintenance reservation.
> -- Fix logic to better handle maintenance reservations.
> -- Add spank options to cache in remote callback.
> -- Enforce the use of spank_option_getopt().
> -- Fix select plugins' will-run test under-allocating node usage for
> completing jobs.
> -- Treat nodes in COMPLETING state as currently available for the job
> will-run test.
> -- Cray - fix contribs slurm.conf.j2 with updated cray_aries plugin names.
> -- job_submit/lua - fix problem where nil was expected for min_mem_per_cpu.
> -- Fix extra, unaccounted TRESRunMins usage created by heterogeneous jobs when
> running with the priority/multifactor plugin.
> -- Detach threads once they are done to avoid having to join them
> in track scripts code.
> -- Handle situation where a slurmctld tries to communicate with slurmdbd more
> than once at the same time.
> -- Fix XOR/XAND features like cpu&fastio&[knl|westmere] to be resolved
> correctly.
> -- Don't update [min|max]_exit_code on job array task requeue.
> -- Don't assume the first node of a job is the batch host when testing if the
> job's allocated nodes are booted/ready.
> -- Make --batch=<feature> requests wait for all nodes to be booted so that
> the batch host can be chosen after the nodes have booted, possibly with
> different features.
> -- Fix talking to batch host on its protocol version when using --batch.
> -- gres/mic plugin - add missing fini() function to clean up plugin state.
> -- Move _validate_node_choice() before prolog/epilog check.
> -- Look forward one week when creating a new reservation.
> -- Set missing resv_desc.flags before calling _select_nodes().
> -- Use correct start_time for TIME_FLOAT reservation in _job_overlap().
> -- Properly enforce a job's mem-per-cpu option when allocating the node
> exclusively to that job.
> -- sched/backfill - clear estimated sched_nodes as done for start_time.
> -- Have safe_[read|write] handle EAGAIN and EINTR.
> -- Fix checking for flag with logical AND.
> -- Correct "extern" definition of variable if compiling with __APPLE__.
> -- Deprecate FastSchedule. FastSchedule will be removed in 20.02.
> The FastSchedule=2 functionality (used for testing and development) has
> been retained as the new SlurmdParameters=config_overrides option.
> -- Fix preemption issue when picking nodes for a feature job request.
> -- Fix race condition preventing held array job from getting a db_index.
> -- Fix select/cons_tres gres code infinite loop leaving slurmctld unresponsive.
> -- Remove redefinition of global variable in gres.c.
> -- Fix issue where GPU devices are denied access when MPS is enabled.
> -- Fix uninitialized errors when compiling with CFLAGS="--coverage".
> -- Fix scancel --full for proctrack/cgroups.
> -- Fix sdiag backfill last and mean queue length stats.
> -- Do not remove batch host when resizing/shrinking a batch job.
> -- nss_slurm - fix file descriptor leaks.
> -- Fix preemption for jobs using complex feature requests
> (e.g. -C "[rack1*2&rack2*4]").
> -- Fix memory leaks in preemption when jobs request multiple features.
> -- Allow Operator users to show/fix runaways.
> -- Disallow coordinators from showing/fixing runaways.
> -- mpi/pmi2 - increase array len to avoid buffer size exceeded error.
> -- Preserve rebooting node's nextstate when updating state with scontrol.
> -- Fully merge slurm.conf and gres.conf before node_config_load().
> -- Remove FastSchedule dependence from gres.conf's AutoDetect=nvml.
> -- Forbid mix of typed and untyped GRES of same name in slurm.conf.
> -- cons_tres: Prevent creating a job without CPUs.
> -- Prevent underflow when filtering cores with gres.
> -- proctrack/cray_aries: use current pid instead of thread if we're in a fork.
> -- Fix missing check for prolog launch credential creation failure that can
> lead to segfaults.
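
For readers wondering what the "Have safe_[read|write] handle EAGAIN and EINTR" entry above means in practice, here is a minimal sketch of the general technique in C. It is not Slurm's safe_read/safe_write code; the function name read_fully and its return convention are assumptions made for illustration. The idea is that a read interrupted by a signal (EINTR) is simply retried, and a read on a non-blocking descriptor with no data yet (EAGAIN) waits until the descriptor is readable instead of being treated as a failure.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Slurm): read up to len bytes from fd.
 * Returns the number of bytes read (fewer than len only at EOF),
 * or -1 on an unrecoverable error with errno set.
 */
ssize_t read_fully(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;

	assert(buf != NULL);

	while (done < len) {
		ssize_t n = read(fd, p + done, len - done);

		if (n > 0) {
			done += (size_t) n;	/* partial read: keep going */
		} else if (n == 0) {
			break;			/* EOF */
		} else if (errno == EINTR) {
			continue;		/* interrupted by a signal: retry */
		} else if (errno == EAGAIN || errno == EWOULDBLOCK) {
			/* Non-blocking fd with no data yet: wait until readable. */
			struct pollfd pfd = { .fd = fd, .events = POLLIN };

			if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
				return -1;
		} else {
			return -1;		/* real error */
		}
	}
	return (ssize_t) done;
}
```

A write-side counterpart would follow the same pattern with write() and POLLOUT. Whether to block in poll() or return to the caller on EAGAIN is a design choice; this sketch blocks for simplicity.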