[slurm-announce] Slurm version 18.08.4 is now available
Tim Wickberg
tim at schedmd.com
Tue Dec 11 16:00:25 MST 2018
We are pleased to announce the availability of Slurm version 18.08.4.
This includes over 70 fixes since 18.08.3 was released in October.
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
- Tim
--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
> * Changes in Slurm 18.08.4
> ==========================
> -- burst_buffer/cray - avoid launching a job that would be immediately
> cancelled due to a DataWarp failure.
> -- Fix message sent to user to display preempted instead of time limit when
> a job is preempted.
> -- Fix memory leak when a failure happens processing a node's gres config.
> -- Improve error message when failures happen processing a node's gres config.
> -- When building rpms ignore redundant standard rpaths and insecure relative
> rpaths, for RHEL based distros which use "check-rpaths" tool.
> -- Don't skip jobs in scontrol hold.
> -- Avoid locking the job_list when unneeded.
> -- Allow --cpu-bind=verbose to be used with SLURM_HINT environment variable.
> -- Make it so fixing runaway jobs will not alter the same job requeued
> when not runaway.
> -- Avoid checking state when searching for runaway jobs.
> -- Remove redundant check for end time of job when searching for runaway jobs.
> -- Make sure that we properly check for runaway jobs where another job might
> have the same id (for example, if a job was requeued) by also checking the
> submit time.
> -- Add scontrol update job ResetAccrueTime to clear a job's time
> previously accrued for priority.
> -- cons_res: Delay exiting cr_job_test until after cores/cpus are calculated
> and distributed.
> -- Fix bug where binary in cwd would trump binary in PATH with test_exec.
> -- Fix check to test printf("%s\n", NULL); to not require
> -Wno-format-truncation CFLAG.
> -- Fix JobAcctGatherParams=UsePss to report the correct usage.
> -- Fix minor memory leak in pmix plugin.
> -- Fix minor memory leak in slurmctld when reading configuration.
> -- Handle return codes correctly from pthread_* functions.
> -- Fix minor memory leak when a slurmd is unable to contact a slurmctld
> when trying to register.
> -- Fix sreport sizesbyaccount report when using Flatview and accounts.
> -- Fix incorrect shift when dealing with node weights and scheduling.
> -- libslurm/perl - Fix segfault caused by incorrect hv_to_slurm_ctl_conf.
> -- Add qos and assoc options to confirmation dialogs.
> -- Handle updating identical license or partition information correctly.
> -- Make sure accounts and QOS' are all lower case to match documentation
> when read in from the slurm.conf file.
> -- Don't consider partitions without enough nodes in reservation,
> main scheduler.
> -- Set SLURM_NTASKS correctly if having to determine from other options.
> -- Removed GCP scripts from contribs. Now located at:
> https://github.com/SchedMD/slurm-gcp.
> -- Don't check existence of srun --prolog or --epilog executables when set to
> "none" and SLURM_TEST_EXEC is used.
> -- Add "P" suffix support to job and step tres specifications.
> -- When doing a reconfigure handle QOS' GrpJobsAccrue correctly.
> -- Remove unneeded extra parentheses from sh5util.
> -- Fix jobacct_gather/cgroup to work correctly when more than one task is
> started on a node.
> -- If requesting --ntasks-per-node with no tasks, set tasks correctly.
> -- Accept modifiers for TRES originally added in 6f0342e0358.
> -- Don't remove reservation on slurmctld restart if nodes are removed from
> configuration.
> -- Fix bad xfree in task/cgroup.
> -- Fix removing counters if a job array isn't subject to limits and is
> canceled while pending.
> -- Make sure SLURM_NTASKS_PER_NODE is set correctly when env is overwritten
> by the command line.
> -- Clean up step on a failed node correctly.
> -- mpi/pmix: Fixed the logging of collective state.
> -- mpi/pmix: Make multi-slurmd work correctly when using ring communication.
> -- mpi/pmix: Fix double invocation of the PMIx lib fence callback.
> -- mpi/pmix: Remove unneeded libpmix callback drop in tree-based coll.
> -- Fix race condition in route/topology when the slurmctld is reconfigured.
> -- In route/topology validate the slurmctld doesn't try to initialize the
> node system.
> -- Fix issue when requesting invalid gres.
> -- Validate job_ptr in backfill before restoring preempt state.
> -- Fix issue when job's environment is minimal and only contains variables
> Slurm is going to replace internally.
> -- When handling runaway jobs remove all usage before rollup to remove any
> time that wasn't existent instead of just updating lines that have time
> with a lesser time.
> -- salloc - set SLURM_NTASKS_PER_CORE and SLURM_NTASKS_PER_SOCKET in the
> environment if the corresponding command line options are used.
> -- slurmd - fix handling of the -f flag to specify alternate config file
> locations.
> -- Fix scheduling logic to avoid using nodes that require a reboot for KNL
> node change when possible.
> -- Fix scheduling logic bug. There should have been a test for _not_
> NODE_SET_REBOOT to continue.
> -- Fix a scheduling logic bug with respect to XOR operation support when there
> are down nodes.
> -- If there is a constraint construct of the form "[...&...]"
> then an error is generated if more than one of those specifications
> contains KNL NUMA or MCDRAM modes.
> -- Fix stepd segfault race if slurmctld hasn't registered with the launching
> slurmd yet delivering its TRES list.
> -- Add SchedulerParameters option of bf_ignore_newly_avail_nodes to avoid
> scheduling lower priority jobs on resources that become available during
> the backfill scheduling cycle when bf_continue is enabled.
> -- Decrement message_connections in stepd code on error path correctly.
> -- Decrease an error message to be debug.
> -- Fix missing suffixes in squeue.
> -- pam_slurm_adopt - send an error message to the user if no Slurm jobs
> can be located on the node.
> -- Run SlurmctldPrimaryOffProg when the primary slurmctld process shuts down.
> -- job_submit/lua: Add several slurmctld return codes.
> -- job_submit/lua: Add user/group info to jobs.
> -- Fix formatting issues when printing uint64_t.
> -- Bump RLIMIT_NOFILE for daemons in systemd services.
> -- Expand %x in job name in 'scontrol show job'.
> -- salloc/sbatch/srun - print warning if mutually exclusive options of --mem
> and --mem-per-cpu are both set.