[slurm-users] Slurm version 18.08.4 is now available

Tim Wickberg tim at schedmd.com
Tue Dec 11 16:00:25 MST 2018


We are pleased to announce the availability of Slurm version 18.08.4.

This includes over 70 fixes since 18.08.3 was released in October.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 18.08.4
> ==========================
>  -- burst_buffer/cray - avoid launching a job that would be immediately
>     cancelled due to a DataWarp failure.
>  -- Fix message sent to user to display preempted instead of time limit when
>     a job is preempted.
>  -- Fix memory leak when a failure happens processing a node's gres config.
>  -- Improve error message when failures happen processing a node's gres
>     config.
>  -- When building RPMs, ignore redundant standard rpaths and insecure relative
>     rpaths on RHEL-based distros that use the "check-rpaths" tool.
>  -- Don't skip jobs in scontrol hold.
>  -- Avoid locking the job_list when unneeded.
>  -- Allow --cpu-bind=verbose to be used with SLURM_HINT environment variable.
>  -- Make sure that fixing runaway jobs does not alter a requeued instance of
>     the same job that is not itself a runaway.
>  -- Avoid checking state when searching for runaway jobs.
>  -- Remove redundant check for end time of job when searching for runaway jobs.
>  -- Make sure that we properly check for runaway jobs where another job might
>     have the same id (for example, if a job was requeued) by also checking the
>     submit time.
>  -- Add scontrol update job ResetAccrueTime to clear a job's time
>     previously accrued for priority.
>  -- cons_res: Delay exiting cr_job_test until after cores/cpus are calculated
>     and distributed.
>  -- Fix bug where a binary in the cwd would trump a binary in PATH with
>     test_exec.
>  -- Fix the check that tests printf("%s\n", NULL) so that it does not require
>     the -Wno-format-truncation CFLAG.
>  -- Fix JobAcctGatherParams=UsePss to report the correct usage.
>  -- Fix minor memory leak in pmix plugin.
>  -- Fix minor memory leak in slurmctld when reading configuration.
>  -- Handle return codes correctly from pthread_* functions.
>  -- Fix minor memory leak when a slurmd is unable to contact a slurmctld
>     when trying to register.
>  -- Fix sreport sizesbyaccount report when using Flatview and accounts.
>  -- Fix incorrect shift when dealing with node weights and scheduling.
>  -- libslurm/perl - Fix segfault caused by incorrect hv_to_slurm_ctl_conf.
>  -- Add qos and assoc options to confirmation dialogs.
>  -- Handle updating identical license or partition information correctly.
>  -- Make sure accounts and QOSes read from the slurm.conf file are all lower
>     case, to match the documentation.
>  -- Main scheduler: don't consider partitions without enough nodes in the
>     reservation.
>  -- Set SLURM_NTASKS correctly when it has to be determined from other options.
>  -- Removed GCP scripts from contribs. Now located at:
>     https://github.com/SchedMD/slurm-gcp.
>  -- Don't check existence of srun --prolog or --epilog executables when set to
>     "none" and SLURM_TEST_EXEC is used.
>  -- Add "P" suffix support to job and step tres specifications.
>  -- When doing a reconfigure, handle a QOS's GrpJobsAccrue limit correctly.
>  -- Remove unneeded extra parentheses from sh5util.
>  -- Fix jobacct_gather/cgroup to work correctly when more than one task is
>     started on a node.
>  -- If --ntasks-per-node is requested without a task count, set the number of
>     tasks correctly.
>  -- Accept modifiers for TRES originally added in 6f0342e0358.
>  -- Don't remove reservation on slurmctld restart if nodes are removed from
>     configuration.
>  -- Fix bad xfree in task/cgroup.
>  -- Fix removing counters if a job array isn't subject to limits and is
>     canceled while pending.
>  -- Make sure SLURM_NTASKS_PER_NODE is set correctly when env is overwritten
>     by the command line.
>  -- Clean up step on a failed node correctly.
>  -- mpi/pmix: Fix the logging of collective state.
>  -- mpi/pmix: Make multi-slurmd work correctly when using ring communication.
>  -- mpi/pmix: Fix double invocation of the PMIx lib fence callback.
>  -- mpi/pmix: Remove unneeded libpmix callback drop in tree-based coll.
>  -- Fix race condition in route/topology when the slurmctld is reconfigured.
>  -- In route/topology, validate that the slurmctld doesn't try to initialize
>     the node system.
>  -- Fix issue when requesting invalid gres.
>  -- Validate job_ptr in backfill before restoring preempt state.
>  -- Fix issue when job's environment is minimal and only contains variables
>     Slurm is going to replace internally.
>  -- When handling runaway jobs, remove all usage before rollup so that any
>     time that should not exist is removed, instead of just updating lines
>     that have time with a lesser time.
>  -- salloc - set SLURM_NTASKS_PER_CORE and SLURM_NTASKS_PER_SOCKET in the
>     environment if the corresponding command line options are used.
>  -- slurmd - fix handling of the -f flag to specify alternate config file
>     locations.
>  -- Fix scheduling logic to avoid using nodes that require a reboot for KNL
>     node change when possible.
>  -- Fix scheduling logic bug. There should have been a test for _not_
>     NODE_SET_REBOOT to continue.
>  -- Fix a scheduling logic bug with respect to XOR operation support when
>     there are down nodes.
>  -- If there is a constraint construct of the form "[...&...]"
>     then an error is generated if more than one of those specifications
>     contains KNL NUMA or MCDRAM modes.
>  -- Fix a stepd segfault race if the slurmctld hasn't yet registered with the
>     launching slurmd and delivered its TRES list.
>  -- Add SchedulerParameters option of bf_ignore_newly_avail_nodes to avoid
>     scheduling lower priority jobs on resources that become available during
>     the backfill scheduling cycle when bf_continue is enabled.
>  -- Decrement message_connections in stepd code on error path correctly.
>  -- Downgrade an error message to debug level.
>  -- Fix missing suffixes in squeue.
>  -- pam_slurm_adopt - send an error message to the user if no Slurm jobs
>     can be located on the node.
>  -- Run SlurmctldPrimaryOffProg when the primary slurmctld process shuts down.
>  -- job_submit/lua: Add several slurmctld return codes.
>  -- job_submit/lua: Add user/group info to jobs.
>  -- Fix formatting issues when printing uint64_t.
>  -- Bump RLIMIT_NOFILE for daemons in systemd services.
>  -- Expand %x in job name in 'scontrol show job'.
>  -- salloc/sbatch/srun - print warning if mutually exclusive options of --mem
>     and --mem-per-cpu are both set.
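
The "P" suffix entry in the changelog above extends the unit suffixes accepted in job and step TRES specifications. The following is a minimal Python sketch of how such a count could be parsed. It is not Slurm's parser: `parse_tres_count` is a hypothetical helper, and it assumes binary multipliers (K = 1024, M = 1024^2, and so on up to P = 1024^5), modelled on the K/M/G/T/P multipliers Slurm describes for gres counts.

```python
# Hypothetical helper, NOT Slurm's own parser.
# Assumption: each suffix step multiplies by 1024 (binary units).
SUFFIXES = {"K": 1, "M": 2, "G": 3, "T": 4, "P": 5}


def parse_tres_count(text: str) -> int:
    """Convert a count such as "4", "2G" or "1P" into an integer."""
    text = text.strip()
    if not text:
        raise ValueError("empty count")
    suffix = text[-1].upper()
    if suffix in SUFFIXES:
        number, power = text[:-1], SUFFIXES[suffix]
    else:
        number, power = text, 0
    if not number.isdigit():
        raise ValueError(f"invalid count: {text!r}")
    return int(number) * 1024 ** power


if __name__ == "__main__":
    for spec in ("4", "2G", "1P"):
        print(spec, "->", parse_tres_count(spec))
```

Keeping the suffix table in a single dictionary means that adding a new unit, as this release did with "P", is a one-line change.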
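
Several runaway-job entries in the changelog above hinge on one detail: a requeued job keeps its job id, so the id alone does not identify a single accounting record, and the submit time has to be compared as well. The sketch below illustrates that matching rule in Python. `JobRecord` and `fix_runaway` are hypothetical names, not Slurm's accounting schema or code; the point is only that the fix closes the record whose (job id, submit time) pair matches and leaves a requeued instance with the same id untouched.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical accounting record; not Slurm's database schema."""
    job_id: int
    submit_time: int          # Unix timestamp of submission
    end_time: Optional[int]   # None means the record was never closed


def fix_runaway(records: List[JobRecord], job_id: int, submit_time: int,
                end_time: int) -> List[JobRecord]:
    """Close only the record matching both job_id and submit_time.

    A requeued instance shares the job id but has a different submit
    time, so it is returned unchanged.
    """
    key = (job_id, submit_time)
    return [
        replace(rec, end_time=end_time)
        if (rec.job_id, rec.submit_time) == key else rec
        for rec in records
    ]


if __name__ == "__main__":
    records = [
        JobRecord(job_id=42, submit_time=1000, end_time=None),  # runaway
        JobRecord(job_id=42, submit_time=2000, end_time=None),  # requeued, running
    ]
    for rec in fix_runaway(records, 42, 1000, end_time=1500):
        print(rec)
```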
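
Two entries in the changelog above concern deriving the task count (SLURM_NTASKS) when it is not given directly, for example from --ntasks-per-node. Here is a minimal Python sketch of that derivation, assuming the rules documented for srun: an explicit --ntasks wins, otherwise the node count times --ntasks-per-node, otherwise one task per node. `derive_ntasks` is hypothetical and ignores other inputs, such as --cpus-per-task, that can also change the default.

```python
from typing import Optional


def derive_ntasks(nodes: int, ntasks: Optional[int] = None,
                  ntasks_per_node: Optional[int] = None) -> int:
    """Return the total task count (hypothetical sketch, not Slurm code)."""
    if nodes < 1:
        raise ValueError("node count must be at least 1")
    if ntasks is not None:
        # An explicit --ntasks takes precedence; --ntasks-per-node then acts
        # only as a per-node maximum, which this sketch does not model.
        return ntasks
    if ntasks_per_node is not None:
        if ntasks_per_node < 1:
            raise ValueError("ntasks_per_node must be at least 1")
        # No task count given: derive it from the per-node request.
        return nodes * ntasks_per_node
    # Nothing given: assume the srun default of one task per node.
    return nodes


if __name__ == "__main__":
    print(derive_ntasks(4, ntasks_per_node=8))  # like -N 4 --ntasks-per-node=8
```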


