[slurm-users] Slurm versions 20.02.6 and 19.05.8 are now available (CVE-2020-27745 and CVE-2020-27746)

Tim Wickberg tim at schedmd.com
Thu Nov 12 17:49:58 UTC 2020


Slurm versions 20.11.0rc2, 20.02.6 and 19.05.8 are now available, and
include a series of recent bug fixes, as well as fixes for two security
issues.

Note: the 19.05 release series is nearing the end of its support
lifecycle as we prepare to release 20.11 later this month. The 19.05.8
download link is on the 'Older Versions' page.

SchedMD customers were informed on October 29th and provided patches on 
request; this process is documented in our security policy [1].

CVE-2020-27745:
A review of Slurm's RPC handling code uncovered a potential buffer
overflow in one utility function. The only affected use is in Slurm's
PMIx MPI plugin, and a job would only be vulnerable if --mpi=pmix was
requested or the site has set MpiDefault=pmix in slurm.conf.
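For readers unfamiliar with this class of bug, here is a small Python
sketch of the two bounds checks a length-prefixed unpack routine needs
before it copies data into a caller's buffer. The wire layout (a 4-byte
big-endian length followed by the bytes), the function name unpack_mem()
and the dest_capacity parameter are assumptions for illustration only;
this is not Slurm's unpackmem() code or the actual fix.

```python
import struct
from typing import Tuple


class UnpackError(ValueError):
    """Raised when a length-prefixed field is malformed or too large."""


def unpack_mem(buf: bytes, offset: int, dest_capacity: int) -> Tuple[bytes, int]:
    """Read one length-prefixed byte field from buf starting at offset.

    Hypothetical layout (assumption): 4-byte big-endian unsigned length,
    then that many bytes. Returns (data, offset just past the field).
    """
    if offset + 4 > len(buf):
        raise UnpackError("truncated length prefix")
    (length,) = struct.unpack_from(">I", buf, offset)
    offset += 4
    # Check 1: the declared length must fit in the caller's destination.
    # A C routine that skips this check and copies `length` bytes into a
    # fixed-size buffer is the classic overflow.
    if length > dest_capacity:
        raise UnpackError(
            f"field of {length} bytes exceeds destination capacity {dest_capacity}"
        )
    # Check 2: the declared length must not run past the received message.
    if offset + length > len(buf):
        raise UnpackError("declared length exceeds remaining message bytes")
    return buf[offset:offset + length], offset + length
```

Unpacking struct.pack(">I", 5) + b"hello" with a capacity of 16 returns
(b"hello", 9); the same message with a capacity of 4 raises UnpackError,
where a C routine that trusted the prefix would write past a 4-byte
destination.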

CVE-2020-27746:
Slurm's use of the 'xauth' command to manage X11 magic cookies can lead
to an inadvertent disclosure of a user's cookie when setting up X11
forwarding on a node, because the cookie was passed to xauth as a
command-line argument. An attacker monitoring /proc on the node could
race the setup and steal the magic cookie, which may let them connect to
that user's X11 session. A job would only be impacted if --x11 was
requested at submission time. This was reported by Jonas Stare (NSC).
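One common way to avoid this kind of leak is to keep the secret off the
command line, since other local users can read a process's arguments
from /proc/<pid>/cmdline, and hand it to xauth on standard input
instead. The Python sketch below illustrates that approach. The helper
names, authfile path and display name are hypothetical, and it assumes
xauth's "source -" command reads commands from standard input (check
xauth(1) on your system). It is not Slurm's actual fix.

```python
import subprocess
from typing import List


def xauth_argv(authfile: str) -> List[str]:
    # Only non-secret values go into argv, which is world-readable via
    # /proc/<pid>/cmdline while xauth runs. "source -" asks xauth to read
    # its commands from standard input (assumption: see xauth(1)).
    return ["xauth", "-q", "-f", authfile, "source", "-"]


def xauth_script(display: str, cookie_hex: str) -> str:
    # The secret cookie only ever appears in the text written to stdin.
    return f"add {display} MIT-MAGIC-COOKIE-1 {cookie_hex}\n"


def install_cookie(authfile: str, display: str, cookie_hex: str) -> None:
    # Hypothetical helper: run xauth, feeding the "add" command on stdin.
    # check=True raises CalledProcessError if xauth exits non-zero.
    subprocess.run(
        xauth_argv(authfile),
        input=xauth_script(display, cookie_hex),
        text=True,
        check=True,
    )
```

By contrast, an invocation such as "xauth -f FILE add DISPLAY
MIT-MAGIC-COOKIE-1 COOKIE" places the cookie in argv, where it is
visible for as long as xauth runs.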

Downloads are available at https://www.schedmd.com/downloads.php .

Release notes follow below.

- Tim

[1] https://www.schedmd.com/security.php

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 20.11.0rc2
> ==============================
>  -- MySQL - Remove potential race condition when sending updates to a cluster
>     and commit_delay used.
>  -- Fixed regression in rc1 where sinfo et al would not show a node in a resv
>     state.
>  -- select/linear will now allocate up to a node's RealMemory when configured
>     with SelectTypeParameters=CR_Memory and --mem=0 is specified. Previously,
>     no memory was accounted and no memory limits were applied to the job.
>  -- Remove unneeded lock check from running the slurmctld prolog for a job.
>  -- Fix duplicate key error on clean starts after slurmctld is killed.
>  -- Avoid double free of step_record_t in the slurmctld when node is removed
>     from config.
>  -- Zero out step_record_t's magic when freed.
>  -- Fix sacctmgr clearing QosLevel when trailing comma is used.
>  -- slurmrestd - fix a fatal() error when connecting over IPv6.
>  -- slurmrestd - add API to interface with slurmdbd.
>  -- mpi/cray_shasta - fix PMI port parsing for non-contiguous port ranges.
>  -- squeue and sinfo -O no longer repeat the last suffix specified.
>  -- cons_tres - fix regression regarding gpus with --cpus-per-task.
>  -- Avoid non-async-signal-safe function calls in X11 forwarding, which can
>     lead to the extern step terminating unexpectedly.
>  -- Don't send job completion email for revoked federation jobs.
>  -- Fix device or resource busy errors on cgroup cleanup on older kernels.
>  -- Avoid binding to IPv6 wildcard address in slurmd if IPv6 is not explicitly
>     enabled.
>  -- Make ntasks_per_gres work with cpus_per_task.
>  -- Various alterations in reference to ntasks_per_tres.
>  -- slurmrestd - multiple changes to make Slurm's OpenAPI spec compatible with
>     https://openapi-generator.tech/.
>  -- nss_slurm - avoid loading slurm.conf to avoid issues on configless systems,
>     or systems with config files loaded on shared storage.
>  -- scrontab - add cli_filter hooks.
>  -- job_submit/lua - expose a "cron_job" flag to identify jobs submitted
>     through scrontab.
>  -- PMIx - fix potential buffer overflows from use of unpackmem().
>     CVE-2020-27745.
>  -- X11 forwarding - fix potential leak of the magic cookie when sent as an
>     argument to the xauth command. CVE-2020-27746.

> * Changes in Slurm 20.02.6
> ==========================
>  -- Fix sbcast --fanout option.
>  -- Tighten up keyword matching for --dependency.
>  -- Fix "squeue -S P" not sorting by partition name.
>  -- Fix segfault in slurmctld if group resolution fails during job credential
>     creation.
>  -- sacctmgr - Honor PreserveCaseUser when creating users with load command.
>  -- Avoid attempting to schedule jobs on magnetic reservations when they aren't
>     allowed.
>  -- Always make sure we clear the magnetic flag from a job.
>  -- In backfill avoid NULL pointer dereference.
>  -- Fix segfault at the end of slurmctld if you have a magnetic reservation
>     and you shut down the slurmctld.
>  -- Silence security warning when Slurm is trying a job for a magnetic
>     reservation.
>  -- Have sacct exit correctly when a user/group id isn't valid.
>  -- Remove extra \n from invalid user/group id error message.
>  -- Detect when extern steps trigger OOM events and mark extern step correctly.
>  -- pam_slurm_adopt - permit root access to the node before reading the config
>     file, which will give root a chance to fix the config if missing or broken.
>  -- Reset DefMemPerCPU, MaxMemPerCPU, and TaskPluginParam (among other minor
>     flags) on reconfigure.
>  -- Fix incorrect memory handling of mail_user when updating mail_type=none.
>  -- Handle mail_user and mail_type independently.
>  -- Fix thread-safety issue with assoc_mgr_get_admin_level().
>  -- Ignore step features if equal to job features.
>  -- Fix slurmstepd segfault caused by incorrect strtok() usage.
>  -- CRAY - Remove unneeded ATP spank plugin from ansible playbook.
>  -- Fix core selection for exclusive step on nodes where CPUs == Cores.
>  -- Fix topology aware scheduling reservations.
>  -- Fix loading cpus_per_task on a job from state file.
>  -- When a partition has no nodes, fix estimate of max CPUs possible on a job
>     trying to run there.
>  -- In cons_tres fix sorting functions to handle node/topo weight
>     correctly.
>  -- Fix regression in 20.02.5 where you couldn't request constraints with a
>     simple & and a count.
>  -- Limit the number of threads for servicing emails.
>  -- Avoid possible double init race condition in assoc_mgr_lock().
>  -- Add missing locks in slurm_cred_handle_reissue().
>  -- Add missing locks in slurm_cred_revoked().
>  -- Fix slurmctld segfault due to tight reconfigure RPC requests by serializing
>     the RPC handler processing logic.
>  -- Use _exit() instead of exit() after fork().
>  -- Perl API - fix hang reading config in configless environments.
>  -- slurmrestd - request detailed node information to populate GRES fields.
>  -- slurmrestd - request detailed job information to populate GRES fields.
>  -- Fix job license update bug on array tasks or hetjob components.
>  -- Fix job partition update bug on array tasks or hetjob components.
>  -- Fix slurmctld segfault on _pick_best_nodes() when processing a job request
>     with XOR'd constraints and no nodeset has the feature.
>  -- Fix job requests rejected with incorrect NODE_CONFIG_UNAVAIL when nodes are
>     actually only busy due to an overlapping MAINT reservation.
>  -- Fix sacctmgr allowing the deletion of a user's default account.
>  -- Fix srun and other Slurm commands running within a "configless" salloc
>     terminal.
>  -- MySQL - Correctly handle QOS deletion from association tables.
>  -- Fix update of First_Cores flag in a reservation.
>  -- Fix parsing of update reservation flags.
>  -- Fix --switches for cons_tres.
>  -- Retry connection on ETIMEDOUT in slurm_send_addr_recv_msgs.
>  -- Fix wait for RPC_PROLOG_LAUNCH notification 2*MessageTimeout.
>  -- Make slurm_send_addr_recv_msgs conn_timeout match rpc_wait in slurmd.
>  -- pam_slurm_adopt - operate correctly even if ConstrainRAMSpace is not
>     enabled on the node by falling back to the cpuset, devices, or freezer
>     subsystem instead.
>  -- slurmrestd - use memmove() instead of memcpy() in string manipulation
>     to avoid bugs related to overlapping memory regions.
>  -- slurmrestd - avoid xassert() failure on duplicated headers in request.
>  -- Remove stale 'ReqNodeNotAvail, Reserved for maintenance' message from
>     pending jobs after a maintenance reservation ended.
>  -- MySQL - Stop steps from printing when outside time range.
>  -- Fixed kmem limit calculation to use MaxKmemPercent correctly.
>  -- Fix initialization of cpuset.mems/cpus on uid cgroup subdir.
>  -- MySQL - Remove potential race condition when sending updates to a cluster
>     and commit_delay used.
>  -- Avoid double free of step_record_t in the slurmctld when node is removed
>     from config.
>  -- cons_tres - fix regression regarding gpus with --cpus-per-task.
>  -- Don't send job completion email for revoked federation jobs.
>  -- PMIx - fix potential buffer overflows from use of unpackmem().
>     CVE-2020-27745.
>  -- X11 forwarding - fix potential leak of the magic cookie when sent as an
>     argument to the xauth command. CVE-2020-27746.

> * Changes in Slurm 19.05.8
> ==========================
>  -- sbatch - handle --uid/--gid in #SBATCH directives properly.
>  -- Fix HDF5 type version build error.
>  -- PMIx - fix potential buffer overflows from use of unpackmem().
>     CVE-2020-27745.
>  -- X11 forwarding - fix potential leak of the magic cookie when sent as an
>     argument to the xauth command. CVE-2020-27746.
