[slurm-users] Slurm versions 23.11.1, 23.02.7, 22.05.11 are now available (CVE-2023-49933 through CVE-2023-49938)

Tim Wickberg tim at schedmd.com
Wed Dec 13 22:06:32 UTC 2023


Slurm versions 23.11.1, 23.02.7, 22.05.11 are now available and address 
a number of recently-discovered security issues. They've been assigned 
CVE-2023-49933 through CVE-2023-49938.

SchedMD customers were informed on November 29th and provided a patch on 
request; this process is documented in our security policy. [1]

There are no mitigations available for these issues; the only option is 
to patch and restart the affected daemons.

--------

Five issues were reported by Ryan Hall (Meta Red Team X):

1) Slurmd Message Integrity Bypass. (Slurm 23.02 and 23.11.)
    CVE-2023-49935

Permits an attacker to reuse root-level authentication tokens when 
interacting with the slurmd process, bypassing the RPC message hashes 
which protect against malicious MUNGE credential reuse.

2) Slurm Arbitrary File Overwrite. (Slurm 22.05 and 23.02.)
    CVE-2023-49938

Permits an attacker to modify the extended group list used with the 
sbcast subsystem, and open files with an incorrect set of extended groups.

3) Slurm NULL Pointer Dereference. (Slurm 22.05, 23.02, 23.11.)
    CVE-2023-49936

Denial of service.

4) Slurm Protocol Double Free. (Slurm 22.05, 23.02, 23.11.)
    CVE-2023-49937

Denial of service, potential for arbitrary code execution.

5) Slurm Protocol Message Extension. (Slurm 22.05, 23.02, 23.11.)
    CVE-2023-49933

Allows for malicious modification of RPC traffic that bypasses the 
message hash checks.

A sixth issue was discovered internally by SchedMD:

6) SQL Injection. (Slurm 23.11.)
    CVE-2023-49934

Arbitrary SQL injection against SlurmDBD's SQL database.
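
For sites tracking exposure per release series, the list above can be 
restated as a small lookup table. This is only a sketch transcribed from 
this announcement; the names AFFECTED and cves_for_series are hypothetical 
and not part of Slurm.

```python
# Affected release series per CVE, transcribed from the announcement above.
# The table and helper names are illustrative, not part of Slurm itself.
AFFECTED = {
    "CVE-2023-49933": ("22.05", "23.02", "23.11"),  # protocol message extension
    "CVE-2023-49934": ("23.11",),                   # SQL injection (slurmdbd)
    "CVE-2023-49935": ("23.02", "23.11"),           # slurmd message integrity bypass
    "CVE-2023-49936": ("22.05", "23.02", "23.11"),  # NULL pointer dereference
    "CVE-2023-49937": ("22.05", "23.02", "23.11"),  # protocol double free
    "CVE-2023-49938": ("22.05", "23.02"),           # sbcast arbitrary file overwrite
}


def cves_for_series(series):
    """Return the sorted CVE ids this announcement lists for a release series."""
    return sorted(cve for cve, affected in AFFECTED.items() if series in affected)


if __name__ == "__main__":
    for series in ("22.05", "23.02", "23.11"):
        print(series, ", ".join(cves_for_series(series)))
```

A series that does not appear in the table returns an empty list, which 
means only that this announcement does not list it, not that it is safe.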

--------

SchedMD only issues security fixes for the supported releases (currently 
23.11, 23.02 and 22.05). Due to the complexity of these fixes, we do not 
recommend attempting to back-port the fixes to older releases, and 
strongly encourage sites to upgrade to fixed versions immediately.
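
A site could check an installed version string against the fixed releases 
named above with something like the following. This is a minimal sketch: 
the function name patch_status and its return labels are hypothetical, and 
it assumes the version is reported in a form such as "slurm 23.02.6".

```python
import re

# Minimum fixed micro release per supported series, from this announcement.
FIXED_MICRO = {"22.05": 11, "23.02": 7, "23.11": 1}


def patch_status(version_text):
    """Classify a version string against the fixed releases listed above.

    Returns "patched", "needs-upgrade", or "not-covered" (the series is not
    one of the three supported series this announcement addresses).
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"no major.minor.micro version in {version_text!r}")
    series = f"{match.group(1)}.{match.group(2)}"
    micro = int(match.group(3))
    if series not in FIXED_MICRO:
        return "not-covered"
    return "patched" if micro >= FIXED_MICRO[series] else "needs-upgrade"


if __name__ == "__main__":
    for text in ("slurm 23.02.6", "slurm 23.11.1", "slurm 21.08.8"):
        print(text, "->", patch_status(text))
```

The "not-covered" label is deliberate: releases outside 22.05, 23.02 and 
23.11 receive no fix here, so the check makes no claim about them.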

Downloads are available at https://www.schedmd.com/downloads.php .

Release notes follow below.

- Tim

[1] https://www.schedmd.com/security.php

-- 
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 23.11.1
> ==========================
>  -- Fix scontrol update job=... TimeLimit+=/-= when used with a raw JobId of job
>     array element.
>  -- Reject TimeLimit increment/decrement when called on job with
>     TimeLimit=UNLIMITED.
>  -- Fix slurmctld segfault when reconfiguring after a job resize.
>  -- Fix compilation on FreeBSD.
>  -- Fix issue with requesting a job with --licenses as well as
>     --tres-per-task=license.
>  -- slurmctld - Prevent segfault in getopt_long() with an invalid long option.
>  -- Switch to man2html-base in Build-Depends for Debian package.
>  -- slurmrestd - Added /meta/slurm/cluster field to responses.
>  -- Adjust systemd service files to start daemons after remote-fs.target.
>  -- Add "--with selinux" option to slurm.spec.
>  -- Fix task/cgroup indexing tasks in cgroup plugins, which caused
>     jobacct/gather to match the gathered stats with the wrong task id.
>  -- select/linear - Fix regression in 23.11 in which jobs that requested
>     --cpus-per-task were rejected.
>  -- Fix crash in slurmstepd that can occur when launching tasks via mpi using
>     the pmi2 plugin and using the route/topology plugin.
>  -- Fix sgather not gathering from all nodes when using CR_PACK_NODES/-m pack.
>  -- Fix mysql query syntax error when getting jobs with private data.
>  -- Fix sanity check to prevent deleting default account of users.
>  -- data_parser/v0.0.40 - Fix the parsing for /slurmdb/v0.0.40/jobs exit_code
>     query parameter.
>  -- Fix issue where TRES for energy wasn't always set before sending it to the
>     jobcomp plugin.
>  -- jobcomp/[kafka|elasticsearch] - Print raw TRES values along with the
>     formatted versions as tres_[req|alloc]_raw.
>  -- Fix inconsistencies with --cpu-bind/SLURM_CPU_BIND and --hint/SLURM_HINT.
>  -- Fix ignoring invalid json in various subsystems.
>  -- Remove shebang from bash completion script.
>  -- Fix elapsed time in JobComp being set from invalid start and end times.
>  -- Update service files to start slurmd, slurmctld, and slurmdbd after sssd.
>  -- data_parser/v0.0.40 - Fix output of DefMemPerCpu, MaxMemPerCpu, and
>     max_shares.
>  -- When determining a job's index in the database, don't wait if there are
>     more jobs waiting.
>  -- If a job requests more shards than one sharing GRES (gpu) per node can
>     provide, refuse it unless SelectTypeParameters has
>     MULTIPLE_SHARING_GRES_PJ.
>  -- Avoid refreshing the hwloc xml file when slurmd is reconfigured. This fixes
>     an issue seen with CoreSpecCount used on nodes with Intel E-cores.
>  -- Trigger fatal exit when Slurm API function is called before slurm_init() is
>     called.
>  -- slurmd - Fix issue with 'scontrol reconfigure' when started with '-c'.
>  -- data_parser/v0.0.40 - Fix handling of negative job nice values.
>  -- data_parser/v0.0.40 - Fill the "id" object for associations with the
>     cluster, account, partition, and user in addition to the assoc id.
>  -- data_parser/v0.0.40 - Remove unusable cpu_binding_flags enums from
>     v0.0.40_job_desc_msg.
>  -- Improve performance and resiliency of slurmscriptd shutdown on
>     'scontrol reconfigure'.
>  -- slurmrestd - Job submissions that result in the following error codes
>     will be considered as successfully submitted (with a warning), instead
>     of returning an HTTP 500 error back:
>     ESLURM_NODES_BUSY, ESLURM_RESERVATION_BUSY, ESLURM_JOB_HELD,
>     ESLURM_NODE_NOT_AVAIL, ESLURM_QOS_THRES, ESLURM_ACCOUNTING_POLICY,
>     ESLURM_RESERVATION_NOT_USABLE, ESLURM_REQUESTED_PART_CONFIG_UNAVAILABLE,
>     ESLURM_BURST_BUFFER_WAIT, ESLURM_PARTITION_DOWN,
>     ESLURM_LICENSES_UNAVAILABLE.
>  -- Fix issue with node appearing to reboot on every "scontrol reconfigure"
>     when slurmd was started with the '-b' flag.
>  -- Fix a slurmctld fatal error when upgrading to 23.11 and changing from
>     select/cons_res to select/cons_tres at the same time.
>  -- slurmctld - Fix subsequent reconfigure hanging after a failed reconfigure.
>  -- slurmctld - Reject arbitrary distribution jobs that have a minimum node
>     count that differs from the number of unique nodes in the hostlist.
>  -- Prevent slurmdbd errors when updating reservations with names containing
>     apostrophes.
>  -- Prevent message extension attacks that could bypass the message hash.
>     CVE-2023-49933.
>  -- Prevent SQL injection attacks in slurmdbd. CVE-2023-49934.
>  -- Prevent message hash bypass in slurmd which can allow an attacker to reuse
>     root-level MUNGE tokens and escalate permissions. CVE-2023-49935.
>  -- Prevent NULL pointer dereference on size_valp overflow. CVE-2023-49936.
>  -- Prevent double-xfree() on error in _unpack_node_reg_resp().
>     CVE-2023-49937.

> * Changes in Slurm 23.02.7
> ==========================
>  -- libslurm_nss - Avoid causing glibc to assert due to an unexpected return
>     from slurm_nss due to an error during lookup.
>  -- Fix job requests with --tres-per-task sometimes resulting in bad allocations
>     that cannot run subsequent job steps.
>  -- Fix issue with slurmd where srun fails to be warned when a node prolog
>     script runs beyond MsgTimeout set in slurm.conf.
>  -- gres/shard - Fix plugin functions to have matching parameter orders.
>  -- gpu/nvml - Fix issue that resulted in the wrong MIG devices being
>     constrained to a job.
>  -- gpu/nvml - Fix linking issue with MIGs that prevented multiple MIGs being
>     used in a single job for certain MIG configurations.
>  -- Add JobAcctGatherParams=DisableGPUAcct to disable gpu accounting.
>  -- Fix file descriptor leak in slurmd when using acct_gather_energy/ipmi with
>     DCMI devices.
>  -- sview - avoid crash when job has a node list string > 49 characters.
>  -- Prevent slurmctld crash during reconfigure when packing job start messages.
>  -- Preserve reason uid on reconfig.
>  -- Update node reason with updated INVAL state reason if different from last
>     registration.
>  -- acct_gather_energy/ipmi - Improve logging of DCMI issues.
>  -- conmgr - Avoid NULL dereference when using auth/none.
>  -- data_parser/v0.0.39 - Fixed how deleted QOS and associations for jobs are
>     dumped.
>  -- burst_buffer/lua - fix stage in counter not decrementing when a job is
>     cancelled during stage in. This counter is used to enforce the limit of 128
>     scripts per stage.
>  -- gpu/oneapi - Add support for new env vars ZE_FLAT_DEVICE_HIERARCHY and
>     ZE_ENABLE_PCI_ID_DEVICE_ORDER.
>  -- data_parser/v0.0.39 - Fix how the "INVALID" nodes state is dumped.
>  -- data_parser/v0.0.39 - Fix parsing of flag arrays to allow multiple flags to
>     be set.
>  -- Avoid leaking sockets when an x11 application is closed in an allocation.
>  -- Fix missing mutex unlock in group cache code which could cause slurmctld to
>     freeze.
>  -- Fix scrontab monthly jobs possibly skipping a month if added near the end of
>     the month.
>  -- Fix loading of the gpu account gather energy plugin.
>  -- Fix slurmctld segfault when reconfiguring after a job resize.
>  -- Fix crash in slurmstepd that can occur when launching tasks via mpi using
>     the pmi2 plugin and using the route/topology plugin.
>  -- data_parser/v0.0.39 - skip empty string when parsing QOS ids.
>  -- Fix "qos <id> doesn't exist" error message in assoc_mgr_update_assocs to
>     print the attempted new default qos, rather than the current default qos.
>  -- Remove error message from assoc_mgr_update_assocs when purposefully
>     resetting the default qos.
>  -- data_parser/v0.0.39 - Fix segfault when POSTing data with association usage.
>  -- Prevent message extension attacks that could bypass the message hash.
>     CVE-2023-49933.
>  -- Prevent message hash bypass in slurmd which can allow an attacker to reuse
>     root-level MUNGE tokens and escalate permissions. CVE-2023-49935.
>  -- Prevent NULL pointer dereference on size_valp overflow. CVE-2023-49936.
>  -- Prevent double-xfree() on error in _unpack_node_reg_resp(). CVE-2023-49937.
>  -- Prevent modified sbcast RPCs from opening a file with the wrong group
>     permissions. CVE-2023-49938.

> * Changes in Slurm 22.05.11
> ===========================
>  -- Prevent message extension attacks that could bypass the message hash.
>     CVE-2023-49933.
>  -- Prevent NULL pointer dereference on size_valp overflow. CVE-2023-49936.
>  -- Prevent double-xfree() on error in _unpack_node_reg_resp().
>     CVE-2023-49937.
>  -- Prevent modified sbcast RPCs from opening a file with the wrong group
>     permissions. CVE-2023-49938.


