We are pleased to announce the availability of the Slurm 24.11 release.
To highlight some new features in 24.11:
- New gpu/nvidia plugin. This does not rely on any NVIDIA libraries, and will build by default on all systems. It supports basic GPU detection and management, but cannot currently identify GPU-to-GPU links or provide usage data, as these are not exposed by the kernel driver.
- Added autodetected GPUs to the output from "slurmd -C".
- Added new QOS-based reports to "sreport".
- Revamped network I/O with the "conmgr" thread-pool model.
- Added new "hostlist function" syntax for management commands and configuration files.
- switch/hpe_slingshot - Added support for hardware collectives setup through the fabric manager. (Requires SlurmctldParameters=enable_stepmgr)
- Added SchedulerParameters=bf_allow_magnetic_slot configuration option to allow backfill planning for magnetic reservations.
- Added new "scontrol listjobs" and "liststeps" commands to complement "listpids", and provide --json/--yaml output for all three subcommands. (See the example after this list.)
- Allow jobs to be submitted against multiple QOSes.
- Added new experimental "oracle" backfill scheduling support, which permits jobs to be delayed if the oracle function determines the reduced fragmentation of the network topology is sufficiently advantageous.
- Improved responsiveness of the controller when jobs are requeued by replacing the "db_index" identifier with a slurmctld-generated unique identifier ("SLUID").
- New options to job_container/tmpfs to permit site-specific scripts to modify the namespace before user steps are launched, and to ensure all steps are completely captured within that namespace.
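To give a feel for a few of these, here is a minimal sketch. The QOS names, the job script, the placement of the JSON flag, and the comma-separated multi-QOS syntax are illustrative assumptions on our part; consult the 24.11 scontrol(1), sbatch(1), and slurm.conf(5) man pages for the authoritative forms.

    # New scontrol subcommands, with JSON output:
    $ scontrol --json listjobs
    $ scontrol --json liststeps
    $ scontrol --json listpids

    # Enable backfill planning for magnetic reservations (slurm.conf):
    SchedulerParameters=bf_allow_magnetic_slot

    # Submitting a job against multiple QOSes (assumed syntax; "normal"
    # and "long" are hypothetical QOS names):
    $ sbatch --qos=normal,long job.sh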
The Slurm documentation has also been updated to the 24.11 release. (Older versions can be found in the archive, linked from the main documentation page.)
Slurm can be downloaded from https://www.schedmd.com/download-slurm/ .
- Tim
slurm-announce@lists.schedmd.com