On Mon, May 27, 2024 at 2:59 PM Bjørn-Helge Mevik via slurm-users <slurm-users@lists.schedmd.com> wrote:
Ole Holm Nielsen via slurm-users <slurm-users@lists.schedmd.com> writes:
Whether or not to enable Hyper-Threading (HT) on your compute nodes depends entirely on the properties of applications that you wish to run on the nodes. Some applications are faster without HT, others are faster with HT. When HT is enabled, the "virtual CPU cores" obviously will have only half the memory available per core.
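To make the memory point concrete: if every hyperthread is scheduled as a CPU, the node's memory is divided among twice as many CPUs. A tiny sketch with made-up node numbers (192 GB and 32 physical cores are hypothetical, and mem_per_cpu_gb is an illustrative helper):

```python
def mem_per_cpu_gb(node_mem_gb: float, physical_cores: int, threads_per_core: int) -> float:
    """Memory share per schedulable CPU when each hardware thread counts as a CPU."""
    return node_mem_gb / (physical_cores * threads_per_core)


# Hypothetical node: 192 GB of RAM and 32 physical cores.
print(mem_per_cpu_gb(192.0, 32, 1))  # HT off: 6.0 GB per CPU
print(mem_per_cpu_gb(192.0, 32, 2))  # HT on:  3.0 GB per CPU
```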
Another consideration is, if you keep HT enabled, do you want Slurm to hand out physical cores to jobs, or logical CPUs (hyperthreads)? Again, what is best depends on your workload. On our systems, we tend to either turn off HT, or hand out cores.
In the case where Hyper-Threading (HT) is enabled, is it possible to configure Slurm to achieve the following effects:
1. If the total number of cores used by jobs is less than the number of physical cores, hand out physical cores to jobs.
2. When the total number of cores used by jobs exceeds the number of physical cores, use logical CPUs for the excess part.
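The two rules amount to "fill one hardware thread per physical core first, then overflow onto the sibling threads". The sketch below models only that counting rule for a single node; place_cpus is a hypothetical helper, not a Slurm API, and it says nothing about whether Slurm can actually be configured to behave this way.

```python
from typing import List, Tuple


def place_cpus(n_cpus: int, cores: int, threads_per_core: int = 2) -> List[Tuple[int, int]]:
    """Pick (core, thread) slots for n_cpus busy CPUs on one node.

    Toy model of the requested policy: use one hardware thread on every
    physical core first, and only then start using sibling hyperthreads.
    """
    capacity = cores * threads_per_core
    if not 0 <= n_cpus <= capacity:
        raise ValueError(f"n_cpus must be in 0..{capacity}, got {n_cpus}")
    slots: List[Tuple[int, int]] = []
    for thread in range(threads_per_core):  # pass 0: first thread of each core
        for core in range(cores):           # later passes: sibling threads
            if len(slots) == n_cpus:
                return slots
            slots.append((core, thread))
    return slots


# A hypothetical 4-core node with 2 threads per core:
print(place_cpus(3, cores=4))  # [(0, 0), (1, 0), (2, 0)]
print(place_cpus(6, cores=4))  # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```

Because it works on a node-wide count of busy CPUs, a real scheduler would also have to cope with jobs ending and freeing cores in the middle of the range; the toy ignores that.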
I heard about the following method to achieve the above purpose, but have not tried it so far:
To configure Slurm for managing jobs more effectively when Hyper-Threading (HT) is enabled, you can implement a strategy that involves distinguishing between physical and logical cores. Here's a possible approach to meet the requirements you described:
1. Configure Slurm to Recognize Physical and Logical Cores
First, ensure that Slurm can differentiate between physical and logical cores. This typically involves setting the CpuBind and TaskPlugin parameters correctly in the Slurm configuration file (usually slurm.conf).
# Settings in slurm.conf
TaskPlugin=task/affinity
CpuBind=cores
2. Use Gres (Generic Resources) to Identify Physical and Logical Cores
You can utilize the GRES (Generic RESources) feature to define additional resource types, such as physical and logical cores. First, these resources need to be defined in slurm.conf.
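# Define resources in slurm.conf
# GresTypes must list every custom GRES name used on a node line
GresTypes=cpu_physical,cpu_logical
NodeName=NODENAME Gres=cpu_physical:16,cpu_logical:32 CPUs=32 Boards=1 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2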
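Here, cpu_physical and cpu_logical are custom resource names, each followed by its count. The CPUs field should be set to the total number of logical CPUs (hardware threads), i.e. Sockets × CoresPerSocket × ThreadsPerCore = 2 × 8 × 2 = 32, not the sum of physical and logical cores. Note that Slurm treats custom GRES like these as plain counters; by themselves they do not control which cores or hyperthreads a job is bound to.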
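The core and CPU counts on the node line follow directly from the topology fields. A small sketch (node_cpu_counts is a hypothetical helper) that reproduces the 16 physical cores and 32 CPUs of the example node:

```python
def node_cpu_counts(sockets: int, cores_per_socket: int, threads_per_core: int) -> dict:
    """Physical cores and logical CPUs implied by a slurm.conf node topology."""
    physical = sockets * cores_per_socket
    return {"physical_cores": physical, "logical_cpus": physical * threads_per_core}


counts = node_cpu_counts(sockets=2, cores_per_socket=8, threads_per_core=2)
print(counts)  # {'physical_cores': 16, 'logical_cpus': 32}
```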
3. Write Job Submission Scripts
When submitting a job, users need to request the appropriate type of cores based on their needs. For example, if a job requires more cores than the number of physical cores available, it can request a combination of physical and logical cores.
#!/bin/bash
#SBATCH --gres=cpu_physical:8,cpu_logical:4
#SBATCH --ntasks=12
#SBATCH --cpus-per-task=1
# Your job execution command
This script requests 8 units of cpu_physical and 4 units of cpu_logical, 12 in total, matching 12 tasks of one CPU each.
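A quick way to sanity-check the arithmetic in a request like this is to compare the summed GRES counts with --ntasks × --cpus-per-task. gres_cpu_total and gres_matches_tasks are hypothetical helpers; they assume the simple name, name:count or name:type:count form of --gres with plain integer counts, and treat an omitted count as 1, which is Slurm's default.

```python
def gres_cpu_total(gres: str) -> int:
    """Sum the counts in a --gres string such as "cpu_physical:8,cpu_logical:4"."""
    total = 0
    for item in gres.split(","):
        parts = item.strip().split(":")
        last = parts[-1]
        # name:count or name:type:count -> count; bare name or name:type -> 1
        total += int(last) if len(parts) > 1 and last.isdigit() else 1
    return total


def gres_matches_tasks(gres: str, ntasks: int, cpus_per_task: int) -> bool:
    """True if the summed GRES counts equal the CPUs implied by the task layout."""
    return gres_cpu_total(gres) == ntasks * cpus_per_task


print(gres_cpu_total("cpu_physical:8,cpu_logical:4"))            # 12
print(gres_matches_tasks("cpu_physical:8,cpu_logical:4", 12, 1))  # True
```

Passing this check only confirms that the numbers add up; it says nothing about which hardware threads the tasks actually land on.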
-- B/H
Regards, Zhao