Dear Slurm Users,
I am experiencing a significant performance discrepancy when running the same VASP job through the Slurm scheduler compared to running it directly with mpirun. I am hoping for some insights or advice on how to resolve this issue.
System Information:
Slurm Version: 21.08.5
OS: Ubuntu 22.04.4 LTS (Jammy)
Job Submission Script:
#!/usr/bin/env bash
#SBATCH -N 1
#SBATCH -D .
#SBATCH --output=%j.out
#SBATCH --error=%j.err
##SBATCH --time=2-00:00:00
#SBATCH --ntasks=36
#SBATCH --mem=64G

echo '#######################################################'
echo "date = $(date)"
echo "hostname = $(hostname -s)"
echo "pwd = $(pwd)"
echo "sbatch = $(which sbatch | xargs realpath -e)"
echo ""
echo "WORK_DIR = $WORK_DIR"
echo "SLURM_SUBMIT_DIR = $SLURM_SUBMIT_DIR"
echo "SLURM_JOB_NUM_NODES = $SLURM_JOB_NUM_NODES"
echo "SLURM_NTASKS = $SLURM_NTASKS"
echo "SLURM_NTASKS_PER_NODE = $SLURM_NTASKS_PER_NODE"
echo "SLURM_CPUS_PER_TASK = $SLURM_CPUS_PER_TASK"
echo "SLURM_JOBID = $SLURM_JOBID"
echo "SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST"
echo "SLURM_NNODES = $SLURM_NNODES"
echo "SLURMTMPDIR = $SLURMTMPDIR"
echo '#######################################################'
echo ""

module purge > /dev/null 2>&1
module load vasp
ulimit -s unlimited
mpirun vasp_std
Performance Observation:
When running the job through Slurm:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
LOOP: cpu time 14.4893: real time 14.5049
LOOP: cpu time 14.3538: real time 14.3621
LOOP: cpu time 14.3870: real time 14.3568
LOOP: cpu time 15.9722: real time 15.9018
LOOP: cpu time 16.4527: real time 16.4370
LOOP: cpu time 16.7918: real time 16.7781
LOOP: cpu time 16.9797: real time 16.9961
LOOP: cpu time 15.9762: real time 16.0124
LOOP: cpu time 16.8835: real time 16.9008
LOOP: cpu time 15.2828: real time 15.2921
LOOP+: cpu time 176.0917: real time 176.0755
When running the job directly with mpirun:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ mpirun -n 36 vasp_std
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
LOOP: cpu time 9.0072: real time 9.0074
LOOP: cpu time 9.0515: real time 9.0524
LOOP: cpu time 9.1896: real time 9.1907
LOOP: cpu time 10.1467: real time 10.1479
LOOP: cpu time 10.2691: real time 10.2705
LOOP: cpu time 10.4330: real time 10.4340
LOOP: cpu time 10.9049: real time 10.9055
LOOP: cpu time 9.9718: real time 9.9714
LOOP: cpu time 10.4511: real time 10.4470
LOOP: cpu time 9.4621: real time 9.4584
LOOP+: cpu time 110.0790: real time 110.0739
Could you provide any insights or suggestions on what might be causing this performance issue? Are there any specific configurations or settings in Slurm that I should check or adjust to align the performance more closely with the direct mpirun execution?
Thank you for your time and assistance.
Best regards, Zhao
On Fri, May 24, 2024 at 9:32 PM Hongyi Zhao hongyi.zhao@gmail.com wrote:
[...]
The attachment is the test example used above.
Best regards,
Zhao

--
Assoc. Prof. Hongsheng Zhao hongyi.zhao@gmail.com
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
Hi Zhao,
my guess is that in your faster case you are using hyperthreading, whereas in the Slurm case you are not.
Can you check what performance you get when you add
#SBATCH --hint=multithread
to your Slurm script?
Another difference between the two might be
a) the communication channel/interface that is used.
b) the number of nodes involved: when using mpirun you might run things on more than one node.
Regards, Hermann
On Sat, May 25, 2024 at 12:02 AM Hermann Schwärzler via slurm-users slurm-users@lists.schedmd.com wrote:
Hi Zhao,
my guess is that in your faster case you are using hyperthreading, whereas in the Slurm case you are not.
Can you check what performance you get when you add
#SBATCH --hint=multithread
to your Slurm script?
I tried adding the above directive to the Slurm script, only to find that the job gets stuck forever. Here is the output 10 minutes after the job was submitted:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ cat sub.sh.o6
#######################################################
date = 2024年 05月 25日 星期六 07:31:31 CST
hostname = x13dai-t
pwd = /home/werner/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV
sbatch = /usr/bin/sbatch

WORK_DIR =
SLURM_SUBMIT_DIR = /home/werner/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV
SLURM_JOB_NUM_NODES = 1
SLURM_NTASKS = 36
SLURM_NTASKS_PER_NODE =
SLURM_CPUS_PER_TASK =
SLURM_JOBID = 6
SLURM_JOB_NODELIST = localhost
SLURM_NNODES = 1
SLURMTMPDIR =
#######################################################

running 36 mpi-ranks, on 1 nodes
distrk: each k-point on 36 cores, 1 groups
distr: one band on 4 cores, 9 groups
vasp.6.4.3 19Mar24 (build May 17 2024 09:27:19) complex

POSCAR found type information on POSCAR Cr
POSCAR found : 1 types and 72 ions
Reading from existing POTCAR
scaLAPACK will be used
Reading from existing POTCAR
-----------------------------------------------------------------------------
|  ----> ADVICE to this user running VASP <----
|
|  You have a (more or less) 'large supercell' and for larger cells it
|  might be more efficient to use real-space projection operators.
|  Therefore, try LREAL= Auto in the INCAR file.
|  Mind: For very accurate calculation, you might also keep the
|  reciprocal projection scheme (i.e. LREAL=.FALSE.).
-----------------------------------------------------------------------------

LDA part: xc-table for (Slater+PW92), standard interpolation
POSCAR, INCAR and KPOINTS ok, starting setup
FFT: planning ... GRIDC
FFT: planning ... GRID_SOFT
FFT: planning ... GRID
WAVECAR not read
Another difference between the two might be a) the communication channel/interface that is used.
I tried `mpirun', `mpiexec', and `srun --mpi pmi2', and they all behave similarly to what is described above.
b) the number of nodes involved: when using mpirun you might run things on more than one node.
This is a single-node cluster with two sockets.
Regards, Hermann
Regards, Zhao
On 5/24/24 15:32, Hongyi Zhao via slurm-users wrote:
Dear Slurm Users,
I am experiencing a significant performance discrepancy when running the same VASP job through the Slurm scheduler compared to running it directly with mpirun. I am hoping for some insights or advice on how to resolve this issue.
System Information:
Slurm Version: 21.08.5 OS: Ubuntu 22.04.4 LTS (Jammy)
Job Submission Script:
#!/usr/bin/env bash #SBATCH -N 1 #SBATCH -D . #SBATCH --output=%j.out #SBATCH --error=%j.err ##SBATCH --time=2-00:00:00 #SBATCH --ntasks=36 #SBATCH --mem=64G
echo '#######################################################' echo "date = $(date)" echo "hostname = $(hostname -s)" echo "pwd = $(pwd)" echo "sbatch = $(which sbatch | xargs realpath -e)" echo "" echo "WORK_DIR = $WORK_DIR" echo "SLURM_SUBMIT_DIR = $SLURM_SUBMIT_DIR" echo "SLURM_JOB_NUM_NODES = $SLURM_JOB_NUM_NODES" echo "SLURM_NTASKS = $SLURM_NTASKS" echo "SLURM_NTASKS_PER_NODE = $SLURM_NTASKS_PER_NODE" echo "SLURM_CPUS_PER_TASK = $SLURM_CPUS_PER_TASK" echo "SLURM_JOBID = $SLURM_JOBID" echo "SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST" echo "SLURM_NNODES = $SLURM_NNODES" echo "SLURMTMPDIR = $SLURMTMPDIR" echo '#######################################################' echo ""
module purge > /dev/null 2>&1 module load vasp ulimit -s unlimited mpirun vasp_std
Performance Observation:
When running the job through Slurm:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR LOOP: cpu time 14.4893: real time 14.5049 LOOP: cpu time 14.3538: real time 14.3621 LOOP: cpu time 14.3870: real time 14.3568 LOOP: cpu time 15.9722: real time 15.9018 LOOP: cpu time 16.4527: real time 16.4370 LOOP: cpu time 16.7918: real time 16.7781 LOOP: cpu time 16.9797: real time 16.9961 LOOP: cpu time 15.9762: real time 16.0124 LOOP: cpu time 16.8835: real time 16.9008 LOOP: cpu time 15.2828: real time 15.2921 LOOP+: cpu time 176.0917: real time 176.0755
When running the job directly with mpirun:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ mpirun -n 36 vasp_std werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR LOOP: cpu time 9.0072: real time 9.0074 LOOP: cpu time 9.0515: real time 9.0524 LOOP: cpu time 9.1896: real time 9.1907 LOOP: cpu time 10.1467: real time 10.1479 LOOP: cpu time 10.2691: real time 10.2705 LOOP: cpu time 10.4330: real time 10.4340 LOOP: cpu time 10.9049: real time 10.9055 LOOP: cpu time 9.9718: real time 9.9714 LOOP: cpu time 10.4511: real time 10.4470 LOOP: cpu time 9.4621: real time 9.4584 LOOP+: cpu time 110.0790: real time 110.0739
Could you provide any insights or suggestions on what might be causing this performance issue? Are there any specific configurations or settings in Slurm that I should check or adjust to align the performance more closely with the direct mpirun execution?
Thank you for your time and assistance.
Best regards, Zhao
-- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-leave@lists.schedmd.com
On Sat, May 25, 2024 at 7:50 AM Hongyi Zhao hongyi.zhao@gmail.com wrote:
[...]
Ultimately, I found that the cause of the problem was that hyper-threading was enabled by default in the BIOS. After disabling hyper-threading, I observed that the computational efficiency is consistent between running through Slurm and running mpirun directly. Therefore, it appears that hyper-threading should not be enabled in the BIOS when using Slurm.
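For anyone who wants to double-check the same thing on their own node, the following commands show both the hardware threading state and the topology Slurm detects (just a quick sketch; the exact fields and counts will differ from machine to machine):

lscpu | grep -E 'Thread|Core|Socket'   # "Thread(s) per core: 2" means hyper-threading is on
slurmd -C                              # NodeName line as slurmd autodetects it, including ThreadsPerCore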
On Sat, May 25, 2024 at 9:49 AM Hongyi Zhao hongyi.zhao@gmail.com wrote:
[...]
Regarding the reason, I think the description here [1] is reasonable:
It is recommended to disable processor hyper-threading. In applications that are compute-intensive rather than I/O-intensive, enabling HyperThreading is likely to decrease the overall performance of the server. Intuitively, the physical memory available per core is reduced after hyper-threading is enabled.
[1] https://gist.github.com/weijianwen/acee3cd49825da8c8dfb4a99365b54c8#%E5%85%B...
Regards, Zhao
On Sat, May 25, 2024 at 10:06 AM Hongyi Zhao hongyi.zhao@gmail.com wrote:
[...]
See here [1] for the related discussion.
[1] https://www.vasp.at/forum/viewtopic.php?t=19557
Regards, Zhao
On 25-05-2024 03:49, Hongyi Zhao via slurm-users wrote:
Ultimately, I found that the cause of the problem was that hyper-threading was enabled by default in the BIOS. If I disable hyper-threading, I observed that the computational efficiency is consistent between using slurm and using mpirun directly. Therefore, it appears that hyper-threading should not be enabled in the BIOS when using slurm.
Whether or not to enable Hyper-Threading (HT) on your compute nodes depends entirely on the properties of applications that you wish to run on the nodes. Some applications are faster without HT, others are faster with HT. When HT is enabled, the "virtual CPU cores" obviously will have only half the memory available per core.
The VASP code is highly CPU- and memory-intensive, and HT should probably be disabled for optimal performance with VASP.
Slurm doesn't affect the performance of your codes with or without HT. Slurm just schedules tasks to run on the available cores.
/Ole
Ole Holm Nielsen via slurm-users slurm-users@lists.schedmd.com writes:
Whether or not to enable Hyper-Threading (HT) on your compute nodes depends entirely on the properties of applications that you wish to run on the nodes. Some applications are faster without HT, others are faster with HT. When HT is enabled, the "virtual CPU cores" obviously will have only half the memory available per core.
Another consideration is, if you keep HT enabled, do you want Slurm to hand out physical cores to jobs, or logical CPUs (hyperthreads)? Again, what is best depends on your workload. On our systems, we tend to either turn off HT or hand out cores.
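As a rough illustration of those two choices, here is a sketch; the node name, core counts, and memory value below are placeholders, not configuration taken from this thread:

# Choice 1: hand out whole physical cores. Allocation granularity is a core,
# and a job owns both hardware threads of every core it is given.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
NodeName=node01 Sockets=2 CoresPerSocket=18 ThreadsPerCore=2 RealMemory=192000

# Choice 2: hand out individual hyperthreads as CPUs. Each logical CPU is
# schedulable on its own (use one choice or the other, not both).
SelectTypeParameters=CR_CPU_Memory
NodeName=node01 CPUs=72 RealMemory=192000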
On Mon, May 27, 2024 at 2:59 PM Bjørn-Helge Mevik via slurm-users slurm-users@lists.schedmd.com wrote:
[...]
In the case where Hyper-Threading (HT) is enabled, is it possible to configure Slurm to achieve the following effects:
1. If the total number of cores used by jobs is less than the number of physical cores, hand out physical cores to jobs.
2. When the total number of cores used by jobs exceeds the number of physical cores, use logical CPUs for the excess part.
I have heard of the following method for achieving this, but have not tried it so far:
To configure Slurm for managing jobs more effectively when Hyper-Threading (HT) is enabled, you can implement a strategy that involves distinguishing between physical and logical cores. Here's a possible approach to meet the requirements you described:
1. Configure Slurm to Recognize Physical and Logical Cores
First, ensure that Slurm can differentiate between physical and logical cores. This typically involves setting the CpuBind and TaskPlugin parameters correctly in the Slurm configuration file (usually slurm.conf).
# Settings in slurm.conf
TaskPlugin=task/affinity
CpuBind=cores
2. Use Gres (Generic Resources) to Identify Physical and Logical Cores
You can utilize the GRES (Generic RESources) feature to define additional resource types, such as physical and logical cores. First, these resources need to be defined in the slurm.conf.
# Define resources in slurm.conf
NodeName=NODENAME Gres=cpu_physical:16,cpu_logical:32 CPUs=32 Boards=1 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
Here, cpu_physical and cpu_logical are custom resource names, followed by the number of resources. The CPUs field should be set to the total number of physical and logical cores.
3. Write Job Submission Scripts
When submitting a job, users need to request the appropriate type of cores based on their needs. For example, if a job requires more cores than the number of physical cores available, it can request a combination of physical and logical cores.
#!/bin/bash
#SBATCH --gres=cpu_physical:8,cpu_logical:4
#SBATCH --ntasks=12
#SBATCH --cpus-per-task=1

# Your job execution command
This script requests 8 physical cores and 4 logical cores, totaling 12 cores.
Regards, Zhao
Hi everybody,
On 5/26/24 08:40, Ole Holm Nielsen via slurm-users wrote: [...]
This is how we are handling Hyper-Threading in our cluster:

* It's enabled in the BIOS/system settings.
* The important parts in our slurm.conf are:

TaskPlugin=task/affinity,task/cgroup
CliFilterPlugins=cli_filter/lua
NodeName=DEFAULT ... ThreadsPerCore=2

* We make "--hint=nomultithread" the default for jobs by having this in cli_filter.lua:

function slurm_cli_setup_defaults(options, early_pass)
   options['hint'] = 'nomultithread'
   return slurm.SUCCESS
end

So users can still use Hyper-Threading by specifying "--hint=multithread" in their job script, which will give them two "CPUs/Threads" per core. Without this option they will get one core per requested CPU.
This works for us and our users. There is only one small side effect: while a job is pending, the expected number of CPUs is displayed in the "CPUS" column of the "squeue" output, but once the job is running, twice that number is shown (as Slurm counts both hyper-threads of each core as "CPUs").
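For illustration, under a setup like this a user who does want hyper-threading could submit something along these lines (a sketch only; the application name is a placeholder):

#!/usr/bin/env bash
#SBATCH --ntasks=36
#SBATCH --hint=multithread   # override the site default of nomultithread set in cli_filter.lua
# On a node with 2 threads per core, the 36 tasks can now be packed onto 18 physical cores.
srun ./my_app                # placeholder application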
Regards, Hermann