Hi Zhao,
my guess is that in your faster case you are using hyperthreading, whereas in the Slurm case you aren't.
Can you check what performance you get when you add
#SBATCH --hint=multithread
to your Slurm script?
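You can also verify what the tasks are actually allowed to run on by printing each task's CPU affinity from inside the job. A minimal sketch, assuming Linux compute nodes (put it in the job script before the VASP run; SLURM_PROCID is set per task by srun):

srun bash -c 'echo "host=$(hostname -s) task=${SLURM_PROCID} $(grep Cpus_allowed_list /proc/self/status)"'

If each task's allowed list contains both hyperthread siblings of a core, hyperthreading is in play; if it is pinned to a single logical CPU, it is not.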
Two other possible differences between the two runs:
a) the communication channel/interface that is used, and
b) the number of nodes involved: when using mpirun directly you might be running on more than one node.
You can check both as sketched below.
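A quick way to check, as a sketch (the --mca flag assumes your MPI is Open MPI):

# Which hosts do the 36 ranks actually land on?
mpirun -n 36 hostname | sort | uniq -c

# Which transport/interface does Open MPI select? (verbose BTL selection output)
mpirun -n 36 --mca btl_base_verbose 100 vasp_std

Inside the Slurm job, SLURM_JOB_NODELIST (already echoed in your script) shows the allocated nodes for comparison.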
Regards, Hermann
On 5/24/24 15:32, Hongyi Zhao via slurm-users wrote:
Dear Slurm Users,
I am experiencing a significant performance discrepancy when running the same VASP job through the Slurm scheduler compared to running it directly with mpirun. I am hoping for some insights or advice on how to resolve this issue.
System Information:
Slurm Version: 21.08.5
OS: Ubuntu 22.04.4 LTS (Jammy)
Job Submission Script:
#!/usr/bin/env bash
#SBATCH -N 1
#SBATCH -D .
#SBATCH --output=%j.out
#SBATCH --error=%j.err
##SBATCH --time=2-00:00:00
#SBATCH --ntasks=36
#SBATCH --mem=64G
echo '#######################################################'
echo "date = $(date)"
echo "hostname = $(hostname -s)"
echo "pwd = $(pwd)"
echo "sbatch = $(which sbatch | xargs realpath -e)"
echo ""
echo "WORK_DIR = $WORK_DIR"
echo "SLURM_SUBMIT_DIR = $SLURM_SUBMIT_DIR"
echo "SLURM_JOB_NUM_NODES = $SLURM_JOB_NUM_NODES"
echo "SLURM_NTASKS = $SLURM_NTASKS"
echo "SLURM_NTASKS_PER_NODE = $SLURM_NTASKS_PER_NODE"
echo "SLURM_CPUS_PER_TASK = $SLURM_CPUS_PER_TASK"
echo "SLURM_JOBID = $SLURM_JOBID"
echo "SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST"
echo "SLURM_NNODES = $SLURM_NNODES"
echo "SLURMTMPDIR = $SLURMTMPDIR"
echo '#######################################################'
echo ""
module purge > /dev/null 2>&1
module load vasp
ulimit -s unlimited
mpirun vasp_std
Performance Observation:
When running the job through Slurm:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
LOOP:  cpu time 14.4893: real time 14.5049
LOOP:  cpu time 14.3538: real time 14.3621
LOOP:  cpu time 14.3870: real time 14.3568
LOOP:  cpu time 15.9722: real time 15.9018
LOOP:  cpu time 16.4527: real time 16.4370
LOOP:  cpu time 16.7918: real time 16.7781
LOOP:  cpu time 16.9797: real time 16.9961
LOOP:  cpu time 15.9762: real time 16.0124
LOOP:  cpu time 16.8835: real time 16.9008
LOOP:  cpu time 15.2828: real time 15.2921
LOOP+: cpu time 176.0917: real time 176.0755
When running the job directly with mpirun:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ mpirun -n 36 vasp_std
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
LOOP:  cpu time 9.0072: real time 9.0074
LOOP:  cpu time 9.0515: real time 9.0524
LOOP:  cpu time 9.1896: real time 9.1907
LOOP:  cpu time 10.1467: real time 10.1479
LOOP:  cpu time 10.2691: real time 10.2705
LOOP:  cpu time 10.4330: real time 10.4340
LOOP:  cpu time 10.9049: real time 10.9055
LOOP:  cpu time 9.9718: real time 9.9714
LOOP:  cpu time 10.4511: real time 10.4470
LOOP:  cpu time 9.4621: real time 9.4584
LOOP+: cpu time 110.0790: real time 110.0739
Could you offer any insight into what might be causing this performance gap? Are there any specific Slurm configurations or settings I should check or adjust to bring the performance in line with the direct mpirun execution?
Thank you for your time and assistance.
Best regards, Zhao