On Sat, May 25, 2024 at 12:02 AM Hermann Schwärzler via slurm-users <slurm-users@lists.schedmd.com> wrote:
Hi Zhao,
my guess is that in your faster case you are using hyperthreading whereas in the Slurm case you don't.
Can you check what performance you get when you add
#SBATCH --hint=multithread
to your Slurm script?
I tried adding the above option to the Slurm script, only to find that the job gets stuck there forever. Here is the output 10 minutes after the job was submitted:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ cat sub.sh.o6
#######################################################
date = 2024年 05月 25日 星期六 07:31:31 CST
hostname = x13dai-t
pwd = /home/werner/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV
sbatch = /usr/bin/sbatch

WORK_DIR =
SLURM_SUBMIT_DIR = /home/werner/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV
SLURM_JOB_NUM_NODES = 1
SLURM_NTASKS = 36
SLURM_NTASKS_PER_NODE =
SLURM_CPUS_PER_TASK =
SLURM_JOBID = 6
SLURM_JOB_NODELIST = localhost
SLURM_NNODES = 1
SLURMTMPDIR =
#######################################################

running 36 mpi-ranks, on 1 nodes
distrk: each k-point on 36 cores, 1 groups
distr: one band on 4 cores, 9 groups
vasp.6.4.3 19Mar24 (build May 17 2024 09:27:19) complex

POSCAR found type information on POSCAR Cr
POSCAR found : 1 types and 72 ions
Reading from existing POTCAR
scaLAPACK will be used
Reading from existing POTCAR
 -----------------------------------------------------------------------------
|                                                                             |
|           ----> ADVICE to this user running VASP <----                      |
|                                                                             |
|     You have a (more or less) 'large supercell' and for larger cells it     |
|     might be more efficient to use real-space projection operators.         |
|     Therefore, try LREAL= Auto in the INCAR file.                           |
|     Mind: For very accurate calculation, you might also keep the            |
|     reciprocal projection scheme (i.e. LREAL=.FALSE.).                      |
|                                                                             |
 -----------------------------------------------------------------------------

LDA part: xc-table for (Slater+PW92), standard interpolation
POSCAR, INCAR and KPOINTS ok, starting setup
FFT: planning ... GRIDC
FFT: planning ... GRID_SOFT
FFT: planning ... GRID
WAVECAR not read
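In case it helps to narrow this down, something like the following should show where the stuck ranks end up while the job just sits there (a sketch; it assumes the vasp_std processes are still alive on the node):

# CPU affinity of every running vasp_std rank
for pid in $(pgrep -x vasp_std); do
    taskset -cp "$pid"
done

# whether the ranks are running or sleeping, and on which CPU
ps -o pid,psr,stat,%cpu,comm -p "$(pgrep -d, -x vasp_std)"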
Another difference between the two might be a) the communication channel/interface that is used.
I tried `mpirun`, `mpiexec`, and `srun --mpi=pmi2`, and they all behave similarly to what is described above.
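For completeness, the launch lines I tried look roughly like this (a sketch; the rest of the script is the same as the one quoted below):

# let the MPI launcher detect the Slurm allocation itself
mpirun vasp_std
mpiexec vasp_std

# launch through Slurm's own launcher using the PMI-2 interface
srun --mpi=pmi2 vasp_std

# list the PMI plugins this Slurm build actually offers
srun --mpi=list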
b) the number of nodes involved: when using mpirun you might run things on more than one node.
This is a single-node cluster with two sockets.
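In case the topology matters here, this is the kind of cross-check I can run on the node (a sketch; `localhost` is the node name shown in the job output above):

# hardware view: sockets, cores per socket, threads per core
lscpu | grep -E 'Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core'

# what slurmd detects on this machine
slurmd -C

# what the controller currently has configured for the node
scontrol show node localhost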
Regards, Hermann
Regards, Zhao
On 5/24/24 15:32, Hongyi Zhao via slurm-users wrote:
Dear Slurm Users,
I am experiencing a significant performance discrepancy when running the same VASP job through the Slurm scheduler compared to running it directly with mpirun. I am hoping for some insights or advice on how to resolve this issue.
System Information:
Slurm Version: 21.08.5
OS: Ubuntu 22.04.4 LTS (Jammy)
Job Submission Script:
#!/usr/bin/env bash
#SBATCH -N 1
#SBATCH -D .
#SBATCH --output=%j.out
#SBATCH --error=%j.err
##SBATCH --time=2-00:00:00
#SBATCH --ntasks=36
#SBATCH --mem=64G

echo '#######################################################'
echo "date = $(date)"
echo "hostname = $(hostname -s)"
echo "pwd = $(pwd)"
echo "sbatch = $(which sbatch | xargs realpath -e)"
echo ""
echo "WORK_DIR = $WORK_DIR"
echo "SLURM_SUBMIT_DIR = $SLURM_SUBMIT_DIR"
echo "SLURM_JOB_NUM_NODES = $SLURM_JOB_NUM_NODES"
echo "SLURM_NTASKS = $SLURM_NTASKS"
echo "SLURM_NTASKS_PER_NODE = $SLURM_NTASKS_PER_NODE"
echo "SLURM_CPUS_PER_TASK = $SLURM_CPUS_PER_TASK"
echo "SLURM_JOBID = $SLURM_JOBID"
echo "SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST"
echo "SLURM_NNODES = $SLURM_NNODES"
echo "SLURMTMPDIR = $SLURMTMPDIR"
echo '#######################################################'
echo ""

module purge > /dev/null 2>&1
module load vasp
ulimit -s unlimited
mpirun vasp_std
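(As an aside, a variant of the last line that makes the binding explicit would look like the sketch below; I have not benchmarked it, and whether explicit core binding makes a difference is part of what I am trying to understand:)

# let Slurm itself place the 36 ranks and bind them to physical cores
srun --cpu-bind=cores vasp_std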
Performance Observation:
When running the job through Slurm:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
LOOP:  cpu time 14.4893: real time 14.5049
LOOP:  cpu time 14.3538: real time 14.3621
LOOP:  cpu time 14.3870: real time 14.3568
LOOP:  cpu time 15.9722: real time 15.9018
LOOP:  cpu time 16.4527: real time 16.4370
LOOP:  cpu time 16.7918: real time 16.7781
LOOP:  cpu time 16.9797: real time 16.9961
LOOP:  cpu time 15.9762: real time 16.0124
LOOP:  cpu time 16.8835: real time 16.9008
LOOP:  cpu time 15.2828: real time 15.2921
LOOP+: cpu time 176.0917: real time 176.0755
When running the job directly with mpirun:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ mpirun -n 36 vasp_std
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
LOOP:  cpu time 9.0072: real time 9.0074
LOOP:  cpu time 9.0515: real time 9.0524
LOOP:  cpu time 9.1896: real time 9.1907
LOOP:  cpu time 10.1467: real time 10.1479
LOOP:  cpu time 10.2691: real time 10.2705
LOOP:  cpu time 10.4330: real time 10.4340
LOOP:  cpu time 10.9049: real time 10.9055
LOOP:  cpu time 9.9718: real time 9.9714
LOOP:  cpu time 10.4511: real time 10.4470
LOOP:  cpu time 9.4621: real time 9.4584
LOOP+: cpu time 110.0790: real time 110.0739
Could you provide any insights or suggestions on what might be causing this performance issue? Are there any specific configurations or settings in Slurm that I should check or adjust to align the performance more closely with the direct mpirun execution?
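For reference, these are the kinds of Slurm settings I can inspect on this node, in case any of them are relevant (just a sketch of the checks):

# task-binding and resource-selection plugins in use
scontrol show config | grep -iE 'TaskPlugin|SelectType'

# report the CPU binding Slurm would apply to a 36-task step
srun --ntasks=36 --cpu-bind=verbose true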
Thank you for your time and assistance.
Best regards, Zhao