[slurm-users] Curious performance results

Angelos Ching angelosching at clustertech.com
Fri Feb 26 04:07:56 UTC 2021


I think it's related to the change in job step launch semantics introduced in 20.11.0, which has been reverted as of 20.11.3; see https://www.schedmd.com/news.php for details.
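
Since the changed behaviour only affects 20.11.0 through 20.11.2, it may be worth double-checking the exact Slurm version running on the cluster; any of the client commands will report it, e.g.:

sinfo --version      # prints e.g. "slurm 20.11.2"
srun --version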

Cheers,
Angelos 
(Sent from mobile, please pardon me for typos and cursoriness.)

> On 26/2/2021 at 9:07, Volker Blum <volker.blum at duke.edu> wrote:
> 
> Hi, 
> 
> I am testing slurm 20.11.2 on a local cluster together with Intel MPI 2018.4.274.
> 
> 
> 1) On a single node (20 physical cores) and executed manually (no slurm), a particular application runs fine using Intel’s mpirun, execution time for this example: 8.505 s (wall clock).
> 
> (this is a straight MPI application, no complications)
> 
> 
> 2) Using slurm and Intel’s mpirun through a queue / batch script, 
> 
> #SBATCH --ntasks-per-node=20
> mpirun -n 20 $bin > file.out
> 
> the same job runs correctly but takes 121.735 s (wall clock!)
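
For reference, a minimal complete batch script along these lines might look as follows; the job name, walltime, module name and binary path are illustrative placeholders, not the original script:

#!/bin/bash
#SBATCH --job-name=mpi_test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --time=00:30:00

# Intel MPI 2018 environment (site-specific module name assumed)
module load intel-mpi/2018.4

bin=./my_mpi_app             # placeholder for the actual MPI executable
mpirun -n 20 $bin > file.out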
> 
> 
> 3) After some considerable searching, a partial fix is 
> 
> #SBATCH --ntasks-per-node=20
> ...
> export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so.0
> srun --cpu-bind=cores -n 20 $bin > file.out
> 
> can bring down the execution time to 13.482 s
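
Setting I_MPI_PMI_LIBRARY points Intel MPI at Slurm's own PMI library, so that ranks launched with srun coordinate start-up through Slurm's PMI rather than Intel's bundled one. Two quick sanity checks before relying on it (the library path is the one from the snippet above and may differ on other systems):

ls -l /usr/lib64/libpmi.so.0                  # confirm Slurm's PMI-1 library is actually there
scontrol show config | grep -i MpiDefault     # which MPI plugin slurm.conf defaults to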
> 
> 
> 4) After changing
> 
> #SBATCH --ntasks-per-node=20
> #SBATCH --cpus-per-task=2
> ...
> export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so.0
> srun --cpu-bind=cores -n 20 $bin > file.out
> 
> finally, the time is 8.480 s.
> 
> This timing is as it should be, but at the price of pretending that the application is multithreaded when, in fact, it is not.
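
One way to see what actually changes between 3) and 4) is to ask srun to report the binding it applies; --cpu-bind accepts a verbose flag for this (same placeholders as above):

srun --cpu-bind=verbose,cores -n 20 $bin > file.out
# srun then prints one cpu-bind mask line per task on stderr, showing which cores each rank was pinned to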
> 
> ***
> 
> Is it possible to just keep Intel MPI defaults intact when using its mpirun in a slurm batch script?
> 
> Best wishes
> Volker
> 
> Volker Blum
> Associate Professor
> Ab Initio Materials Simulations
> Thomas Lord Department of Mechanical Engineering and Materials Science
> Duke University
> https://aims.pratt.duke.edu
> 
> volker.blum at duke.edu
> Twitter: Aimsduke
> 
> Office: 4308 Chesterfield Building
> 