[slurm-users] Curious performance results

Volker Blum volker.blum at duke.edu
Fri Feb 26 13:15:49 UTC 2021


Thank you! I’ll see if this is an option … that would be nice.

I’ll see if we can try this.

Best wishes
Volker

> On Feb 25, 2021, at 11:07 PM, Angelos Ching <angelosching at clustertech.com> wrote:
> 
> I think it's related to the job step launch semantics change introduced in 20.11.0, which has been reverted since 20.11.3; see https://www.schedmd.com/news.php for details.
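> 
> A quick way to confirm which behaviour a cluster is on (these are plain Slurm version queries, nothing exotic):
> 
> # Slurm version on the submit host
> sinfo --version
> # Slurm version reported from a compute node
> srun -N1 -n1 scontrol --version
> 
> Anything reporting 20.11.0 through 20.11.2 is affected; 20.11.3 or later restores the previous step launch behaviour.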
> 
> Cheers,
> Angelos 
> (Sent from mobile, please pardon me for typos and cursoriness.)
> 
>> On 26/2/2021 at 9:07, Volker Blum <volker.blum at duke.edu> wrote:
>> 
>> Hi, 
>> 
>> I am testing slurm 20.11.2 on a local cluster together with Intel MPI 2018.4.274.
>> 
>> 
>> 1) On a single node (20 physical cores) and executed manually (no slurm), a particular application runs fine using Intel’s mpirun; execution time for this example: 8.505 s (wall clock).
>> 
>> (this is a straight MPI application, no complications)
>> 
>> 
>> 2) Using slurm and Intel’s mpirun through a queue / batch script, 
>> 
>> #SBATCH --ntasks-per-node=20
>> mpirun -n 20 $bin > file.out
>> 
>> the same job runs correctly but takes 121.735 s (wall clock!)
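>> 
>> For reference, a complete script reproducing this case would look roughly like the sketch below; the job name, module name, and binary name are placeholders, not the ones actually used:
>> 
>> #!/bin/bash
>> #SBATCH --job-name=mpi_test            # placeholder job name
>> #SBATCH --nodes=1
>> #SBATCH --ntasks-per-node=20
>> 
>> module load intel-mpi/2018.4           # assumed module name, site-specific
>> 
>> bin=./my_application.x                 # placeholder for the actual binary
>> mpirun -n 20 $bin > file.out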
>> 
>> 
>> 3) After considerable searching, a partial fix is:
>> 
>> #SBATCH --ntasks-per-node=20
>> ...
>> export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so.0
>> srun --cpu-bind=cores -n 20 $bin > file.out
>> 
>> can bring the execution time down to 13.482 s
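>> 
>> (The PMI library path above is the one found on this system; on other installations it can be located with a generic search along these lines, nothing version-specific:)
>> 
>> # find Slurm's PMI-1 library; the exact path varies by installation
>> ls /usr/lib64/libpmi* /usr/lib*/slurm/libpmi* 2>/dev/null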
>> 
>> 
>> 4) After changing the script to
>> 
>> #SBATCH --ntasks-per-node=20
>> #SBATCH --cpus-per-task=2
>> ...
>> export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so.0
>> srun --cpu-bind=cores -n 20 $bin > file.out
>> 
>> the time is finally 8.480 s.
>> 
>> This timing is as it should be, but at the price of pretending that the application is multithreaded when it is, in fact, not.
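>> 
>> Putting the pieces together, a complete script for this last variant would look roughly like the following; the module and binary names are again placeholders:
>> 
>> #!/bin/bash
>> #SBATCH --nodes=1
>> #SBATCH --ntasks-per-node=20
>> #SBATCH --cpus-per-task=2              # the workaround discussed above; the code itself is not multithreaded
>> 
>> module load intel-mpi/2018.4           # assumed module name, site-specific
>> 
>> export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so.0
>> bin=./my_application.x                 # placeholder for the actual binary
>> srun --cpu-bind=cores -n 20 $bin > file.out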
>> 
>> ***
>> 
>> Is it possible to just keep Intel MPI defaults intact when using its mpirun in a slurm batch script?
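>> 
>> In case it helps with reproducing this: one way to see where mpirun actually places the ranks is Intel MPI's own debug output (I_MPI_DEBUG at level 4 or higher prints the pinning map, if I remember correctly). This is purely a diagnostic sketch, not a fix:
>> 
>> # same batch script as in (2), with Intel MPI debug output enabled
>> export I_MPI_DEBUG=4
>> mpirun -n 20 $bin > file_debug.out
>> 
>> If all 20 ranks end up pinned to the same core in that output, that would be consistent with the slowdown in (2).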
>> 
>> Best wishes
>> Volker
>> 
>> Volker Blum
>> Associate Professor
>> Ab Initio Materials Simulations
>> Thomas Lord Department of Mechanical Engineering and Materials Science
>> Duke University
>> https://aims.pratt.duke.edu
>> 
>> volker.blum at duke.edu
>> Twitter: Aimsduke
>> 
>> Office: 4308 Chesterfield Building
>> 

Volker Blum
Associate Professor
Ab Initio Materials Simulations
Thomas Lord Department of Mechanical Engineering and Materials Science
Duke University
https://aims.pratt.duke.edu

volker.blum at duke.edu
Twitter: Aimsduke

Office: 4308 Chesterfield Building
