[slurm-users] Multinode MPI job

Mahmood Naderan mahmood.nt at gmail.com
Thu Mar 28 17:09:26 UTC 2019


I tested with

env
strace srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in

in the Slurm script, and everything is fine now!
This is going to be a nasty bug to find...
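For reference, the full heterogeneous (pack) job script that goes with that srun line would look roughly like the sketch below. The component layout is my assumption from the srun line above (2 tasks in pack group 0, 4 tasks in pack group 1); on the Slurm versions current at this point (18.08/19.05) the components of a heterogeneous batch job are separated with "#SBATCH packjob":

#!/bin/bash
#SBATCH --job-name=myQE
#SBATCH --output=big-mem
#SBATCH --ntasks=2               # pack group 0
#SBATCH --mem-per-cpu=16G
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
#SBATCH packjob
#SBATCH --ntasks=4               # pack group 1
#SBATCH --mem-per-cpu=16G
#SBATCH --partition=QUARTZ
#SBATCH --account=z5

# env and strace above were only added for debugging; the actual launch line is:
srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in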


Regards,
Mahmood




On Thu, Mar 28, 2019 at 9:18 PM Mahmood Naderan <mahmood.nt at gmail.com>
wrote:

> Yes, that works.
>
> $ grep "Parallel version" big-mem
>      Parallel version (MPI), running on     1 processors
>      Parallel version (MPI), running on     1 processors
>      Parallel version (MPI), running on     1 processors
>      Parallel version (MPI), running on     1 processors
> $ squeue
>              JOBID PARTITION     NAME     USER ST       TIME  NODES
> NODELIST(REASON)
>                776    QUARTZ     myQE   ghatee  R       1:08      2
> compute-0-2,rocks7
> $ grep "pseudo file is empty or wrong" big-mem
> $ squeue
>              JOBID PARTITION     NAME     USER ST       TIME  NODES
> NODELIST(REASON)
>                776    QUARTZ     myQE   ghatee  R       1:47      2
> compute-0-2,rocks7
> $ cat slurm_script.sh
> #!/bin/bash
> #SBATCH --job-name=myQE
> #SBATCH --output=big-mem
> #SBATCH --ntasks-per-node=2
> #SBATCH --nodes=2
> #SBATCH --mem-per-cpu=16G
> #SBATCH --partition=QUARTZ
> #SBATCH --account=z5
> srun pw.x -i mos2.rlx.in
>
>
> I will try to dig more.
>
> Regards,
> Mahmood
>
>
>
>
> On Thu, Mar 28, 2019 at 9:04 PM Frava <fravadona at gmail.com> wrote:
>
>> Well, does it also crash when you run it with two nodes in a normal way
>> (not using heterogeneous jobs)?
>>
>> #!/bin/bash
>> #SBATCH --job-name=myQE_2Nx2MPI
>> #SBATCH --output=big-mem
>> #SBATCH --nodes=2
>> #SBATCH --ntasks-per-node=2
>> #SBATCH --mem-per-cpu=16g
>> #SBATCH --partition=QUARTZ
>> #SBATCH --account=z5
>> #
>> srun pw.x -i mos2.rlx.in
>>
>>

