[slurm-users] Multinode MPI job
Mahmood Naderan
mahmood.nt at gmail.com
Thu Mar 28 17:09:26 UTC 2019
I tested with

env strace srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in

in the Slurm script, and everything is fine now!!
This is going to be a nasty bug to find...
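
For reference, a matching heterogeneous batch script would look roughly like the sketch below. Only the pw.x command and the 2+4 task split come from the srun line above; the per-component resource options are just copied from the normal script quoted further down, so adjust them to the real setup:

#!/bin/bash
#SBATCH --job-name=myQE_het
#SBATCH --output=big-mem
# first pack group: 2 tasks
#SBATCH --ntasks=2 --mem-per-cpu=16G --partition=QUARTZ --account=z5
#SBATCH packjob
# second pack group: 4 tasks
#SBATCH --ntasks=4 --mem-per-cpu=16G --partition=QUARTZ --account=z5

srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in

(Newer Slurm releases rename "packjob" to "hetjob" and --pack-group to --het-group.)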
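
A side note on the non-heterogeneous run quoted below: the four "running on 1 processors" lines suggest each task started as its own single-rank MPI run instead of joining one 4-rank communicator, which often points at a mismatch between srun's PMI plugin and the MPI library pw.x was built against. A quick way to check (pmi2 is only an example value; the right plugin depends on how the MPI stack was built):

srun --mpi=list                       # list the MPI/PMI plugin types this Slurm supports
srun --mpi=pmi2 pw.x -i mos2.rlx.in   # try forcing a specific plugin, e.g. PMI2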
Regards,
Mahmood
On Thu, Mar 28, 2019 at 9:18 PM Mahmood Naderan <mahmood.nt at gmail.com>
wrote:
> Yes that works.
>
> $ grep "Parallel version" big-mem
> Parallel version (MPI), running on 1 processors
> Parallel version (MPI), running on 1 processors
> Parallel version (MPI), running on 1 processors
> Parallel version (MPI), running on 1 processors
> $ squeue
> JOBID PARTITION NAME USER   ST TIME NODES NODELIST(REASON)
>   776 QUARTZ    myQE ghatee R  1:08     2 compute-0-2,rocks7
> $ grep "pseudo file is empty or wrong" big-mem
> $ squeue
> JOBID PARTITION NAME USER   ST TIME NODES NODELIST(REASON)
>   776 QUARTZ    myQE ghatee R  1:47     2 compute-0-2,rocks7
> $ cat slurm_script.sh
> #!/bin/bash
> #SBATCH --job-name=myQE
> #SBATCH --output=big-mem
> #SBATCH --ntasks-per-node=2
> #SBATCH --nodes=2
> #SBATCH --mem-per-cpu=16G
> #SBATCH --partition=QUARTZ
> #SBATCH --account=z5
> srun pw.x -i mos2.rlx.in
>
>
> I will try to dig more.
>
> Regards,
> Mahmood
>
> On Thu, Mar 28, 2019 at 9:04 PM Frava <fravadona at gmail.com> wrote:
>
>> Well, does it also crash when you run it with two nodes in a normal way
>> (not using heterogeneous jobs)?
>>
>> #!/bin/bash
>> #SBATCH --job-name=myQE_2Nx2MPI
>> #SBATCH --output=big-mem
>> #SBATCH --nodes=2
>> #SBATCH --ntasks-per-node=2
>> #SBATCH --mem-per-cpu=16g
>> #SBATCH --partition=QUARTZ
>> #SBATCH --account=z5
>> #
>> srun pw.x -i mos2.rlx.in
>>
>>