[slurm-users] Multinode MPI job

Frava fravadona at gmail.com
Wed Mar 27 19:34:04 UTC 2019


Hi,
if you try this SBATCH script, does it work?

#!/bin/bash
#SBATCH --job-name=myQE
#SBATCH --output=big-mem
#
#SBATCH --mem-per-cpu=16g --ntasks=2
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
#
#SBATCH packjob
#
#SBATCH --mem-per-cpu=10g --ntasks=4
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
#
srun --pack-group=0 --ntasks=2 pw.x -i mos2.rlx.in : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in
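
Note that each component of the ":"-separated srun needs its own executable. If the same binary and input are meant for both components, --pack-group also accepts a list, so the following single-command form should be equivalent (an untested sketch on my side):

srun --pack-group=0,1 pw.x -i mos2.rlx.in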

Regards,
Rafael.

On Wed, Mar 27, 2019 at 8:13 PM Mahmood Naderan <mahmood.nt at gmail.com>
wrote:

> OK. The two different partitions I saw were due to not specifying the
> partition name for the first set (before packjob). Here is a better script:
>
> #!/bin/bash
> #SBATCH --job-name=myQE
> #SBATCH --output=big-mem
> #SBATCH --mem-per-cpu=16g --ntasks=2
> #SBATCH -N 1
> #SBATCH --partition=QUARTZ
> #SBATCH --account=z5
> #SBATCH packjob
> #SBATCH --mem-per-cpu=10g --ntasks=4
> #SBATCH -N 1
> #SBATCH --partition=QUARTZ
> #SBATCH --account=z5
> srun pw.x -i mos2.rlx.in
>
>
> One node should run 2 processes (32 GB total) and another node should run
> 4 processes (40 GB total).
> The queue looks like
>
> $ squeue
>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>   747+1    QUARTZ     myQE   ghatee  R       0:02      1 rocks7
>   747+0    QUARTZ     myQE   ghatee  R       0:02      1 compute-0-2
>
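> To double-check where each component landed, the het-job id from squeue
> can also be inspected with scontrol (just a sketch; 747 is the job id
> shown above, and I believe scontrol lists every component of the pack job):
>
> $ scontrol show job 747
>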
>
> When I checked the nodes, 2 processes are running on compute-0-2 (the
> first set, before packjob), but there are no processes on rocks7.
>
> $ rocks run host compute-0-2 "ps aux | grep pw.x"
> ghatee   30234  0.0  0.0 251208  4996 ?        Sl   15:04   0:00 srun pw.x -i mos2.rlx.in
> ghatee   30235  0.0  0.0  46452   748 ?        S    15:04   0:00 srun pw.x -i mos2.rlx.in
> ghatee   30247 99.8  0.1 1930484 129696 ?      Rl   15:04   4:31 /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee   30248 99.8  0.1 1930488 129704 ?      Rl   15:04   4:31 /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee   30352  0.0  0.0 113132  1592 ?        Ss   15:09   0:00 bash -c ps aux | grep pw.x
> ghatee   30381  0.0  0.0 112664   960 ?        S    15:09   0:00 grep pw.x
>
>
> $ rocks run host rocks7 "ps aux | grep pw.x"
> ghatee   17141  0.0  0.0 316476 26632 pts/21   Sl+  23:39   0:00 /opt/rocks/bin/python /opt/rocks/bin/rocks run host rocks7 ps aux | grep pw.x
> ghatee   17143  0.0  0.0 113132  1364 pts/21   S+   23:39   0:00 bash -c ps aux | grep pw.x
> ghatee   17145  0.0  0.0 112664   960 pts/21   R+   23:39   0:00 grep pw.x
>
>
>
>
> Any idea?
> It seems that the mpirun I have is not compatible with the heterogeneous
> configuration, since the SBATCH parameters themselves are straightforward.
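>
> Or could it be that a plain srun inside a heterogeneous allocation only
> launches tasks in the first component unless --pack-group is given? That
> would match what I see (processes only on compute-0-2, which is 747+0).
> If so, a per-component launch along these lines might work (untested
> sketch):
>
> srun --pack-group=0 pw.x -i mos2.rlx.in : --pack-group=1 pw.x -i mos2.rlx.in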
>
>
> Regards,
> Mahmood
>
>
>
>
> On Wed, Mar 27, 2019 at 11:03 PM Christopher Samuel <chris at csamuel.org>
> wrote:
>
>> On 3/27/19 11:29 AM, Mahmood Naderan wrote:
>>
>> > Thank you very much. you are right. I got it.
>>
>> Cool, good to hear.
>>
>> I'd love to hear whether you get heterogeneous MPI jobs working too!
>>
>> All the best,
>> Chris
>> --
>>    Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>>
>>