[slurm-users] Multinode MPI job

Mahmood Naderan mahmood.nt at gmail.com
Wed Mar 27 19:10:28 UTC 2019


OK. The two different partitions I saw were due to not specifying the
partition name for the first set (before packjob). Here is a better script:

#!/bin/bash
#SBATCH --job-name=myQE
#SBATCH --output=big-mem
#SBATCH --mem-per-cpu=16g --ntasks=2
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
#SBATCH packjob
#SBATCH --mem-per-cpu=10g --ntasks=4
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
srun pw.x -i mos2.rlx.in


One node should run 2 processes (32 GB total) and the other node should run
4 processes (40 GB total).
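
To double-check how Slurm actually split the two components, I think the job
can also be inspected with scontrol (a sketch on my side; I have not pasted
its output here, and the job ID is the one from squeue below):

# should list the 747+0 and 747+1 components with their NumTasks and MinMemoryCPU
scontrol show job 747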
The queue looks like

$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             747+1    QUARTZ     myQE   ghatee  R       0:02      1 rocks7
             747+0    QUARTZ     myQE   ghatee  R       0:02      1 compute-0-2


When I checked the nodes, only the 2 processes (the first set, before packjob)
are running on compute-0-2, but there are no pw.x processes on rocks7.

$ rocks run host compute-0-2 "ps aux | grep pw.x"
ghatee   30234  0.0  0.0 251208  4996 ?        Sl   15:04   0:00 srun pw.x -i mos2.rlx.in
ghatee   30235  0.0  0.0  46452   748 ?        S    15:04   0:00 srun pw.x -i mos2.rlx.in
ghatee   30247 99.8  0.1 1930484 129696 ?      Rl   15:04   4:31 /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
ghatee   30248 99.8  0.1 1930488 129704 ?      Rl   15:04   4:31 /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
ghatee   30352  0.0  0.0 113132  1592 ?        Ss   15:09   0:00 bash -c ps aux | grep pw.x
ghatee   30381  0.0  0.0 112664   960 ?        S    15:09   0:00 grep pw.x


$ rocks run host rocks7 "ps aux | grep pw.x"
ghatee   17141  0.0  0.0 316476 26632 pts/21   Sl+  23:39   0:00 /opt/rocks/bin/python /opt/rocks/bin/rocks run host rocks7 ps aux | grep pw.x
ghatee   17143  0.0  0.0 113132  1364 pts/21   S+   23:39   0:00 bash -c ps aux | grep pw.x
ghatee   17145  0.0  0.0 112664   960 pts/21   R+   23:39   0:00 grep pw.x




Any idea?
It seems that the mpirun I have is not compatible with the heterogeneous
configuration, since the SBATCH parameters themselves look straightforward.
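
One thing I am not sure about: if I read the heterogeneous (pack) job
documentation correctly, a plain srun inside a pack job may only launch the
step on the first component unless the pack groups are named explicitly. So
perhaps the last line should be something like this (just a guess on my
part, not tested yet):

# launch a single step that spans both pack components (0 and 1)
srun --pack-group=0,1 pw.x -i mos2.rlx.in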


Regards,
Mahmood




On Wed, Mar 27, 2019 at 11:03 PM Christopher Samuel <chris at csamuel.org>
wrote:

> On 3/27/19 11:29 AM, Mahmood Naderan wrote:
>
> > Thank you very much. you are right. I got it.
>
> Cool, good to hear.
>
> I'd love to hear whether you get heterogeneous MPI jobs working too!
>
> All the best,
> Chris
> --
>    Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>
>