[slurm-users] Multinode MPI job
Frava
fravadona at gmail.com
Wed Mar 27 19:34:04 UTC 2019
Hi,
If you try this SBATCH script, does it work? (As far as I know, a plain srun inside a heterogeneous job only launches the application on the first pack group, hence the explicit --pack-group options in the srun line below.)
#!/bin/bash
#SBATCH --job-name=myQE
#SBATCH --output=big-mem
#
#SBATCH --mem-per-cpu=16g --ntasks=2
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
#
#SBATCH packjob
#
#SBATCH --mem-per-cpu=10g --ntasks=4
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
#
srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in
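
If that combined form still leaves pack group 0 idle, another variant worth trying (just a sketch, untested on your cluster) is to give each pack group its own copy of the command:

srun --pack-group=0 pw.x -i mos2.rlx.in : --pack-group=1 pw.x -i mos2.rlx.in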
Regards,
Rafael.
On Wed, Mar 27, 2019 at 8:13 PM Mahmood Naderan <mahmood.nt at gmail.com> wrote:
> OK. The two different partitions I saw were due to not specifying the partition
> name for the first set (before packjob). Here is a better script:
>
> #!/bin/bash
> #SBATCH --job-name=myQE
> #SBATCH --output=big-mem
> #SBATCH --mem-per-cpu=16g --ntasks=2
> #SBATCH -N 1
> #SBATCH --partition=QUARTZ
> #SBATCH --account=z5
> #SBATCH packjob
> #SBATCH --mem-per-cpu=10g --ntasks=4
> #SBATCH -N 1
> #SBATCH --partition=QUARTZ
> #SBATCH --account=z5
> srun pw.x -i mos2.rlx.in
>
>
> One node should run 2 processes (32 GB total) and another node should run
> 4 processes (40 GB total).
> The queue looks like
>
> $ squeue
> JOBID PARTITION NAME   USER   ST TIME NODES NODELIST(REASON)
> 747+1 QUARTZ    myQE   ghatee R  0:02 1     rocks7
> 747+0 QUARTZ    myQE   ghatee R  0:02 1     compute-0-2
>
>
> As I checked the nodes, only 2 processes are running on compute-0-2 (the first
> set, before packjob), but there are no processes on rocks7.
>
> $ rocks run host compute-0-2 "ps aux | grep pw.x"
> ghatee 30234  0.0 0.0  251208   4996 ? Sl 15:04 0:00 srun pw.x -i mos2.rlx.in
> ghatee 30235  0.0 0.0   46452    748 ? S  15:04 0:00 srun pw.x -i mos2.rlx.in
> ghatee 30247 99.8 0.1 1930484 129696 ? Rl 15:04 4:31 /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee 30248 99.8 0.1 1930488 129704 ? Rl 15:04 4:31 /home/ghatee/QuantumEspresso621/bin/pw.x -i mos2.rlx.in
> ghatee 30352  0.0 0.0  113132   1592 ? Ss 15:09 0:00 bash -c ps aux | grep pw.x
> ghatee 30381  0.0 0.0  112664    960 ? S  15:09 0:00 grep pw.x
>
>
> $ rocks run host rocks7 "ps aux | grep pw.x"
> ghatee 17141 0.0 0.0 316476 26632 pts/21 Sl+ 23:39 0:00 /opt/rocks/bin/python /opt/rocks/bin/rocks run host rocks7 ps aux | grep pw.x
> ghatee 17143 0.0 0.0 113132  1364 pts/21 S+  23:39 0:00 bash -c ps aux | grep pw.x
> ghatee 17145 0.0 0.0 112664   960 pts/21 R+  23:39 0:00 grep pw.x
>
>
>
>
> Any idea?
> It seems that the mpirun I have is not compatible with the heterogeneous
> configuration, because the SBATCH parameters are straightforward.
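>
> Maybe a sanity check (just a sketch, not tried yet, assuming the --pack-group
> syntax is accepted here) is to launch hostname in both pack groups and see
> whether tasks show up on both nodes:
>
> srun --pack-group=0 --ntasks=2 hostname : --pack-group=1 --ntasks=4 hostname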
>
>
> Regards,
> Mahmood
>
>
>
>
> On Wed, Mar 27, 2019 at 11:03 PM Christopher Samuel <chris at csamuel.org>
> wrote:
>
>> On 3/27/19 11:29 AM, Mahmood Naderan wrote:
>>
>> > Thank you very much. You are right. I got it.
>>
>> Cool, good to hear.
>>
>> I'd love to hear whether you get heterogeneous MPI jobs working too!
>>
>> All the best,
>> Chris
>> --
>> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
>>
>>