[slurm-users] Fwd: a heterogeneous job terminates unexpectedly

huml1 at sugon.com
Thu Feb 28 01:06:37 UTC 2019


Dear all,
        I have a cluster with 9 nodes (cmbc[1530-1538]); each node has 2 CPUs and each CPU has 32 cores.
However, when I submitted the same heterogeneous job twice, the second job terminated unexpectedly.
This problem has been bothering me all day. The Slurm version is 18.08.5, and here is the job script:
**************
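#!/bin/bash
# Component 0: 64 tasks on 2 nodes
#SBATCH -J FIRE
#SBATCH -o log.heter.%j
#SBATCH -e log.heter.%j
#SBATCH --comment=WRF
#SBATCH --mem=20G
#SBATCH -p largemem
#SBATCH -n 64 -N 2
# "packjob" starts the next component of the heterogeneous (pack) job
#SBATCH packjob
# Component 1: 16 tasks on 1 node
#SBATCH -J HAHA1
#SBATCH -p largemem
#SBATCH -n 16 -N 1
#SBATCH --mem=20G
#SBATCH packjob
# Component 2: 8 tasks on 1 node, pinned to cmbc1533 with -w
#SBATCH -J HAHA2
#SBATCH -w cmbc1533
#SBATCH -p largemem
#SBATCH -n 8 -N 1
#SBATCH --mem=20G

module load compiler/intel/composer_xe_2018.1.163
module load mpi/intelmpi/2018.1
# Point Intel MPI at Slurm's PMI library so that srun launches the ranks
export I_MPI_PMI_LIBRARY=/opt/slurm18/lib/libpmi.so
# One srun step spanning all three components; ":" separates the executable for each component
time srun --mpi=pmi2 ./inter_fire 960000000 : ./intel_fire 960000000 : ./intel_fire 960000000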
date
*********************
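For reference, the short sketch below tallies the cluster's core count and what one submission of this pack job asks for. It assumes one core per task (the default of one CPU per task) and simply adds up the -n/-N values of the three components; it does not model how Slurm actually places the components, and the variable names are only illustrative.

```bash
#!/bin/bash
# Cluster shape from the description above: 9 nodes, 2 CPUs per node, 32 cores per CPU.
nodes=9
sockets_per_node=2
cores_per_socket=32

cores_per_node=$(( sockets_per_node * cores_per_socket ))
total_cores=$(( nodes * cores_per_node ))

# Task (-n) and node (-N) counts of the three pack-job components.
# Assumption: one core per task; Slurm's real placement is not modelled here.
tasks=(64 16 8)
node_counts=(2 1 1)

job_tasks=0
job_nodes=0
for i in "${!tasks[@]}"; do
  job_tasks=$(( job_tasks + tasks[i] ))
  job_nodes=$(( job_nodes + node_counts[i] ))
done

echo "cores per node: ${cores_per_node}"
echo "cluster cores: ${total_cores}"
echo "one submission: ${job_tasks} tasks, ${job_nodes} node requests"
echo "two submissions: $(( 2 * job_tasks )) tasks, $(( 2 * job_nodes )) node requests"
# Component 2 of every submission is pinned to the same node by -w.
echo "node requested by component 2 of each submission: cmbc1533"
```

Each submission needs 88 of the 576 cores, and both submissions ask for cmbc1533 for their third component.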
Here is the error from the terminated job:
Appreciatively,
Menglong


Best wishes!
Name         胡梦龙 (Hu Menglong)
Mobile       135-6164-9610
Department   HPC


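Sugon International Information Industry Co., Ltd. (中科曙光国际信息产业有限公司)
Sugon Building (Building 3), 78 Zhuzhou Road, Laoshan District, Qingdao 266000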


-------------- next part --------------
A non-text attachment was scrubbed...
Name: Catch.jpg
Type: image/jpeg
Size: 34641 bytes
Desc: not available
URL: <http://lists.schedmd.com/pipermail/slurm-users/attachments/20190228/4204dd3b/attachment-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 6942_M2(02-28-09-05-40).png
Type: image/png
Size: 6942 bytes
Desc: not available
URL: <http://lists.schedmd.com/pipermail/slurm-users/attachments/20190228/4204dd3b/attachment-0001.png>

