[slurm-users] Running multiple jobs simultaneously

Matt Hohmeister hohmeister at psy.fsu.edu
Thu Sep 26 16:13:35 UTC 2019


I have a two-node cluster running Slurm, and I've been asked to allow many jobs (hundreds) to run simultaneously. Below is the scheduling section of my slurm.conf, which I changed to allow multiple jobs to run on each node:

# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core

For testing purposes, I'm running this job:

#!/bin/bash
#SBATCH --job-name=whatever
#SBATCH --output=slurmBatchLists_Aug19.out
#SBATCH --error=slurmBatchLists_Aug19.err
#SBATCH --partition=debug
#SBATCH --nodes=1
#SBATCH --array=70-100
#SBATCH --cpus-per-task=5
# Double quotes so bash expands $SLURM_ARRAY_TASK_ID before MATLAB sees it
matlab -nodisplay -nojvm -r "sampleSlurm($SLURM_ARRAY_TASK_ID);"

...which gives me the following squeue output:

[mhohmeis at odin ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     1742_[82-100]     debug whatever mhohmeis PD       0:00      1 (Resources)
     1755_[70-100]     debug whatever mhohmeis PD       0:00      1 (Priority)
     1756_[70-100]     debug whatever mhohmeis PD       0:00      1 (Priority)
     1757_[70-100]     debug whatever mhohmeis PD       0:00      1 (Priority)
     1758_[70-100]     debug whatever mhohmeis PD       0:00      1 (Priority)
     1759_[70-100]     debug whatever mhohmeis PD       0:00      1 (Priority)
     1760_[70-100]     debug whatever mhohmeis PD       0:00      1 (Priority)
     1761_[70-100]     debug whatever mhohmeis PD       0:00      1 (Priority)
     1762_[70-100]     debug whatever mhohmeis PD       0:00      1 (Priority)
     1763_[70-100]     debug whatever mhohmeis PD       0:00      1 (Priority)
           1742_70     debug whatever mhohmeis  R       0:03      1 odin
           1742_71     debug whatever mhohmeis  R       0:03      1 odin
           1742_72     debug whatever mhohmeis  R       0:03      1 odin
           1742_73     debug whatever mhohmeis  R       0:03      1 odin
           1742_74     debug whatever mhohmeis  R       0:03      1 odin
           1742_75     debug whatever mhohmeis  R       0:03      1 odin
           1742_76     debug whatever mhohmeis  R       0:03      1 thor
           1742_77     debug whatever mhohmeis  R       0:03      1 thor
           1742_78     debug whatever mhohmeis  R       0:03      1 thor
           1742_79     debug whatever mhohmeis  R       0:03      1 thor
           1742_80     debug whatever mhohmeis  R       0:03      1 thor
           1742_81     debug whatever mhohmeis  R       0:03      1 thor
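
As a rough sanity check on the output above, here is a minimal sketch of how many array tasks could run at once when cores are the only consumable resource (CR_Core) and each task asks for --cpus-per-task CPUs on a single node. The function name max_concurrent is made up for illustration, and the 32 cores per node is a placeholder, not a value stated in this post; the real figure would come from something like sinfo or scontrol show node. Memory limits, hyperthreading, and partition limits are ignored.

```shell
#!/bin/bash
# Estimate how many single-node array tasks fit on the cluster at once.
# Usage: max_concurrent NODES CORES_PER_NODE CPUS_PER_TASK
# CORES_PER_NODE is an ASSUMPTION supplied by the caller, not read from Slurm.
max_concurrent() {
    local nodes=$1 cores_per_node=$2 cpus_per_task=$3
    # A task cannot span nodes (--nodes=1), so count whole tasks per node first.
    local per_node=$(( cores_per_node / cpus_per_task ))
    echo $(( nodes * per_node ))
}

# Hypothetical 32-core nodes, 5 CPUs per task as in the script above.
max_concurrent 2 32 5
# Same hypothetical nodes, but only 1 CPU per task.
max_concurrent 2 32 1
```

Under this simple model, the number of tasks running at the same time is capped by cores per node divided by CPUs per task, so a smaller --cpus-per-task lets more array tasks run together.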

They want *all* of these jobs to run simultaneously. Also, when they add #SBATCH --ntasks=30 to the .sbatch file above and submit it, the job stays pending:

[mhohmeis at odin ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     2052_[70-100]     debug whatever mhohmeis PD       0:00      4 (PartitionConfig)
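
For what it's worth, a minimal sketch of the arithmetic behind that request: with --ntasks=30 and --cpus-per-task=5, each array element asks for 30 x 5 = 150 CPUs. The helper names below are invented for illustration, and 32 cores per node is a placeholder rather than a figure from this cluster. The point is only that the request may need more nodes than the two available, which could be related to the PartitionConfig reason.

```shell
#!/bin/bash
# CPUs requested by ONE array element: ntasks * cpus-per-task.
cpus_requested() {
    local ntasks=$1 cpus_per_task=$2
    echo $(( ntasks * cpus_per_task ))
}

# Nodes needed if each task's CPUs must sit together on one node.
# Usage: nodes_needed NTASKS CORES_PER_NODE CPUS_PER_TASK
# CORES_PER_NODE is a HYPOTHETICAL value supplied by the caller.
nodes_needed() {
    local ntasks=$1 cores_per_node=$2 cpus_per_task=$3
    local per_node=$(( cores_per_node / cpus_per_task ))
    if [ "$per_node" -eq 0 ]; then
        echo "a single task does not fit on one node" >&2
        return 1
    fi
    # Integer ceiling of ntasks / per_node.
    echo $(( (ntasks + per_node - 1) / per_node ))
}

echo "CPUs per array element: $(cpus_requested 30 5)"
echo "Nodes needed at 32 cores/node (placeholder): $(nodes_needed 30 32 5)"
```

If the intent is one MATLAB process per array element, leaving --ntasks at its default of 1 keeps each element at 5 CPUs.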

Any thoughts?

Thanks!

Matt Hohmeister
Systems and Network Administrator
Department of Psychology
Florida State University
PO Box 3064301
Tallahassee, FL 32306-4301
Phone: +1 850 645 1902
Fax: +1 850 644 7739
Pronouns: he/him/his
