[slurm-users] Slurm 17.11 and configuring backfill and oversubscribe to allow concurrent processes

Robert Kudyba rkudyba at fordham.edu
Wed Feb 26 19:56:35 UTC 2020


We run Bright 8.1 and Slurm 17.11. We are trying to allow multiple jobs to
run concurrently on our small 4-node cluster.

Based on
https://community.brightcomputing.com/question/5d6614ba08e8e81e885f1991?action=artikel&cat=14&id=410&artlang=en&highlight=slurm+%2526%252334%253Bgang+scheduling%2526%252334%253B
and
https://slurm.schedmd.com/cons_res_share.html

Here are some settings in /etc/slurm/slurm.conf:

SchedulerType=sched/backfill
# Nodes
NodeName=node[001-003] CoresPerSocket=12 RealMemory=191800 Sockets=2 Gres=gpu:1
# Partitions
PartitionName=defq Default=YES MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=FORCE:12 OverTimeLimit=0 State=UP Nodes=node[001-003]
PartitionName=gpuq Default=NO MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=FORCE:12 OverTimeLimit=0 State=UP
# Generic resources types
GresTypes=gpu,mic
# Epilog/Prolog parameters
PrologSlurmctld=/cm/local/apps/cmd/scripts/prolog-prejob
Prolog=/cm/local/apps/cmd/scripts/prolog
Epilog=/cm/local/apps/cmd/scripts/epilog
# Fast Schedule option
FastSchedule=1
# Power Saving
SuspendTime=-1 # this disables power saving
SuspendTimeout=30
ResumeTimeout=60
SuspendProgram=/cm/local/apps/cluster-tools/wlm/scripts/slurmpoweroff
ResumeProgram=/cm/local/apps/cluster-tools/wlm/scripts/slurmpoweron
# END AUTOGENERATED SECTION -- DO NOT REMOVE
# http://kb.brightcomputing.com/faq/index.php?action=artikel&cat=14&id=410&artlang=en&highlight=slurm+%26%2334%3Bgang+scheduling%26%2334%3B
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
SchedulerTimeSlice=60
EnforcePartLimits=YES

But it appears that each running job takes one of the 3 nodes to itself, and
all the other jobs are left queued behind it. Do we have an incorrect option
set?
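
In case it helps, one way to double-check which settings slurmctld has
actually loaded (part of our slurm.conf is autogenerated by Bright, so the
file and the running config could in principle differ) would be something
like:

# Scheduler/select settings as seen by the running slurmctld
scontrol show config | grep -E 'SelectType|SchedulerTimeSlice|FastSchedule'
# Effective sharing setting on the partition
scontrol show partition defq | grep -i oversubscribe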

squeue -a
JOBID PARTITION     NAME   USER ST       TIME NODES NODELIST(REASON)
 1937      defq   PaNet5  user1 PD       0:00     1 (Resources)
 1938      defq   PoNet5  user1 PD       0:00     1 (Priority)
 1964      defq   SENet5  user1 PD       0:00     1 (Priority)
 1979      defq   IcNet5  user1 PD       0:00     1 (Priority)
 1980      defq runtrain  user2 PD       0:00     1 (Priority)
 1981      defq   InRes5  user1 PD       0:00     1 (Priority)
 1983      defq run_LSTM  user3 PD       0:00     1 (Priority)
 1984      defq run_hui.  user4 PD       0:00     1 (Priority)
 1936      defq   SeRes5  user1  R   10:02:39     1 node003
 1950      defq sequenti  user5  R 1-02:03:00     1 node001
 1978      defq run_hui. user16  R   13:48:21     1 node002
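
To see how much of each node the running jobs are actually holding, we can
look at the allocations directly, e.g. (job ID 1936 taken from the output
above):

# Allocated nodes, CPUs and memory for one of the running jobs
scontrol show job 1936 | grep -E 'NumNodes|NumCPUs|TRES|MinMemory'
# CPUs requested per job (%C) alongside state and reason
squeue -o "%.7i %.9P %.8u %.2t %.10M %C %R"
# Per-node CPU counts as allocated/idle/other/total, plus memory
sinfo -N -o "%N %C %m"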

Am I misunderstanding some of the settings?
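
For comparison, is something along these lines closer to what cons_res node
sharing expects? This is only a sketch of our reading of the pages above; the
DefMemPerCPU value is an arbitrary example, not something we currently set:

# Sketch only -- track cores and memory so several jobs can be packed onto one node
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
# Example default so a job that omits --mem does not implicitly claim a whole node's RAM
DefMemPerCPU=4000
# With cons_res, jobs can share a node without OverSubscribe/gang scheduling
PartitionName=defq Nodes=node[001-003] Default=YES State=UP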