[slurm-users] Slurm 17.11 and configuring backfill and oversubscribe to allow concurrent processes

mercan ahmet.mercan at uhem.itu.edu.tr
Thu Feb 27 08:53:01 UTC 2020


Hi;

In your partition definition there is "Shared=NO", which means "do not 
share nodes between jobs". That conflicts with the 
"OverSubscribe=FORCE:12" parameter. According to the Slurm 
documentation, the Shared parameter has been replaced by the 
OverSubscribe parameter, but I suppose it is still honored, which would 
explain why your jobs are not sharing nodes (see the sketch below).
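
As a rough sketch only (this is just your own defq line with Shared=NO 
dropped, not something I have tested on your cluster), the partition 
definition could become:

PartitionName=defq Default=YES MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=FORCE:12 OverTimeLimit=0 State=UP Nodes=node[001-003]

The gpuq line would need the same change.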

Regards,

Ahmet M.


On 26.02.2020 22:56, Robert Kudyba wrote:
> We run Bright 8.1 and Slurm 17.11. We are trying to allow for multiple 
> concurrent jobs to run on our small 4 node cluster.
>
> Based on 
> https://community.brightcomputing.com/question/5d6614ba08e8e81e885f1991?action=artikel&cat=14&id=410&artlang=en&highlight=slurm+%2526%252334%253Bgang+scheduling%2526%252334%253B 
> and
> https://slurm.schedmd.com/cons_res_share.html
>
> Here are some settings in /etc/slurm/slurm.conf:
>
> SchedulerType=sched/backfill
> # Nodes
> NodeName=node[001-003] CoresPerSocket=12 RealMemory=191800 Sockets=2 
> Gres=gpu:1
> # Partitions
> PartitionName=defq Default=YES MinNodes=1 AllowGroups=ALL 
> PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO 
> Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO 
> AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO 
> OverSubscribe=FORCE:12 OverTimeLimit=0 State=UP Nodes=node[001-003]
> PartitionName=gpuq Default=NO MinNodes=1 AllowGroups=ALL 
> PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO 
> Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO 
> AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO 
> OverSubscribe=FORCE:12 OverTimeLimit=0 State=UP
> # Generic resources types
> GresTypes=gpu,mic
> # Epilog/Prolog parameters
> PrologSlurmctld=/cm/local/apps/cmd/scripts/prolog-prejob
> Prolog=/cm/local/apps/cmd/scripts/prolog
> Epilog=/cm/local/apps/cmd/scripts/epilog
> # Fast Schedule option
> FastSchedule=1
> # Power Saving
> SuspendTime=-1 # this disables power saving
> SuspendTimeout=30
> ResumeTimeout=60
> SuspendProgram=/cm/local/apps/cluster-tools/wlm/scripts/slurmpoweroff
> ResumeProgram=/cm/local/apps/cluster-tools/wlm/scripts/slurmpoweron
> # END AUTOGENERATED SECTION -- DO NOT REMOVE
> # 
> http://kb.brightcomputing.com/faq/index.php?action=artikel&cat=14&id=410&artlang=en&highlight=slurm+%26%2334%3Bgang+scheduling%26%2334%3B
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU
> SchedulerTimeSlice=60
> EnforcePartLimits=YES
>
> But it appears each job takes one of the three nodes, while all other 
> jobs sit queued behind it. Do we have an incorrect option set?
>
> squeue -a
> JOBID PARTITION     NAME   USER ST       TIME NODES NODELIST(REASON)
>  1937      defq   PaNet5  user1 PD       0:00     1 (Resources)
>  1938      defq   PoNet5  user1 PD       0:00     1 (Priority)
>  1964      defq   SENet5  user1 PD       0:00     1 (Priority)
>  1979      defq   IcNet5  user1 PD       0:00     1 (Priority)
>  1980      defq runtrain  user2 PD       0:00     1 (Priority)
>  1981      defq   InRes5  user1 PD       0:00     1 (Priority)
>  1983      defq run_LSTM  user3 PD       0:00     1 (Priority)
>  1984      defq run_hui.  user4 PD       0:00     1 (Priority)
>  1936      defq   SeRes5  user1  R   10:02:39     1 node003
>  1950      defq sequenti  user5  R 1-02:03:00     1 node001
>  1978      defq run_hui. user16  R   13:48:21     1 node002
>
> Am I misunderstanding some of the settings?
>
>
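
One more note: after editing slurm.conf, the change only takes effect 
after "scontrol reconfigure" (or restarting slurmctld). As a rough way 
to check what the controller has actually applied (the grep pattern is 
just my guess at the interesting keys, adjust as needed):

scontrol show partition defq
scontrol show config | grep -i -E 'SelectType|SchedulerTimeSlice|PreemptMode'

If the partition output still does not show OverSubscribe=FORCE:12, the 
leftover Shared=NO is probably still being picked up.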


