[slurm-users] Priority wait

Zohar Roe MLM RZohar8 at iai.co.il
Tue Nov 14 07:58:00 MST 2017


Hello,
Trying again, this time with the slurm.conf included.

I have a cluster named Autobot.
In this cluster I have servers:
Optimus[1-10] and
Megatron[1-10].

I submitted 3000 jobs with the feature Optimus; some are running and some are pending, which is OK.
But I also submitted 1000 jobs to Megatron, and they are all pending, with the reason given as priority. Why is that?

By the way, if I raise their priority, they start to run on Megatron.

SLURM.CONF

ControlMachine=slurmserver
ControlAddr=131.1.1.1
AuthType=auth/munge
CacheGroups=0
CryptoType=crypto/munge
MpiDefault=none
MpiParams=ports=12000-12999
ProctrackType=proctrack/linuxproc
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
MaxJobCount=120000
PriorityType=priority/basic
TaskPlugin=task/none
InactiveLimit=0
KillWait=30
CompleteWait=10
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
SelectTypeParameters=CR_LLN,CR_CPU_Memory
AccountingStorageType=accounting_storage/filetxt
AccountingStorageLoc=/etc/slurm/slurmAccount.txt
AccountingStoreJobComment=YES
ClusterName=MyCluster
JobCompLoc=/var/log/slurm/jobcom.log
JobCompType=jobcomp/filetxt
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=4
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=4
SlurmdLogFile=/var/log/slurm/slurmd.log
PreemptMode=requeue
PreemptType=preempt/partition_prio
DefMemPerCPU=10
DebugFlags=NO_CONF_HASH


###############################################
#   C O M P U T E    N O D E S                #
###############################################


########################
#   SLURM Server       #
########################
NodeName=slurmserver  NodeAddr=131.1.1.1   CPUs=4 State=UNKNOWN



########################
#   Autobot-Cluster    #
########################
NodeName=Optimus1   NodeAddr=131.1.20.31    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus2   NodeAddr=131.1.20.32    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus3   NodeAddr=131.1.20.33    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus4   NodeAddr=131.1.20.34    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus5   NodeAddr=131.1.20.35    CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus6   NodeAddr=131.1.20.36    CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus7   NodeAddr=131.1.20.37    CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus8   NodeAddr=131.1.20.38    CPUs=12 RealMemory=64410  State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus9   NodeAddr=131.1.20.39    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus10  NodeAddr=131.1.20.40    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,optimus

NodeName=Megatron1   NodeAddr=131.1.20.41    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron2   NodeAddr=131.1.20.42    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron3   NodeAddr=131.1.20.43    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron4   NodeAddr=131.1.20.44    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron5   NodeAddr=131.1.20.45    CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron6   NodeAddr=131.1.20.46    CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron7   NodeAddr=131.1.20.47    CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron8   NodeAddr=131.1.20.48    CPUs=12 RealMemory=64410  State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron9   NodeAddr=131.1.20.49    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron10  NodeAddr=131.1.20.50    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,megatron
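The Optimus and Megatron groups above are defined with the same per-node CPU and memory values. A minimal Python sketch, assuming the compute-node NodeName lines are pasted in verbatim (NODE_LINES, parse_node_line and capacity_by_feature are hypothetical helpers, not part of Slurm), that totals the configured capacity behind each feature:

```python
# Compute-node definitions copied from the slurm.conf above.
NODE_LINES = """\
NodeName=Optimus1 NodeAddr=131.1.20.31 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus2 NodeAddr=131.1.20.32 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus3 NodeAddr=131.1.20.33 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus4 NodeAddr=131.1.20.34 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus5 NodeAddr=131.1.20.35 CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus6 NodeAddr=131.1.20.36 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus7 NodeAddr=131.1.20.37 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus8 NodeAddr=131.1.20.38 CPUs=12 RealMemory=64410 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus9 NodeAddr=131.1.20.39 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus10 NodeAddr=131.1.20.40 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Megatron1 NodeAddr=131.1.20.41 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron2 NodeAddr=131.1.20.42 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron3 NodeAddr=131.1.20.43 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron4 NodeAddr=131.1.20.44 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron5 NodeAddr=131.1.20.45 CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron6 NodeAddr=131.1.20.46 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron7 NodeAddr=131.1.20.47 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron8 NodeAddr=131.1.20.48 CPUs=12 RealMemory=64410 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron9 NodeAddr=131.1.20.49 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron10 NodeAddr=131.1.20.50 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
"""


def parse_node_line(line):
    """Split one NodeName line into a dict of its Key=Value fields."""
    return dict(field.split("=", 1) for field in line.split())


def capacity_by_feature(text, features):
    """Sum node count, CPUs and RealMemory (MB) over nodes carrying each feature."""
    totals = {f: {"nodes": 0, "cpus": 0, "mem_mb": 0} for f in features}
    for line in text.splitlines():
        if not line.startswith("NodeName="):
            continue  # skip anything that is not a node definition
        node = parse_node_line(line)
        node_features = node.get("Feature", "").split(",")
        for f in features:
            if f in node_features:
                totals[f]["nodes"] += 1
                totals[f]["cpus"] += int(node["CPUs"])
                totals[f]["mem_mb"] += int(node["RealMemory"])
    return totals


if __name__ == "__main__":
    for feature, t in capacity_by_feature(NODE_LINES, ["optimus", "megatron"]).items():
        print(f"{feature}: {t['nodes']} nodes, {t['cpus']} CPUs, {t['mem_mb']} MB")
```

Run against the lines above, both features come out at 10 nodes, 140 CPUs and 999550 MB of RealMemory, so the two groups are configured with identical capacity.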


###############################################
#       P A R T I T I O N S                   #
###############################################
PartitionName=Autobot-Cluster Nodes=Optimus[1-10],Megatron[1-10]  Default=YES MaxTime=28800 State=UP  LLN=YES Priority=10
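The partition's Nodes= value uses Slurm's bracketed host-range notation, so both node groups end up in the single partition Autobot-Cluster. A minimal Python sketch of how such a list expands, assuming only the simple forms used in this file (expand_hostlist is a hypothetical helper; on a real cluster, `scontrol show hostnames` does this expansion):

```python
import re


def expand_hostlist(expr):
    """Expand a simple Slurm-style host list such as "Optimus[1-10],Megatron[1-10]".

    Simplified sketch: handles at most one bracket group per entry, containing
    comma-separated numbers or numeric ranges (e.g. "n[1-3,7]").
    """
    hosts = []
    # Split on commas that are not inside brackets.
    for entry in re.findall(r"[^,\[]+(?:\[[^\]]*\])?", expr):
        m = re.fullmatch(r"([^\[]+)\[([^\]]*)\]", entry)
        if not m:
            hosts.append(entry)  # plain hostname, no range
            continue
        prefix, body = m.groups()
        for part in body.split(","):
            lo, _, hi = part.partition("-")
            hi = hi or lo  # a single number is a one-element range
            # Keep zero padding, e.g. "01-03" -> 01, 02, 03.
            width = len(lo) if lo.startswith("0") else 0
            for i in range(int(lo), int(hi) + 1):
                hosts.append(prefix + str(i).zfill(width))
    return hosts


if __name__ == "__main__":
    print(expand_hostlist("Optimus[1-10],Megatron[1-10]"))
```

For the partition line above, this yields Optimus1 through Optimus10 followed by Megatron1 through Megatron10, 20 nodes in total.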



Thanks in advance,
Roy

