[slurm-users] Priority wait
Zohar Roe MLM
RZohar8 at iai.co.il
Tue Nov 14 07:58:00 MST 2017
Hello,
Trying again, this time with the slurm.conf included.
I have a cluster named Autobot.
In this cluster I have servers:
Optimus[1-10] and
Megatron[1-10].
I submitted 3000 jobs with feature Optimus; some are running and some are pending, which is OK.
But I also submitted 1000 jobs to Megatron, and they are all pending with the reason given as priority. Why is that?
By the way, if I raise their priority, they start to run on Megatron.
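For reference, here is a minimal sketch of how the pending state described above could be inspected. The function names are hypothetical wrappers added only for illustration; the underlying squeue and scontrol commands and format codes are standard Slurm, but their exact output depends on the installed version.

```shell
# Hypothetical helper names; only the Slurm commands inside them are real.

# List pending jobs: job ID, integer priority, pending reason, requested features.
pending_summary() {
    squeue --states=PENDING --format="%.10i %.10Q %.12r %f" "$@"
}

# Show the full record for one job (includes Priority, Reason and Features).
job_detail() {
    scontrol show job "$1"
}

# Show the scheduler and priority settings the running slurmctld is using.
sched_settings() {
    scontrol show config | grep -i -E 'SchedulerType|SchedulerParameters|PriorityType'
}
```

Typical use would be something like `pending_summary | head -n 20` to compare the priority and reason of the Optimus and Megatron jobs, then `job_detail <jobid>` on one of the pending Megatron jobs.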
SLURM.CONF
ControlMachine=slurmserver
ControlAddr=131.1.1.1
AuthType=auth/munge
CacheGroups=0
CryptoType=crypto/munge
MpiDefault=none
MpiParams=ports=12000-12999
ProctrackType=proctrack/linuxproc
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
MaxJobCount=120000
PriorityType= priority/basic
TaskPlugin=task/none
InactiveLimit=0
KillWait=30
CompleteWait=10
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
SelectTypeParameters=CR_LLN,CR_CPU_Memory
AccountingStorageType=accounting_storage/filetxt
AccountingStorageLoc=/etc/slurm/slurmAccount.txt
AccountingStoreJobComment=YES
ClusterName=MyCluster
JobCompLoc=/var/log/slurm/jobcom.log
JobCompType=jobcomp/filetxt
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=4
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=4
SlurmdLogFile=/var/log/slurm/slurmd.log
PreemptMode=requeue
PreemptType=preempt/partition_prio
DefMemPerCPU=10
DebugFlags=NO_CONF_HASH
###############################################
# C O M P U T E N O D E S #
###############################################
########################
# SLURM Server #
########################
NodeName=slurmserver NodeAddr=131.1.1.1 CPUs=4 State=UNKNOWN
########################
# Autobot-Cluster #
########################
NodeName=Optimus1 NodeAddr=131.1.20.31 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus2 NodeAddr=131.1.20.32 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus3 NodeAddr=131.1.20.33 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus4 NodeAddr=131.1.20.34 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus5 NodeAddr=131.1.20.35 CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus6 NodeAddr=131.1.20.36 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus7 NodeAddr=131.1.20.37 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus8 NodeAddr=131.1.20.38 CPUs=12 RealMemory=64410 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus9 NodeAddr=131.1.20.39 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus10 NodeAddr=131.1.20.40 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Megatron1 NodeAddr=131.1.20.41 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron2 NodeAddr=131.1.20.42 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron3 NodeAddr=131.1.20.43 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron4 NodeAddr=131.1.20.44 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron5 NodeAddr=131.1.20.45 CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron6 NodeAddr=131.1.20.46 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron7 NodeAddr=131.1.20.47 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron8 NodeAddr=131.1.20.48 CPUs=12 RealMemory=64410 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron9 NodeAddr=131.1.20.49 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron10 NodeAddr=131.1.20.50 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
###############################################
# P A R T I T I O N S #
###############################################
PartitionName=Autobot-Cluster Nodes=Optimus[1-10],Megatron[1-10] Default=YES MaxTime=28800 State=UP LLN=YES Priority=10
Thanks in advance,
Roy