[slurm-users] Priority wait

Andy Riebs andy.riebs at hpe.com
Tue Nov 14 08:08:12 MST 2017


Hi Roy,

What command are you using to start the jobs?
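
Submission order matters here because PriorityType=priority/basic ranks pending jobs by arrival. As a toy model only (this is not Slurm's actual scheduler), the Python sketch below assumes each job needs one CPU, each feature group has 140 CPUs (the sum of the CPUs= values in the quoted slurm.conf), and each scheduling pass examines only the first `depth` pending jobs. `run_passes` and `depth` are hypothetical names. Whether a look-ahead limit is really what holds the Megatron jobs back is an assumption that this thread does not confirm.

```python
def run_passes(pending, free_cpus, depth):
    """Repeat toy scheduling passes until a pass starts nothing.

    pending   -- list of (job_id, feature) in priority order
    free_cpus -- dict mapping feature -> idle CPUs; each job takes one CPU
    depth     -- hypothetical cap on how many pending jobs one pass examines
    Returns (started_job_ids, {job_id: reason}) for the final state.
    """
    pending = list(pending)
    free = dict(free_cpus)
    started = []
    while True:
        launched = set()
        for job_id, feature in pending[:depth]:
            if free.get(feature, 0) > 0:
                free[feature] -= 1
                launched.add(job_id)
                started.append(job_id)
        if not launched:
            break
        pending = [job for job in pending if job[0] not in launched]
    # Jobs inside the examined window were tested and did not fit;
    # jobs beyond it were never reached because earlier jobs rank higher.
    reasons = {
        job_id: "Resources" if index < depth else "Priority"
        for index, (job_id, _) in enumerate(pending)
    }
    return started, reasons


optimus = [(i, "optimus") for i in range(3000)]
megatron = [(3000 + i, "megatron") for i in range(1000)]

# Arrival order: the 3000 Optimus jobs were submitted first.
started_fifo, reasons_fifo = run_passes(
    optimus + megatron, {"optimus": 140, "megatron": 140}, depth=100
)
print(len(started_fifo), reasons_fifo[3000])  # 140 Priority

# Raising the Megatron jobs' priority moves them to the front of the queue.
running = set(started_fifo)
still_pending = [job for job in optimus if job[0] not in running]
started_bumped, _ = run_passes(
    megatron + still_pending, {"optimus": 0, "megatron": 140}, depth=100
)
print(len(started_bumped), min(started_bumped))  # 140 3000
```

In this model the Megatron nodes sit idle in the first run because every examined job wants Optimus, and the Megatron jobs start as soon as they are moved ahead, which resembles the behaviour described in the quoted message.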

On 11/14/2017 09:58 AM, Zohar Roe MLM wrote:
>
> Hello,
>
> Trying again, this time with the slurm.conf.
>
> I have a cluster named Autobot. In this cluster I have the servers
> Optimus[1-10] and Megatron[1-10].
>
> I submitted 3000 jobs with the feature Optimus. Some are running and
> some are pending, which is OK.
>
> But I also submitted 1000 jobs to Megatron, and they are all pending,
> with the reason given as priority. Why is that?
>
> By the way, if I raise their priority, they start to run on Megatron.
>
> SLURM.CONF
>
> ControlMachine=slurmserver
> ControlAddr=131.1.1.1
> AuthType=auth/munge
> CacheGroups=0
> CryptoType=crypto/munge
> MpiDefault=none
> MpiParams=ports=12000-12999
> ProctrackType=proctrack/linuxproc
> ReturnToService=2
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/var/spool/slurmd
> SlurmUser=slurm
> StateSaveLocation=/var/spool/slurmctld
> SwitchType=switch/none
> MaxJobCount=120000
> PriorityType= priority/basic
> TaskPlugin=task/none
> InactiveLimit=0
> KillWait=30
> CompleteWait=10
> MinJobAge=300
> SlurmctldTimeout=120
> SlurmdTimeout=300
> Waittime=0
> FastSchedule=1
> SchedulerType=sched/backfill
> SchedulerPort=7321
> SelectType=select/cons_res
> SelectTypeParameters=CR_LLN,CR_CPU_Memory
> AccountingStorageType=accounting_storage/filetxt
> AccountingStorageLoc=/etc/slurm/slurmAccount.txt
> AccountingStoreJobComment=YES
> ClusterName=MyCluster
> JobCompLoc=/var/log/slurm/jobcom.log
> JobCompType=jobcomp/filetxt
> JobAcctGatherFrequency=30
> JobAcctGatherType=jobacct_gather/none
> SlurmctldDebug=4
> SlurmctldLogFile=/var/log/slurm/slurmctld.log
> SlurmdDebug=4
> SlurmdLogFile=/var/log/slurm/slurmd.log
> PreemptMode=requeue
> PreemptType=preempt/partition_prio
> DefMemPerCPU=10
> DebugFlags=NO_CONF_HASH
>
> ###############################################
> #   C O M P U T E    N O D E S                #
> ###############################################
>
> ########################
> #   SLURM Server       #
> ########################
> NodeName=slurmserver NodeAddr=131.1.1.1 CPUs=4 State=UNKNOWN
>
> ########################
> #   Autobot-Cluster     #
> ########################
> NodeName=Optimus1 NodeAddr=131.1.20.31 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
> NodeName=Optimus2 NodeAddr=131.1.20.32 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
> NodeName=Optimus3 NodeAddr=131.1.20.33 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
> NodeName=Optimus4 NodeAddr=131.1.20.34 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
> NodeName=Optimus5 NodeAddr=131.1.20.35 CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
> NodeName=Optimus6 NodeAddr=131.1.20.36 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
> NodeName=Optimus7 NodeAddr=131.1.20.37 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
> NodeName=Optimus8 NodeAddr=131.1.20.38 CPUs=12 RealMemory=64410 State=UNKNOWN Feature=autobot,optimus
> NodeName=Optimus9 NodeAddr=131.1.20.39 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
> NodeName=Optimus10 NodeAddr=131.1.20.40 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
> NodeName=Megatron1 NodeAddr=131.1.20.41 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
> NodeName=Megatron2 NodeAddr=131.1.20.42 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
> NodeName=Megatron3 NodeAddr=131.1.20.43 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
> NodeName=Megatron4 NodeAddr=131.1.20.44 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
> NodeName=Megatron5 NodeAddr=131.1.20.45 CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
> NodeName=Megatron6 NodeAddr=131.1.20.46 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
> NodeName=Megatron7 NodeAddr=131.1.20.47 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
> NodeName=Megatron8 NodeAddr=131.1.20.48 CPUs=12 RealMemory=64410 State=UNKNOWN Feature=autobot,megatron
> NodeName=Megatron9 NodeAddr=131.1.20.49 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
> NodeName=Megatron10 NodeAddr=131.1.20.50 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
>
> ###############################################
> #       P A R T I T I O N S                   #
> ###############################################
> PartitionName=Autobot-Cluster Nodes=Optimus[1-10],Megatron[1-10] Default=YES MaxTime=28800 State=UP LLN=YES Priority=10
>
> Thanks in advance,
>
> Roy
>
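
As a quick sanity check on the node definitions quoted above, this Python sketch tallies the CPUs= values per feature. The node lines are copied from the quoted slurm.conf with the wrapped lines rejoined; `NODE_LINES` and `cpus_per_feature` are names made up for this illustration, and the parser assumes simple space-separated Key=Value tokens rather than the full slurm.conf syntax.

```python
from collections import defaultdict

# Node definitions copied from the quoted slurm.conf (wrapped lines rejoined).
NODE_LINES = """
NodeName=Optimus1 NodeAddr=131.1.20.31 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus2 NodeAddr=131.1.20.32 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus3 NodeAddr=131.1.20.33 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus4 NodeAddr=131.1.20.34 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus5 NodeAddr=131.1.20.35 CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus6 NodeAddr=131.1.20.36 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus7 NodeAddr=131.1.20.37 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus8 NodeAddr=131.1.20.38 CPUs=12 RealMemory=64410 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus9 NodeAddr=131.1.20.39 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus10 NodeAddr=131.1.20.40 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Megatron1 NodeAddr=131.1.20.41 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron2 NodeAddr=131.1.20.42 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron3 NodeAddr=131.1.20.43 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron4 NodeAddr=131.1.20.44 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron5 NodeAddr=131.1.20.45 CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron6 NodeAddr=131.1.20.46 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron7 NodeAddr=131.1.20.47 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron8 NodeAddr=131.1.20.48 CPUs=12 RealMemory=64410 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron9 NodeAddr=131.1.20.49 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron10 NodeAddr=131.1.20.50 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
"""


def cpus_per_feature(text):
    """Sum the CPUs= value of every node line under each of its features."""
    totals = defaultdict(int)
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split "Key=Value" tokens into a dict for this node.
        fields = dict(token.split("=", 1) for token in line.split())
        for feature in fields["Feature"].split(","):
            totals[feature] += int(fields["CPUs"])
    return dict(totals)


totals = cpus_per_feature(NODE_LINES)
print(totals["optimus"], totals["megatron"], totals["autobot"])  # 140 140 280
```

Both groups are configured with the same CPU capacity, so on the configuration alone there is no capacity difference between Optimus and Megatron that would explain the pending jobs.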
