[slurm-users] Priority wait
Andy Riebs
andy.riebs at hpe.com
Tue Nov 14 08:08:12 MST 2017
Hi Roy,
What command are you using to start the jobs?
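For context on why the submit command and the queue order matter, here is a toy Python model. It is not Slurm's actual scheduling code, only a sketch of one possible explanation. It assumes a scheduler pass that examines only the first `scan_depth` pending jobs in priority order, and that every job needs a single CPU (the post does not say). The names `Job`, `schedule`, and `scan_depth` are hypothetical. The 140 slots per feature are the sum of the CPUs= values of the ten Optimus (or ten Megatron) nodes in the quoted slurm.conf.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    feature: str   # node feature requested, e.g. "optimus" or "megatron"
    priority: int


def schedule(pending, free_slots, scan_depth):
    """Toy scheduling pass (an illustration, not Slurm's algorithm).

    Only the first `scan_depth` jobs in priority order are examined.
    `free_slots` maps a feature name to the number of jobs it can still accept.
    Returns (started_ids, reasons), where reasons maps job_id to a pending reason.
    """
    # Highest priority first; ties are broken by submission order (job_id).
    queue = sorted(pending, key=lambda j: (-j.priority, j.job_id))
    free = dict(free_slots)
    started, reasons = [], {}
    for position, job in enumerate(queue):
        if position >= scan_depth:
            # Never examined in this pass: held behind earlier jobs.
            reasons[job.job_id] = "Priority"
        elif free.get(job.feature, 0) > 0:
            free[job.feature] -= 1
            started.append(job.job_id)
        else:
            reasons[job.job_id] = "Resources"
    return started, reasons


# 3000 Optimus jobs submitted first, then 1000 Megatron jobs, all equal priority.
jobs = [Job(i, "optimus", 10) for i in range(3000)]
jobs += [Job(3000 + i, "megatron", 10) for i in range(1000)]
slots = {"optimus": 140, "megatron": 140}

started, reasons = schedule(jobs, slots, scan_depth=500)
megatron_waiting = [j for j in range(3000, 4000) if reasons.get(j) == "Priority"]
print(len(started), len(megatron_waiting))

# Raising the Megatron jobs' priority moves them to the front of the queue.
boosted = [Job(j.job_id, j.feature, 20 if j.feature == "megatron" else 10) for j in jobs]
started_boosted, _ = schedule(boosted, slots, scan_depth=500)
print(started_boosted[:3])
```

In this model the Megatron jobs never get examined while thousands of equal-priority Optimus jobs sit ahead of them, even though Megatron nodes are idle, and they start as soon as their priority is raised. Whether that is what is happening on your cluster depends on how the jobs were submitted and on the scheduler settings, hence the question.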
On 11/14/2017 09:58 AM, Zohar Roe MLM wrote:
>
> Hello,
>
> Trying again, this time with the slurm.conf.
>
> I have a cluster named Autobot.
>
> In this cluster I have the servers Optimus[1-10] and Megatron[1-10].
>
> I submitted 3000 jobs with the feature Optimus; some are running and
> some are pending, which is OK.
>
> But I also submitted 1000 jobs to Megatron, and they are all pending,
> with the stated reason that they are waiting because of priority. Why is that?
>
> By the way, if I change their priority to a higher one, they start to
> run on Megatron.
>
> SLURM.CONF
>
> ControlMachine=slurmserver
>
> ControlAddr=131.1.1.1
>
> AuthType=auth/munge
>
> CacheGroups=0
>
> CryptoType=crypto/munge
>
> MpiDefault=none
>
> MpiParams=ports=12000-12999
>
> ProctrackType=proctrack/linuxproc
>
> ReturnToService=2
>
> SlurmctldPidFile=/var/run/slurmctld.pid
>
> SlurmctldPort=6817
>
> SlurmdPidFile=/var/run/slurmd.pid
>
> SlurmdPort=6818
>
> SlurmdSpoolDir=/var/spool/slurmd
>
> SlurmUser=slurm
>
> StateSaveLocation=/var/spool/slurmctld
>
> SwitchType=switch/none
>
> MaxJobCount=120000
>
> PriorityType=priority/basic
>
> TaskPlugin=task/none
>
> InactiveLimit=0
>
> KillWait=30
>
> CompleteWait=10
>
> MinJobAge=300
>
> SlurmctldTimeout=120
>
> SlurmdTimeout=300
>
> Waittime=0
>
> FastSchedule=1
>
> SchedulerType=sched/backfill
>
> SchedulerPort=7321
>
> SelectType=select/cons_res
>
> SelectTypeParameters=CR_LLN,CR_CPU_Memory
>
> AccountingStorageType=accounting_storage/filetxt
>
> AccountingStorageLoc=/etc/slurm/slurmAccount.txt
>
> AccountingStoreJobComment=YES
>
> ClusterName=MyCluster
>
> JobCompLoc=/var/log/slurm/jobcom.log
>
> JobCompType=jobcomp/filetxt
>
> JobAcctGatherFrequency=30
>
> JobAcctGatherType=jobacct_gather/none
>
> SlurmctldDebug=4
>
> SlurmctldLogFile=/var/log/slurm/slurmctld.log
>
> SlurmdDebug=4
>
> SlurmdLogFile=/var/log/slurm/slurmd.log
>
> PreemptMode=requeue
>
> PreemptType=preempt/partition_prio
>
> DefMemPerCPU=10
>
> DebugFlags=NO_CONF_HASH
>
> ###############################################
>
> # C O M P U T E N O D E S #
>
> ###############################################
>
> ########################
>
> # SLURM Server #
>
> ########################
>
> NodeName=slurmserver NodeAddr=131.1.1.1 CPUs=4 State=UNKNOWN
>
> ########################
>
> # Autobot-Cluster #
>
> ########################
>
> NodeName=Optimus1 NodeAddr=131.1.20.31 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
>
> NodeName=Optimus2 NodeAddr=131.1.20.32 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
>
> NodeName=Optimus3 NodeAddr=131.1.20.33 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
>
> NodeName=Optimus4 NodeAddr=131.1.20.34 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
>
> NodeName=Optimus5 NodeAddr=131.1.20.35 CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
>
> NodeName=Optimus6 NodeAddr=131.1.20.36 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
>
> NodeName=Optimus7 NodeAddr=131.1.20.37 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
>
> NodeName=Optimus8 NodeAddr=131.1.20.38 CPUs=12 RealMemory=64410 State=UNKNOWN Feature=autobot,optimus
>
> NodeName=Optimus9 NodeAddr=131.1.20.39 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
>
> NodeName=Optimus10 NodeAddr=131.1.20.40 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
>
> NodeName=Megatron1 NodeAddr=131.1.20.41 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
>
> NodeName=Megatron2 NodeAddr=131.1.20.42 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
>
> NodeName=Megatron3 NodeAddr=131.1.20.43 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
>
> NodeName=Megatron4 NodeAddr=131.1.20.44 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
>
> NodeName=Megatron5 NodeAddr=131.1.20.45 CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
>
> NodeName=Megatron6 NodeAddr=131.1.20.46 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
>
> NodeName=Megatron7 NodeAddr=131.1.20.47 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
>
> NodeName=Megatron8 NodeAddr=131.1.20.48 CPUs=12 RealMemory=64410 State=UNKNOWN Feature=autobot,megatron
>
> NodeName=Megatron9 NodeAddr=131.1.20.49 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
>
> NodeName=Megatron10 NodeAddr=131.1.20.50 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
>
> ###############################################
>
> # P A R T I T I O N S #
>
> ###############################################
>
> PartitionName=Autobot-Cluster Nodes=Optimus[1-10],Megatron[1-10] Default=YES MaxTime=28800 State=UP LLN=YES Priority=10
>
> Thanks in advance,
>
> Roy
>
>