[slurm-users] Fairshare config change effect on running/queued jobs?
Paul Edmon
pedmon at cfa.harvard.edu
Fri Apr 30 16:42:33 UTC 2021
It shouldn't impact running jobs; all it should really do is affect
pending jobs, since it will reorder them by their relative priority scores.
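If you want to sanity-check the ordering after the change, sprio and
sshare are handy (a quick sketch; the exact columns vary a bit by
version):

# per-pending-job priority, broken down into its weighted factors
sprio -l
# fairshare usage and the resulting FairShare factor for each association
sshare -a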
-Paul Edmon-
On 4/30/2021 12:39 PM, Walsh, Kevin wrote:
> Hello everyone,
>
> We wish to deploy a "fair share" scheduling configuration and would
> like to know whether we should expect any effect on jobs that are
> already running or already queued when the config is changed.
>
> The proposed changes are from the example at
> https://slurm.schedmd.com/archive/slurm-18.08.9/priority_multifactor.html#config :
>
> # Activate the Multi-factor Job Priority Plugin with decay
> PriorityType=priority/multifactor
> # 2 week half-life
> PriorityDecayHalfLife=14-0
> # The larger the job, the greater its job size priority.
> PriorityFavorSmall=NO
> # The job's age factor reaches 1.0 after waiting in the
> # queue for 2 weeks.
> PriorityMaxAge=14-0
> # This next group determines the weighting of each of the
> # components of the Multi-factor Job Priority Plugin.
> # The default value for each of the following is 1.
> PriorityWeightAge=1000
> PriorityWeightFairshare=10000
> PriorityWeightJobSize=1000
> PriorityWeightPartition=1000
> PriorityWeightQOS=0 # don't use the qos factor
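>
> As I understand it, with those weights the plugin would compute each
> pending job's priority roughly as follows (a sketch based on the 18.08
> docs; the factor values in the example are made up):
>
> # Job_priority = 1000*age_factor + 10000*fairshare_factor
> #              + 1000*job_size_factor + 1000*partition_factor + 0*QOS_factor
> # e.g. age_factor=0.5, fairshare_factor=0.25, job_size_factor=0.1, partition_factor=1.0:
> #   1000*0.5 + 10000*0.25 + 1000*0.1 + 1000*1.0 = 4100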
>
> We're running Slurm 18.08.8 on CentOS Linux 7.8.2003. The current
> slurm.conf uses the defaults as far as fair share is concerned:
>
> EnforcePartLimits=ALL
> GresTypes=gpu
> MpiDefault=pmix
> ProctrackType=proctrack/cgroup
> PrologFlags=x11,contain
> PropagateResourceLimitsExcept=MEMLOCK,STACK
> RebootProgram=/sbin/reboot
> ReturnToService=1
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/var/spool/slurmd
> SlurmUser=slurm
> SlurmdSyslogDebug=verbose
> StateSaveLocation=/var/spool/slurm/ctld
> SwitchType=switch/none
> TaskPlugin=task/cgroup,task/affinity
> TaskPluginParam=Sched
> HealthCheckInterval=300
> HealthCheckProgram=/usr/sbin/nhc
> InactiveLimit=0
> KillWait=30
> MinJobAge=300
> SlurmctldTimeout=120
> SlurmdTimeout=300
> Waittime=0
> DefMemPerCPU=1024
> FastSchedule=1
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> AccountingStorageHost=sched-db.lan
> AccountingStorageLoc=slurm_acct_db
> AccountingStoragePass=/var/run/munge/munge.socket.2
> AccountingStoragePort=6819
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStorageUser=slurm
> AccountingStoreJobComment=YES
> AccountingStorageTRES=gres/gpu
> JobAcctGatherFrequency=30
> JobAcctGatherType=jobacct_gather/linux
> SlurmctldDebug=info
> SlurmdDebug=info
> SlurmSchedLogFile=/var/log/slurm/slurmsched.log
> SlurmSchedLogLevel=1
>
> Node and partition configs are omitted above.
>
> Any and all advice will be greatly appreciated.
>
> Best wishes,
>
> ~Kevin
>
> Kevin Walsh
> Senior Systems Administration Specialist
> New Jersey Institute of Technology
> Academic & Research Computing Systems
>
>