[slurm-users] Fairshare config change effect on running/queued jobs?
Paul Edmon
pedmon at cfa.harvard.edu
Fri Apr 30 16:42:33 UTC 2021
It shouldn't impact running jobs; all it should really do is affect
pending jobs, since it will reorder them by their relative priority scores.
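If you want to sanity-check the ordering after the change, sprio and
sshare are handy (a quick sketch; the exact columns vary a bit by
version):

# per-pending-job priority, broken down into its weighted factors
sprio -l
# fairshare usage and the resulting FairShare factor for each association
sshare -a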
-Paul Edmon-
On 4/30/2021 12:39 PM, Walsh, Kevin wrote:
> Hello everyone,
>
> We wish to deploy a "fair share" scheduling configuration and would
> like to know whether we should expect any effect on jobs that are
> already running or already queued when the config is changed.
>
> The proposed changes are from the example at
> https://slurm.schedmd.com/archive/slurm-18.08.9/priority_multifactor.html#config :
>
> # Activate the Multi-factor Job Priority Plugin with decay
> PriorityType=priority/multifactor
> # 2 week half-life
> PriorityDecayHalfLife=14-0
> # The larger the job, the greater its job size priority.
> PriorityFavorSmall=NO
> # The job's age factor reaches 1.0 after waiting in the
> # queue for 2 weeks.
> PriorityMaxAge=14-0
> # This next group determines the weighting of each of the
> # components of the Multi-factor Job Priority Plugin.
> # The default value for each of the following is 1.
> PriorityWeightAge=1000
> PriorityWeightFairshare=10000
> PriorityWeightJobSize=1000
> PriorityWeightPartition=1000
> PriorityWeightQOS=0 # don't use the qos factor
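>
> As I understand it, with those weights the plugin would compute each
> pending job's priority roughly as follows (a sketch based on the 18.08
> docs; the factor values in the example are made up):
>
> # Job_priority = 1000*age_factor + 10000*fairshare_factor
> #              + 1000*job_size_factor + 1000*partition_factor + 0*QOS_factor
> # e.g. age_factor=0.5, fairshare_factor=0.25, job_size_factor=0.1, partition_factor=1.0:
> #   1000*0.5 + 10000*0.25 + 1000*0.1 + 1000*1.0 = 4100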
>
> We're running Slurm 18.08.8 on CentOS Linux 7.8.2003. The current
> slurm.conf uses the defaults as far as fair share is concerned:
>
> EnforcePartLimits=ALL
> GresTypes=gpu
> MpiDefault=pmix
> ProctrackType=proctrack/cgroup
> PrologFlags=x11,contain
> PropagateResourceLimitsExcept=MEMLOCK,STACK
> RebootProgram=/sbin/reboot
> ReturnToService=1
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/var/spool/slurmd
> SlurmUser=slurm
> SlurmdSyslogDebug=verbose
> StateSaveLocation=/var/spool/slurm/ctld
> SwitchType=switch/none
> TaskPlugin=task/cgroup,task/affinity
> TaskPluginParam=Sched
> HealthCheckInterval=300
> HealthCheckProgram=/usr/sbin/nhc
> InactiveLimit=0
> KillWait=30
> MinJobAge=300
> SlurmctldTimeout=120
> SlurmdTimeout=300
> Waittime=0
> DefMemPerCPU=1024
> FastSchedule=1
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> AccountingStorageHost=sched-db.lan
> AccountingStorageLoc=slurm_acct_db
> AccountingStoragePass=/var/run/munge/munge.socket.2
> AccountingStoragePort=6819
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStorageUser=slurm
> AccountingStoreJobComment=YES
> AccountingStorageTRES=gres/gpu
> JobAcctGatherFrequency=30
> JobAcctGatherType=jobacct_gather/linux
> SlurmctldDebug=info
> SlurmdDebug=info
> SlurmSchedLogFile=/var/log/slurm/slurmsched.log
> SlurmSchedLogLevel=1
>
> Node and partition configs are omitted above.
>
> Any and all advice will be greatly appreciated.
>
> Best wishes,
>
> ~Kevin
>
> Kevin Walsh
> Senior Systems Administration Specialist
> New Jersey Institute of Technology
> Academic & Research Computing Systems
>
>