[slurm-users] How to enable QOS correctly?
Matthew BETTINGER
matthew.bettinger at external.total.com
Tue Mar 5 17:29:19 UTC 2019
So here is our default partition:
PartitionName=BDW
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/A
DefaultTime=01:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=nid00[016-063]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=EXCLUSIVE
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=3456 TotalNodes=48 SelectTypeParameters=NONE
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
If we just flip on AccountingStorageEnforce=limits,qos (we tried adding "safe" as well), no jobs can run.
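One thing I can check before flipping it on again is whether users actually have associations in slurmdbd; from the sacctmgr man page, something like this should show it (a sketch, using the user from the job below):

  sacctmgr show assoc where user=j0497482 format=Cluster,Account,User,QOS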
Here is a running job, which shows the default "normal" QOS that was created when Slurm was installed:
JobId=244667 JobName=em25d_SEAM
UserId=j0497482(10214) GroupId=rt3(501) MCS_label=N/A
Priority=1 Nice=0 Account=(null) QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:06 TimeLimit=1-00:00:00 TimeMin=N/A
SubmitTime=2019-03-05T11:24:41 EligibleTime=2019-03-05T11:24:41
StartTime=2019-03-05T11:24:41 EndTime=2019-03-06T11:24:41 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=KNL AllocNode:Sid=hickory-1:4991
ReqNodeList=(null) ExcNodeList=(null)
NodeList=nid00605
BatchHost=nid00605
NumNodes=1 NumCPUs=256 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:1
TRES=cpu=256,mem=96763M,node=1,gres/craynetwork=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=96763M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
Gres=craynetwork:1 Reservation=(null)
OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
Command=./.prg29913/tmp/DIVAcmdEXEC29913.py None /home/j0497482/bin/em25d_SEAM mode=forward model=mod1.h5 model_H=none i_bwc=0 flist=0.15,0.25,0.5,1.0 verbose=5 sabs=1 rabs=1 acqui_file=acq1 nky=20 ofile=out_forward.nc minOff=2000.0,2000.0,2000.0,2000.0 maxOff=10000.0,10000.0,10000.0,10000.0 NoiseEx=1.0e-14,1.0e-14,1.0e-14,1.0e-14 bedThreshold=2
WorkDir=/data/gpfs/Users/j0497482/data/EM_data/Model46
StdErr=/data/gpfs/Users/j0497482/data/EM_data/Model46/./logs/Job_Standalone_244667.slurm_err
StdIn=/dev/null
StdOut=/data/gpfs/Users/j0497482/data/EM_data/Model46/./logs/Job_Standalone_244667.slurm_log
Power=
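I notice that job shows Account=(null). I assume the accounting side of it could be checked with something like (sketch, assuming slurmdbd has the job):

  sacct -j 244667 --format=JobID,User,Account,QOS,State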
sacctmgr show qos normal

      Name   Priority  GraceTime PreemptMode UsageFactor
    ------ ---------- ---------- ----------- -----------
    normal          0   00:00:00     cluster    1.000000
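If it helps, the limits on that QOS can be listed explicitly with something like this (field names taken from the sacctmgr man page, untested here):

  sacctmgr show qos normal format=Name,Priority,MaxWall,MaxTRESPU,Flags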
On 3/5/19, 10:47 AM, "slurm-users on behalf of Michael Gutteridge" <slurm-users-bounces at lists.schedmd.com on behalf of michael.gutteridge at gmail.com> wrote:
Hi
It might be useful to see the configuration of the partition and how the QOS is set up... but at first blush I suspect you may need to set the OverPartQOS flag (https://slurm.schedmd.com/resource_limits.html)
on the QOS to get the QOS limits to take precedence over the limits in the partition. However, the pending "reason" would be different if that were the case.
Have a look at that, and maybe send the QOS and partition config.
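For what it's worth, I believe setting that flag would look something like this (untested on your cluster, and note that a plain Flags= replaces any existing flags on the QOS):

  sacctmgr modify qos normal set Flags=OverPartQOS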
- Michael
On Tue, Mar 5, 2019 at 7:40 AM Matthew BETTINGER <matthew.bettinger at external.total.com> wrote:
Hey Slurm gurus. We have been trying, off and on, to enable Slurm QOS on a Cray system here for quite a while, but we can never get it working. Every time we try to enable QOS we disrupt the cluster and its users and have to fall back, and I'm not sure what we are doing
wrong. We run a pretty open system here since we are a research group, but there are times when we need to let a user run a job that exceeds a partition limit. In lieu of using QOS, the only other way we have figured out to do this is to create a new partition
and push out the modified slurm.conf. It's a hassle.
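What we are hoping for is roughly this workflow instead (the "longwall" QOS name is made up, and I'm guessing at the sacctmgr syntax from the man page):

  sacctmgr add qos longwall
  sacctmgr modify qos longwall set MaxWall=3-00:00:00
  sacctmgr modify user where name=j0497482 set QOS+=longwall

and then the user submits with sbatch --qos=longwall. Whether that can actually exceed the partition's MaxTime is part of what we can't get working.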
I'm not sure exactly what information is needed to troubleshoot this, but my understanding is that to enable QOS we need this line in slurm.conf:
AccountingStorageEnforce=limits,qos
Every time we attempt this, no one can submit a job; Slurm reports something like "waiting on resources", I believe.
We have accounting enabled, and everyone is a member of the default QOS, "normal".
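Next time we flip it on I will capture the exact pending reason per job, e.g.:

  squeue --format="%.10i %.9P %.8u %.8a %.8q %.20r"

(%a should show the account and %r the reason, per the squeue man page.)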
Configuration data as of 2019-03-05T09:36:19
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = none
AccountingStorageHost = hickory-1
AccountingStorageLoc = N/A
AccountingStoragePort = 6819
AccountingStorageTRES = cpu,mem,energy,node,bb/cray,gres/craynetwork,gres/gpu
AccountingStorageType = accounting_storage/slurmdbd
AccountingStorageUser = N/A
AccountingStoreJobComment = Yes
AcctGatherEnergyType = acct_gather_energy/rapl
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInfinibandType = acct_gather_infiniband/none
AcctGatherNodeFreq = 30 sec
AcctGatherProfileType = acct_gather_profile/none
AllowSpecResourcesUsage = 1
AuthInfo = (null)
AuthType = auth/munge
BackupAddr = hickory-2
BackupController = hickory-2
BatchStartTimeout = 10 sec
BOOT_TIME = 2019-03-04T16:11:55
BurstBufferType = burst_buffer/cray
CacheGroups = 0
CheckpointType = checkpoint/none
ChosLoc = (null)
ClusterName = hickory
CompleteWait = 0 sec
ControlAddr = hickory-1
ControlMachine = hickory-1
CoreSpecPlugin = cray
CpuFreqDef = Performance
CpuFreqGovernors = Performance,OnDemand
CryptoType = crypto/munge
DebugFlags = (null)
DefMemPerNode = UNLIMITED
DisableRootJobs = No
EioTimeout = 60
EnforcePartLimits = NO
Epilog = (null)
EpilogMsgTime = 2000 usec
EpilogSlurmctld = (null)
ExtSensorsType = ext_sensors/none
ExtSensorsFreq = 0 sec
FairShareDampeningFactor = 1
FastSchedule = 0
FirstJobId = 1
GetEnvTimeout = 2 sec
GresTypes = gpu,craynetwork
GroupUpdateForce = 1
GroupUpdateTime = 600 sec
HASH_VAL = Match
HealthCheckInterval = 0 sec
HealthCheckNodeState = ANY
HealthCheckProgram = (null)
InactiveLimit = 0 sec
JobAcctGatherFrequency = 30
JobAcctGatherType = jobacct_gather/linux
JobAcctGatherParams = (null)
JobCheckpointDir = /var/slurm/checkpoint
JobCompHost = localhost
JobCompLoc = /var/log/slurm_jobcomp.log
JobCompPort = 0
JobCompType = jobcomp/none
JobCompUser = root
JobContainerType = job_container/cncu
JobCredentialPrivateKey = (null)
JobCredentialPublicCertificate = (null)
JobFileAppend = 0
JobRequeue = 1
JobSubmitPlugins = cray
KeepAliveTime = SYSTEM_DEFAULT
KillOnBadExit = 1
KillWait = 30 sec
LaunchParameters = (null)
LaunchType = launch/slurm
Layouts =
Licenses = (null)
LicensesUsed = (null)
MailDomain = (null)
MailProg = /bin/mail
MaxArraySize = 1001
MaxJobCount = 10000
MaxJobId = 67043328
MaxMemPerCPU = 128450
MaxStepCount = 40000
MaxTasksPerNode = 512
MCSPlugin = mcs/none
MCSParameters = (null)
MemLimitEnforce = Yes
MessageTimeout = 10 sec
MinJobAge = 300 sec
MpiDefault = none
MpiParams = ports=20000-32767
MsgAggregationParams = (null)
NEXT_JOB_ID = 244342
NodeFeaturesPlugins = (null)
OverTimeLimit = 0 min
PluginDir = /opt/slurm/17.02.6/lib64/slurm
PlugStackConfig = /etc/opt/slurm/plugstack.conf
PowerParameters = (null)
PowerPlugin =
PreemptMode = OFF
PreemptType = preempt/none
PriorityParameters = (null)
PriorityDecayHalfLife = 7-00:00:00
PriorityCalcPeriod = 00:05:00
PriorityFavorSmall = No
PriorityFlags =
PriorityMaxAge = 7-00:00:00
PriorityUsageResetPeriod = NONE
PriorityType = priority/multifactor
PriorityWeightAge = 0
PriorityWeightFairShare = 0
PriorityWeightJobSize = 0
PriorityWeightPartition = 0
PriorityWeightQOS = 0
PriorityWeightTRES = (null)
PrivateData = none
ProctrackType = proctrack/cray
Prolog = (null)
PrologEpilogTimeout = 65534
PrologSlurmctld = (null)
PrologFlags = (null)
PropagatePrioProcess = 0
PropagateResourceLimits = (null)
PropagateResourceLimitsExcept = AS
RebootProgram = (null)
ReconfigFlags = (null)
RequeueExit = (null)
RequeueExitHold = (null)
ResumeProgram = (null)
ResumeRate = 300 nodes/min
ResumeTimeout = 60 sec
ResvEpilog = (null)
ResvOverRun = 0 min
ResvProlog = (null)
ReturnToService = 2
RoutePlugin = route/default
SallocDefaultCommand = (null)
SbcastParameters = (null)
SchedulerParameters = (null)
SchedulerTimeSlice = 30 sec
SchedulerType = sched/backfill
SelectType = select/cray
SelectTypeParameters = CR_CORE_MEMORY,OTHER_CONS_RES,NHC_ABSOLUTELY_NO
SlurmUser = root(0)
SlurmctldDebug = info
SlurmctldLogFile = /var/spool/slurm/slurmctld.log
SlurmctldPort = 6817
SlurmctldTimeout = 120 sec
SlurmdDebug = info
SlurmdLogFile = /var/spool/slurmd/%h.log
SlurmdPidFile = /var/spool/slurmd/slurmd.pid
SlurmdPlugstack = (null)
SlurmdPort = 6818
SlurmdSpoolDir = /var/spool/slurmd
SlurmdTimeout = 300 sec
SlurmdUser = root(0)
SlurmSchedLogFile = (null)
SlurmSchedLogLevel = 0
SlurmctldPidFile = /var/spool/slurm/slurmctld.pid
SlurmctldPlugstack = (null)
SLURM_CONF = /etc/opt/slurm/slurm.conf
SLURM_VERSION = 17.02.6
SrunEpilog = (null)
SrunPortRange = 0-0
SrunProlog = (null)
StateSaveLocation = /apps/cluster/hickory/slurm/
SuspendExcNodes = (null)
SuspendExcParts = (null)
SuspendProgram = (null)
SuspendRate = 60 nodes/min
SuspendTime = NONE
SuspendTimeout = 30 sec
SwitchType = switch/cray
TaskEpilog = (null)
TaskPlugin = task/cray,task/affinity,task/cgroup
TaskPluginParam = (null type)
TaskProlog = (null)
TCPTimeout = 2 sec
TmpFS = /tmp
TopologyParam = (null)
TopologyPlugin = topology/none
TrackWCKey = No
TreeWidth = 50
UsePam = 0
UnkillableStepProgram = (null)
UnkillableStepTimeout = 60 sec
VSizeFactor = 0 percent
WaitTime = 0 sec
Slurmctld(primary/backup) at hickory-1/hickory-2 are UP/UP