[slurm-users] cgroupv2 + slurmd - external cgroup changes needed to get daemon to start
Williams, Jenny Avis
jennyw at email.unc.edu
Tue Jul 11 14:47:07 UTC 2023
Progress on getting slurmd to start under cgroupv2
Issue: slurmd 22.05.6 will not start when using cgroupv2
Expected result: slurmd starts after a reboot without needing manual writes to files under /sys/fs/cgroup.
When started as a service, the error is:
# systemctl status slurmd
* slurmd.service - Slurm node daemon
Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/slurmd.service.d
`-extendUnit.conf
Active: failed (Result: exit-code) since Tue 2023-07-11 10:29:23 EDT; 2s ago
Process: 11395 ExecStart=/usr/sbin/slurmd -D -s $SLURMD_OPTIONS (code=exited, status=1/FAILURE)
Main PID: 11395 (code=exited, status=1/FAILURE)
Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: Started Slurm node daemon.
Jul 11 10:29:23 g1803jles01.ll.unc.edu slurmd[11395]: slurmd: slurmd version 22.05.6 started
Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: slurmd.service: Main process exited, code=exited, status=1/FAILURE
Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: slurmd.service: Failed with result 'exit-code'.
When started at the command line, the output is:
# slurmd -D -vvv 2>&1 |egrep error
slurmd: error: Controller cpuset is not enabled!
slurmd: error: Controller cpu is not enabled!
slurmd: error: Controller cpuset is not enabled!
slurmd: error: Controller cpu is not enabled!
slurmd: error: Controller cpuset is not enabled!
slurmd: error: Controller cpu is not enabled!
slurmd: error: Controller cpuset is not enabled!
slurmd: error: Controller cpu is not enabled!
slurmd: error: cpu cgroup controller is not available.
slurmd: error: There's an issue initializing memory or cpu controller
slurmd: error: Couldn't load specified plugin name for jobacct_gather/cgroup: Plugin init() callback failed
slurmd: error: cannot create jobacct_gather context for jobacct_gather/cgroup
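The errors above mean that the cpu and cpuset controllers are not enabled in the part of the cgroup tree slurmd runs in. As a quick check (a minimal sketch; the paths assume the standard /sys/fs/cgroup mountpoint shown in the cgroup configuration below), the controllers the kernel offers can be compared with the ones actually delegated at each level:

# Controllers available at the root of the v2 hierarchy
cat /sys/fs/cgroup/cgroup.controllers
# Controllers enabled for children of the root
cat /sys/fs/cgroup/cgroup.subtree_control
# Controllers enabled for children of system.slice, where slurmd.service normally runs
cat /sys/fs/cgroup/system.slice/cgroup.subtree_control

If cpu and cpuset are missing from the subtree_control files, slurmd's cgroup/v2 plugin fails as shown above.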
Steps to mitigate the issue:
While the following steps do not solve the underlying problem, they do get the system into a state where slurmd will start, at least until the next reboot. The reinstall of slurm-slurmd is a one-time step to rule out local modifications to the service unit. Currently, even after a reboot the cgroup echo steps are still necessary at a minimum.
#!/bin/bash
# One-time: reinstall slurm-slurmd so local changes to the service unit are ruled out
/usr/bin/dnf -y reinstall slurm-slurmd
systemctl daemon-reload
# Clear any leftover slurmstepd placeholder process
/usr/bin/pkill -f '/usr/sbin/slurmstepd infinity'
systemctl enable slurmd
# Stop the site-specific dcismeng service, enable the cpu, cpuset and memory
# controllers for children of the root cgroup and of system.slice, then start slurmd
systemctl stop dcismeng.service && \
/usr/bin/echo +cpu +cpuset +memory >> /sys/fs/cgroup/cgroup.subtree_control && \
/usr/bin/echo +cpu +cpuset +memory >> /sys/fs/cgroup/system.slice/cgroup.subtree_control && \
systemctl start slurmd && \
echo 'run this: systemctl start dcismeng'
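To have this survive a reboot, one option (a sketch only, not tested here; the drop-in file name and the use of ExecStartPre are my assumptions) would be to run the same echo commands from the existing slurmd drop-in directory just before the daemon starts:

# /etc/systemd/system/slurmd.service.d/cgroup-controllers.conf  (hypothetical file name)
[Service]
ExecStartPre=/bin/sh -c 'echo "+cpu +cpuset +memory" > /sys/fs/cgroup/cgroup.subtree_control'
ExecStartPre=/bin/sh -c 'echo "+cpu +cpuset +memory" > /sys/fs/cgroup/system.slice/cgroup.subtree_control'

After a systemctl daemon-reload, the controllers would then be enabled on every service start instead of requiring the manual echo steps.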
Environment:
# scontrol show config
Configuration data as of 2023-07-11T10:39:48
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = associations,limits,qos,safe
AccountingStorageHost = m1006
AccountingStorageExternalHost = (null)
AccountingStorageParameters = (null)
AccountingStoragePort = 6819
AccountingStorageTRES = cpu,mem,energy,node,billing,fs/disk,vmem,pages,gres/gpu
AccountingStorageType = accounting_storage/slurmdbd
AccountingStorageUser = N/A
AccountingStoreFlags = (null)
AcctGatherEnergyType = acct_gather_energy/none
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInterconnectType = acct_gather_interconnect/none
AcctGatherNodeFreq = 0 sec
AcctGatherProfileType = acct_gather_profile/none
AllowSpecResourcesUsage = No
AuthAltTypes = (null)
AuthAltParameters = (null)
AuthInfo = (null)
AuthType = auth/munge
BatchStartTimeout = 10 sec
BcastExclude = /lib,/usr/lib,/lib64,/usr/lib64
BcastParameters = (null)
BOOT_TIME = 2023-07-11T10:04:31
BurstBufferType = (null)
CliFilterPlugins = (null)
ClusterName = ASlurmCluster
CommunicationParameters = (null)
CompleteWait = 0 sec
CoreSpecPlugin = core_spec/none
CpuFreqDef = Unknown
CpuFreqGovernors = OnDemand,Performance,UserSpace
CredType = cred/munge
DebugFlags = (null)
DefMemPerNode = UNLIMITED
DependencyParameters = kill_invalid_depend
DisableRootJobs = No
EioTimeout = 60
EnforcePartLimits = ANY
Epilog = (null)
EpilogMsgTime = 2000 usec
EpilogSlurmctld = (null)
ExtSensorsType = ext_sensors/none
ExtSensorsFreq = 0 sec
FairShareDampeningFactor = 1
FederationParameters = (null)
FirstJobId = 1
GetEnvTimeout = 2 sec
GresTypes = gpu
GpuFreqDef = high,memory=high
GroupUpdateForce = 1
GroupUpdateTime = 600 sec
HASH_VAL = Match
HealthCheckInterval = 0 sec
HealthCheckNodeState = ANY
HealthCheckProgram = (null)
InactiveLimit = 65533 sec
InteractiveStepOptions = --interactive --preserve-env --pty $SHELL
JobAcctGatherFrequency = task=15
JobAcctGatherType = jobacct_gather/cgroup
JobAcctGatherParams = (null)
JobCompHost = localhost
JobCompLoc = /var/log/slurm_jobcomp.log
JobCompPort = 0
JobCompType = jobcomp/none
JobCompUser = root
JobContainerType = job_container/none
JobCredentialPrivateKey = (null)
JobCredentialPublicCertificate = (null)
JobDefaults = (null)
JobFileAppend = 0
JobRequeue = 1
JobSubmitPlugins = lua
KillOnBadExit = 0
KillWait = 30 sec
LaunchParameters = (null)
LaunchType = launch/slurm
Licenses = mplus:1,nonmem:32
LogTimeFormat = iso8601_ms
MailDomain = (null)
MailProg = /bin/mail
MaxArraySize = 90001
MaxDBDMsgs = 701360
MaxJobCount = 350000
MaxJobId = 67043328
MaxMemPerNode = UNLIMITED
MaxNodeCount = 340
MaxStepCount = 40000
MaxTasksPerNode = 512
MCSPlugin = mcs/none
MCSParameters = (null)
MessageTimeout = 60 sec
MinJobAge = 300 sec
MpiDefault = none
MpiParams = (null)
NEXT_JOB_ID = 12286313
NodeFeaturesPlugins = (null)
OverTimeLimit = 0 min
PluginDir = /usr/lib64/slurm
PlugStackConfig = (null)
PowerParameters = (null)
PowerPlugin =
PreemptMode = OFF
PreemptType = preempt/none
PreemptExemptTime = 00:00:00
PrEpParameters = (null)
PrEpPlugins = prep/script
PriorityParameters = (null)
PrioritySiteFactorParameters = (null)
PrioritySiteFactorPlugin = (null)
PriorityDecayHalfLife = 14-00:00:00
PriorityCalcPeriod = 00:05:00
PriorityFavorSmall = No
PriorityFlags = SMALL_RELATIVE_TO_TIME,CALCULATE_RUNNING,MAX_TRES
PriorityMaxAge = 60-00:00:00
PriorityUsageResetPeriod = NONE
PriorityType = priority/multifactor
PriorityWeightAge = 10000
PriorityWeightAssoc = 0
PriorityWeightFairShare = 10000
PriorityWeightJobSize = 1000
PriorityWeightPartition = 1000
PriorityWeightQOS = 1000
PriorityWeightTRES = CPU=1000,Mem=4000,GRES/gpu=3000
PrivateData = none
ProctrackType = proctrack/cgroup
Prolog = (null)
PrologEpilogTimeout = 65534
PrologSlurmctld = (null)
PrologFlags = Alloc,Contain,X11
PropagatePrioProcess = 0
PropagateResourceLimits = ALL
PropagateResourceLimitsExcept = (null)
RebootProgram = /usr/sbin/reboot
ReconfigFlags = (null)
RequeueExit = (null)
RequeueExitHold = (null)
ResumeFailProgram = (null)
ResumeProgram = (null)
ResumeRate = 300 nodes/min
ResumeTimeout = 60 sec
ResvEpilog = (null)
ResvOverRun = 0 min
ResvProlog = (null)
ReturnToService = 2
RoutePlugin = route/default
SchedulerParameters = batch_sched_delay=10,bf_continue,bf_max_job_part=1000,bf_max_job_test=10000,bf_max_job_user=100,bf_resolution=300,bf_window=10080,bf_yield_interval=1000000,default_queue_depth=1000,partition_job_depth=600,sched_min_interval=20000000,defer,max_rpc_cnt=80
SchedulerTimeSlice = 30 sec
SchedulerType = sched/backfill
ScronParameters = (null)
SelectType = select/cons_tres
SelectTypeParameters = CR_CPU_MEMORY
SlurmUser = slurm(47)
SlurmctldAddr = (null)
SlurmctldDebug = info
SlurmctldHost[0] = ASlurmCluster-sched(x.x.x.x)
SlurmctldLogFile = /data/slurm/slurmctld.log
SlurmctldPort = 6820-6824
SlurmctldSyslogDebug = (null)
SlurmctldPrimaryOffProg = (null)
SlurmctldPrimaryOnProg = (null)
SlurmctldTimeout = 6000 sec
SlurmctldParameters = (null)
SlurmdDebug = info
SlurmdLogFile = /var/log/slurm/slurmd.log
SlurmdParameters = (null)
SlurmdPidFile = /var/run/slurmd.pid
SlurmdPort = 6818
SlurmdSpoolDir = /var/spool/slurmd
SlurmdSyslogDebug = (null)
SlurmdTimeout = 600 sec
SlurmdUser = root(0)
SlurmSchedLogFile = (null)
SlurmSchedLogLevel = 0
SlurmctldPidFile = /var/run/slurmctld.pid
SlurmctldPlugstack = (null)
SLURM_CONF = /etc/slurm/slurm.conf
SLURM_VERSION = 22.05.6
SrunEpilog = (null)
SrunPortRange = 0-0
SrunProlog = (null)
StateSaveLocation = /data/slurm/slurmctld
SuspendExcNodes = (null)
SuspendExcParts = (null)
SuspendProgram = (null)
SuspendRate = 60 nodes/min
SuspendTime = INFINITE
SuspendTimeout = 30 sec
SwitchParameters = (null)
SwitchType = switch/none
TaskEpilog = (null)
TaskPlugin = cgroup,affinity
TaskPluginParam = (null type)
TaskProlog = (null)
TCPTimeout = 2 sec
TmpFS = /tmp
TopologyParam = (null)
TopologyPlugin = topology/none
TrackWCKey = No
TreeWidth = 50
UsePam = No
UnkillableStepProgram = (null)
UnkillableStepTimeout = 600 sec
VSizeFactor = 0 percent
WaitTime = 0 sec
X11Parameters = home_xauthority
Cgroup Support Configuration:
AllowedKmemSpace = (null)
AllowedRAMSpace = 100.0%
AllowedSwapSpace = 1.0%
CgroupAutomount = yes
CgroupMountpoint = /sys/fs/cgroup
CgroupPlugin = cgroup/v2
ConstrainCores = yes
ConstrainDevices = yes
ConstrainKmemSpace = no
ConstrainRAMSpace = yes
ConstrainSwapSpace = yes
IgnoreSystemd = no
IgnoreSystemdOnFailure = no
MaxKmemPercent = 100.0%
MaxRAMPercent = 100.0%
MaxSwapPercent = 100.0%
MemorySwappiness = (null)
MinKmemSpace = 30 MB
MinRAMSpace = 30 MB
Slurmctld(primary) at ASlurmCluster-sched is UP
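For completeness, it may also be worth confirming that the node really is on the unified cgroup v2 hierarchy that CgroupPlugin = cgroup/v2 expects (an assumption here that this has already been verified):

# Prints "cgroup2fs" on a pure cgroup v2 (unified) mount
stat -fc %T /sys/fs/cgroup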