[slurm-users] cgroupv2 + slurmd - external cgroup changes needed to get daemon to start
Hermann Schwärzler
hermann.schwaerzler at uibk.ac.at
Wed Jul 12 08:36:10 UTC 2023
Hi Jenny,
I *guess* you have a system that has both cgroup/v1 and cgroup/v2 enabled.
Which Linux distribution are you using? And which kernel version?
What is the output of
mount | grep cgroup
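On a system where both versions are active ("hybrid" mode) you would
typically see one cgroup2 mount plus several per-controller cgroup (v1)
mounts; the lines below are only illustrative:

cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)

On a pure cgroup/v2 system there is just a single line like

cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)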
What happens if you do not restrict the cgroup version Slurm can use to
cgroup/v2 but instead omit "CgroupPlugin=..." from your cgroup.conf?
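In other words, a cgroup.conf roughly like this (a sketch based on the
settings you posted, just with the CgroupPlugin line dropped so that Slurm
picks the version itself):

CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedSwapSpace=1
ConstrainDevices=yes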
Regards,
Hermann
On 7/11/23 19:41, Williams, Jenny Avis wrote:
> Additional configuration information -- /etc/slurm/cgroup.conf
>
> CgroupAutomount=yes
>
> ConstrainCores=yes
>
> ConstrainRAMSpace=yes
>
> CgroupPlugin=cgroup/v2
>
> AllowedSwapSpace=1
>
> ConstrainSwapSpace=yes
>
> ConstrainDevices=yes
>
> From: Williams, Jenny Avis
> Sent: Tuesday, July 11, 2023 10:47 AM
> To: slurm-users at schedmd.com
> Subject: cgroupv2 + slurmd - external cgroup changes needed to get
> daemon to start
>
> Progress on getting slurmd to start under cgroupv2
>
> Issue: slurmd 22.05.6 will not start when using cgroupv2
>
> Expected result: even after a reboot, slurmd will start up without needing
> to manually add entries to files under /sys/fs/cgroup.
>
> When started as a service, the error is:
>
> # systemctl status slurmd
>
> * slurmd.service - Slurm node daemon
>
> Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled;
> vendor preset: disabled)
>
> Drop-In: /etc/systemd/system/slurmd.service.d
>
> └─extendUnit.conf
>
> Active: failed (Result: exit-code) since Tue 2023-07-11 10:29:23
> EDT; 2s ago
>
> Process: 11395 ExecStart=/usr/sbin/slurmd -D -s $SLURMD_OPTIONS
> (code=exited, status=1/FAILURE)
>
> Main PID: 11395 (code=exited, status=1/FAILURE)
>
> Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: Started Slurm node
> daemon.
>
> Jul 11 10:29:23 g1803jles01.ll.unc.edu slurmd[11395]: slurmd: slurmd
> version 22.05.6 started
>
> Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: slurmd.service: Main
> process exited, code=exited, status=1/FAILURE
>
> Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: slurmd.service:
> Failed with result 'exit-code'.
>
> When started at the command line, the output is:
>
> # slurmd -D -vvv 2>&1 |egrep error
>
> slurmd: error: Controller cpuset is not enabled!
>
> slurmd: error: Controller cpu is not enabled!
>
> slurmd: error: Controller cpuset is not enabled!
>
> slurmd: error: Controller cpu is not enabled!
>
> slurmd: error: Controller cpuset is not enabled!
>
> slurmd: error: Controller cpu is not enabled!
>
> slurmd: error: Controller cpuset is not enabled!
>
> slurmd: error: Controller cpu is not enabled!
>
> slurmd: error: cpu cgroup controller is not available.
>
> slurmd: error: There's an issue initializing memory or cpu controller
>
> slurmd: error: Couldn't load specified plugin name for
> jobacct_gather/cgroup: Plugin init() callback failed
>
> slurmd: error: cannot create jobacct_gather context for
> jobacct_gather/cgroup
>
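> The errors above indicate that the cpu and cpuset controllers are not
> enabled for the cgroup subtree slurmd runs in. One way to check this via
> the standard cgroup v2 interface files (shown here only as a diagnostic):
>
> cat /sys/fs/cgroup/cgroup.controllers
> cat /sys/fs/cgroup/cgroup.subtree_control
> cat /sys/fs/cgroup/system.slice/cgroup.subtree_control
>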
> Steps to mitigate the issue:
>
> While the following steps do not solve the issue, they do get the system
> into a state where slurmd will start, at least until the next reboot.
> Re-installing slurm-slurmd is a one-time step to ensure that local service
> modifications are out of the picture. Currently, even after a reboot, the
> cgroup echo steps are necessary at a minimum.
>
> #!/bin/bash
>
> /usr/bin/dnf -y reinstall slurm-slurmd
>
> systemctl daemon-reload
>
> /usr/bin/pkill -f '/usr/sbin/slurmstepd infinity'
>
> systemctl enable slurmd
>
> systemctl stop dcismeng.service && \
>
> /usr/bin/echo +cpu +cpuset +memory >> /sys/fs/cgroup/cgroup.subtree_control && \
>
> /usr/bin/echo +cpu +cpuset +memory >> /sys/fs/cgroup/system.slice/cgroup.subtree_control && \
>
> systemctl start slurmd && \
>
> echo 'run this: systemctl start dcismeng'
>
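> A candidate for a persistent fix, not yet tested here (a sketch; the
> drop-in file name is made up), would be to let systemd delegate the
> controllers to the slurmd unit through a drop-in instead of echoing into
> the subtree_control files by hand:
>
> # /etc/systemd/system/slurmd.service.d/delegate.conf
> [Service]
> Delegate=cpu cpuset memory
>
> followed by
>
> systemctl daemon-reload && systemctl restart slurmd
>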
> Environment:
>
> # scontrol show config
>
> Configuration data as of 2023-07-11T10:39:48
>
> AccountingStorageBackupHost = (null)
>
> AccountingStorageEnforce = associations,limits,qos,safe
>
> AccountingStorageHost = m1006
>
> AccountingStorageExternalHost = (null)
>
> AccountingStorageParameters = (null)
>
> AccountingStoragePort = 6819
>
> AccountingStorageTRES =
> cpu,mem,energy,node,billing,fs/disk,vmem,pages,gres/gpu
>
> AccountingStorageType = accounting_storage/slurmdbd
>
> AccountingStorageUser = N/A
>
> AccountingStoreFlags = (null)
>
> AcctGatherEnergyType = acct_gather_energy/none
>
> AcctGatherFilesystemType = acct_gather_filesystem/none
>
> AcctGatherInterconnectType = acct_gather_interconnect/none
>
> AcctGatherNodeFreq = 0 sec
>
> AcctGatherProfileType = acct_gather_profile/none
>
> AllowSpecResourcesUsage = No
>
> AuthAltTypes = (null)
>
> AuthAltParameters = (null)
>
> AuthInfo = (null)
>
> AuthType = auth/munge
>
> BatchStartTimeout = 10 sec
>
> BcastExclude = /lib,/usr/lib,/lib64,/usr/lib64
>
> BcastParameters = (null)
>
> BOOT_TIME = 2023-07-11T10:04:31
>
> BurstBufferType = (null)
>
> CliFilterPlugins = (null)
>
> ClusterName = ASlurmCluster
>
> CommunicationParameters = (null)
>
> CompleteWait = 0 sec
>
> CoreSpecPlugin = core_spec/none
>
> CpuFreqDef = Unknown
>
> CpuFreqGovernors = OnDemand,Performance,UserSpace
>
> CredType = cred/munge
>
> DebugFlags = (null)
>
> DefMemPerNode = UNLIMITED
>
> DependencyParameters = kill_invalid_depend
>
> DisableRootJobs = No
>
> EioTimeout = 60
>
> EnforcePartLimits = ANY
>
> Epilog = (null)
>
> EpilogMsgTime = 2000 usec
>
> EpilogSlurmctld = (null)
>
> ExtSensorsType = ext_sensors/none
>
> ExtSensorsFreq = 0 sec
>
> FairShareDampeningFactor = 1
>
> FederationParameters = (null)
>
> FirstJobId = 1
>
> GetEnvTimeout = 2 sec
>
> GresTypes = gpu
>
> GpuFreqDef = high,memory=high
>
> GroupUpdateForce = 1
>
> GroupUpdateTime = 600 sec
>
> HASH_VAL = Match
>
> HealthCheckInterval = 0 sec
>
> HealthCheckNodeState = ANY
>
> HealthCheckProgram = (null)
>
> InactiveLimit = 65533 sec
>
> InteractiveStepOptions = --interactive --preserve-env --pty $SHELL
>
> JobAcctGatherFrequency = task=15
>
> JobAcctGatherType = jobacct_gather/cgroup
>
> JobAcctGatherParams = (null)
>
> JobCompHost = localhost
>
> JobCompLoc = /var/log/slurm_jobcomp.log
>
> JobCompPort = 0
>
> JobCompType = jobcomp/none
>
> JobCompUser = root
>
> JobContainerType = job_container/none
>
> JobCredentialPrivateKey = (null)
>
> JobCredentialPublicCertificate = (null)
>
> JobDefaults = (null)
>
> JobFileAppend = 0
>
> JobRequeue = 1
>
> JobSubmitPlugins = lua
>
> KillOnBadExit = 0
>
> KillWait = 30 sec
>
> LaunchParameters = (null)
>
> LaunchType = launch/slurm
>
> Licenses = mplus:1,nonmem:32
>
> LogTimeFormat = iso8601_ms
>
> MailDomain = (null)
>
> MailProg = /bin/mail
>
> MaxArraySize = 90001
>
> MaxDBDMsgs = 701360
>
> MaxJobCount = 350000
>
> MaxJobId = 67043328
>
> MaxMemPerNode = UNLIMITED
>
> MaxNodeCount = 340
>
> MaxStepCount = 40000
>
> MaxTasksPerNode = 512
>
> MCSPlugin = mcs/none
>
> MCSParameters = (null)
>
> MessageTimeout = 60 sec
>
> MinJobAge = 300 sec
>
> MpiDefault = none
>
> MpiParams = (null)
>
> NEXT_JOB_ID = 12286313
>
> NodeFeaturesPlugins = (null)
>
> OverTimeLimit = 0 min
>
> PluginDir = /usr/lib64/slurm
>
> PlugStackConfig = (null)
>
> PowerParameters = (null)
>
> PowerPlugin =
>
> PreemptMode = OFF
>
> PreemptType = preempt/none
>
> PreemptExemptTime = 00:00:00
>
> PrEpParameters = (null)
>
> PrEpPlugins = prep/script
>
> PriorityParameters = (null)
>
> PrioritySiteFactorParameters = (null)
>
> PrioritySiteFactorPlugin = (null)
>
> PriorityDecayHalfLife = 14-00:00:00
>
> PriorityCalcPeriod = 00:05:00
>
> PriorityFavorSmall = No
>
> PriorityFlags = SMALL_RELATIVE_TO_TIME,CALCULATE_RUNNING,MAX_TRES
>
> PriorityMaxAge = 60-00:00:00
>
> PriorityUsageResetPeriod = NONE
>
> PriorityType = priority/multifactor
>
> PriorityWeightAge = 10000
>
> PriorityWeightAssoc = 0
>
> PriorityWeightFairShare = 10000
>
> PriorityWeightJobSize = 1000
>
> PriorityWeightPartition = 1000
>
> PriorityWeightQOS = 1000
>
> PriorityWeightTRES = CPU=1000,Mem=4000,GRES/gpu=3000
>
> PrivateData = none
>
> ProctrackType = proctrack/cgroup
>
> Prolog = (null)
>
> PrologEpilogTimeout = 65534
>
> PrologSlurmctld = (null)
>
> PrologFlags = Alloc,Contain,X11
>
> PropagatePrioProcess = 0
>
> PropagateResourceLimits = ALL
>
> PropagateResourceLimitsExcept = (null)
>
> RebootProgram = /usr/sbin/reboot
>
> ReconfigFlags = (null)
>
> RequeueExit = (null)
>
> RequeueExitHold = (null)
>
> ResumeFailProgram = (null)
>
> ResumeProgram = (null)
>
> ResumeRate = 300 nodes/min
>
> ResumeTimeout = 60 sec
>
> ResvEpilog = (null)
>
> ResvOverRun = 0 min
>
> ResvProlog = (null)
>
> ReturnToService = 2
>
> RoutePlugin = route/default
>
> SchedulerParameters =
> batch_sched_delay=10,bf_continue,bf_max_job_part=1000,bf_max_job_test=10000,bf_max_job_user=100,bf_resolution=300,bf_window=10080,bf_yield_interval=1000000,default_queue_depth=1000,partition_job_depth=600,sched_min_interval=20000000,defer,max_rpc_cnt=80
>
> SchedulerTimeSlice = 30 sec
>
> SchedulerType = sched/backfill
>
> ScronParameters = (null)
>
> SelectType = select/cons_tres
>
> SelectTypeParameters = CR_CPU_MEMORY
>
> SlurmUser = slurm(47)
>
> SlurmctldAddr = (null)
>
> SlurmctldDebug = info
>
> SlurmctldHost[0] = ASlurmCluster-sched(x.x.x.x)
>
> SlurmctldLogFile = /data/slurm/slurmctld.log
>
> SlurmctldPort = 6820-6824
>
> SlurmctldSyslogDebug = (null)
>
> SlurmctldPrimaryOffProg = (null)
>
> SlurmctldPrimaryOnProg = (null)
>
> SlurmctldTimeout = 6000 sec
>
> SlurmctldParameters = (null)
>
> SlurmdDebug = info
>
> SlurmdLogFile = /var/log/slurm/slurmd.log
>
> SlurmdParameters = (null)
>
> SlurmdPidFile = /var/run/slurmd.pid
>
> SlurmdPort = 6818
>
> SlurmdSpoolDir = /var/spool/slurmd
>
> SlurmdSyslogDebug = (null)
>
> SlurmdTimeout = 600 sec
>
> SlurmdUser = root(0)
>
> SlurmSchedLogFile = (null)
>
> SlurmSchedLogLevel = 0
>
> SlurmctldPidFile = /var/run/slurmctld.pid
>
> SlurmctldPlugstack = (null)
>
> SLURM_CONF = /etc/slurm/slurm.conf
>
> SLURM_VERSION = 22.05.6
>
> SrunEpilog = (null)
>
> SrunPortRange = 0-0
>
> SrunProlog = (null)
>
> StateSaveLocation = /data/slurm/slurmctld
>
> SuspendExcNodes = (null)
>
> SuspendExcParts = (null)
>
> SuspendProgram = (null)
>
> SuspendRate = 60 nodes/min
>
> SuspendTime = INFINITE
>
> SuspendTimeout = 30 sec
>
> SwitchParameters = (null)
>
> SwitchType = switch/none
>
> TaskEpilog = (null)
>
> TaskPlugin = cgroup,affinity
>
> TaskPluginParam = (null type)
>
> TaskProlog = (null)
>
> TCPTimeout = 2 sec
>
> TmpFS = /tmp
>
> TopologyParam = (null)
>
> TopologyPlugin = topology/none
>
> TrackWCKey = No
>
> TreeWidth = 50
>
> UsePam = No
>
> UnkillableStepProgram = (null)
>
> UnkillableStepTimeout = 600 sec
>
> VSizeFactor = 0 percent
>
> WaitTime = 0 sec
>
> X11Parameters = home_xauthority
>
> Cgroup Support Configuration:
>
> AllowedKmemSpace = (null)
>
> AllowedRAMSpace = 100.0%
>
> AllowedSwapSpace = 1.0%
>
> CgroupAutomount = yes
>
> CgroupMountpoint = /sys/fs/cgroup
>
> CgroupPlugin = cgroup/v2
>
> ConstrainCores = yes
>
> ConstrainDevices = yes
>
> ConstrainKmemSpace = no
>
> ConstrainRAMSpace = yes
>
> ConstrainSwapSpace = yes
>
> IgnoreSystemd = no
>
> IgnoreSystemdOnFailure = no
>
> MaxKmemPercent = 100.0%
>
> MaxRAMPercent = 100.0%
>
> MaxSwapPercent = 100.0%
>
> MemorySwappiness = (null)
>
> MinKmemSpace = 30 MB
>
> MinRAMSpace = 30 MB
>
> Slurmctld(primary) at ASlurmCluster-sched is UP
>