[slurm-users] Does PrologFlags=Contain work with cgroup/v2?
Mahdi Nazemi
mnazemi at usc.edu
Fri Mar 10 22:58:43 UTC 2023
I have the following lines in my slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup,task/affinity
PrologFlags=Contain
and the following lines in my cgroup.conf
CgroupPlugin=cgroup/v2
CgroupAutomount=yes
ConstrainDevices=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
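For reference, here is a quick way to confirm that a node really runs a pure cgroup v2 (unified) hierarchy, assuming the standard mount at /sys/fs/cgroup; on a hybrid or legacy setup the first command would not report cgroup2fs:

# should print "cgroup2fs" on a unified (v2-only) hierarchy
stat -fc %T /sys/fs/cgroup
# the cpuset and memory controllers should appear in this list
cat /sys/fs/cgroup/cgroup.controllers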
My small cluster consists of three servers running Slurm 22.05.8, and users can ssh directly into each of them. One of the servers acts as both the head node and a compute node.
When I run a job with sbatch or srun, the GPU constraints are enforced correctly, but all of the node's CPU cores and memory are still visible to the user.
Is PrologFlags=Contain compatible with cgroup/v2? If so, what could be causing this?
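In case it helps diagnose this, here is a rough way to check from inside a job whether the core and memory limits are actually applied (a sketch assuming the usual cgroup v2 layout with a single 0::<path> entry in /proc/self/cgroup; Slurm may set memory.max on a parent of the leaf cgroup, so the exact file may be a level up, and tools like free or /proc/cpuinfo always show the whole host regardless of cgroup limits):

srun --cpus-per-task=2 --mem=4G bash -c '
  cat /proc/self/cgroup                      # single 0::<path> line on cgroup v2
  nproc                                      # CPUs the step can actually use
  grep Cpus_allowed_list /proc/self/status   # CPU affinity mask of the step
  cgpath=$(cut -d: -f3 /proc/self/cgroup)
  cat /sys/fs/cgroup"$cgpath"/memory.max     # memory limit, if set at this level
'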
The full contents of my slurm.conf, gres.conf, and cgroup.conf files are shown below.
slurm.conf:
ClusterName=sportlab
ControlMachine=kaveh.usc.edu
SlurmUser=root
SlurmdUser=root
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/lib/slurm
SlurmdSpoolDir=/var/spool/slurm
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup,task/affinity
PrologFlags=Contain
MessageTimeout=30
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
SchedulerType=sched/backfill
SchedulerParameters=preempt_strict_order,preempt_reorder_count=3,max_rpc_cnt=160
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
GresTypes=gpu
PreemptType=preempt/partition_prio
PreemptExemptTime=-1
PreemptMode=CANCEL
PriorityType=priority/multifactor
PriorityDecayHalfLife=30-0
PriorityFavorSmall=NO
PriorityWeightPartition=1000
PriorityWeightJobSize=1000
PriorityMaxAge=14-0
PriorityWeightQOS=1000
PropagateResourceLimitsExcept=MEMLOCK
PriorityFlags=FAIR_TREE
SlurmctldDebug=verbose
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=verbose
SlurmdLogFile=/var/log/slurmd.log
JobCompType=jobcomp/none
MpiDefault=none
NodeName=kaveh Gres=gpu:a6000:8 CPUs=256 Boards=1 SocketsPerBoard=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=1031870 Feature=a6000
NodeName=arvand Gres=gpu:2080ti:4 CPUs=24 Boards=1 SocketsPerBoard=1 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=128488 Feature=2080ti
NodeName=haraz Gres=gpu:1080ti:4 CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=80350 Feature=1080ti
PartitionName=gpu Nodes=kaveh,arvand,haraz Default=YES DefaultTime=12:00:00 MaxTime=UNLIMITED OverSubscribe=NO State=UP DefMemPerCPU=2048 GraceTime=10
gres.conf:
NodeName=kaveh Name=gpu Type=a6000 File=/dev/nvidia[0-7]
NodeName=arvand Name=gpu Type=2080ti File=/dev/nvidia[0-3]
NodeName=haraz Name=gpu Type=1080ti File=/dev/nvidia[0-3]
cgroup.conf:
CgroupPlugin=cgroup/v2
CgroupAutomount=yes
ConstrainDevices=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
#AllowedDevicesFile=/etc/slurm/cgroup_allowed_devices_file.conf
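In case it is useful, the values the running daemons actually loaded can be double-checked with a filtered scontrol show config (nothing below is specific to my setup):

# confirm what the running slurmctld actually loaded from slurm.conf
scontrol show config | grep -Ei 'proctracktype|taskplugin|prologflags'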
Thank you!
Best Regards,
Mahdi Nazemi