<div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>I've been having the same issue with BCM: CentOS 8.2, BCM 9.0,
Slurm 20.02.3. It seems to have started when I enabled
proctrack/cgroup and changed select/linear to select/cons_tres.</p></div></blockquote><div>Our slurm.conf has the same setting:</div><div>SelectType=select/cons_tres<br>SelectTypeParameters=CR_CPU<br>SchedulerTimeSlice=60<br>EnforcePartLimits=YES<br></div><div><br>We have also enabled MPS, though I'm not sure that's relevant.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
<p>Are you using cgroup process tracking and have you manipulated
the cgroup.conf file?</p></div></blockquote><div>Here's what we have in ours: </div>CgroupMountpoint="/sys/fs/cgroup"<br>CgroupAutomount=no<br>AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf"<br>TaskAffinity=no<br>ConstrainCores=no<br>ConstrainRAMSpace=no<br>ConstrainSwapSpace=no<br>ConstrainDevices=no<br>ConstrainKmemSpace=yes<br>AllowedRamSpace=100<br>AllowedSwapSpace=0<br>MinKmemSpace=30<br>MaxKmemPercent=100<br>MaxRAMPercent=100<br>MaxSwapPercent=100<br>MinRAMSpace=30<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> Do jobs complete correctly when not
cancelled? </blockquote><div><br></div><div>Yes, they do, and canceling doesn't always result in a node draining.</div><div><br></div><div>So would this be a Slurm issue or a Bright issue? As a workaround, I'm telling users to add 'sleep 60' as the last line of their sbatch files.</div></div></div>
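For anyone who wants to apply the 'sleep 60' workaround to existing batch scripts without editing each one by hand, here is a minimal sketch. The helper name `ensure_trailing_sleep` and the sample script contents (including `./my_app`) are hypothetical and only for illustration; the function just edits text and does not call Slurm at all.

```python
def ensure_trailing_sleep(script_text: str, seconds: int = 60) -> str:
    """Return script_text with `sleep <seconds>` as its last command.

    Hypothetical helper: it only rewrites the script text and is
    idempotent, so running it twice does not add a second sleep.
    """
    sleep_line = f"sleep {seconds}"
    # Drop trailing newlines so the last element is the last real line.
    lines = script_text.rstrip("\n").split("\n")
    # Last non-blank line, or "" if the script is empty.
    last = next((ln.strip() for ln in reversed(lines) if ln.strip()), "")
    if last == sleep_line:
        # Already patched; just make sure the file ends with a newline.
        return script_text if script_text.endswith("\n") else script_text + "\n"
    return "\n".join(lines) + "\n" + sleep_line + "\n"


# Example sbatch script (contents are made up for illustration).
example = "#!/bin/bash\n#SBATCH --ntasks=1\nsrun ./my_app\n"
patched = ensure_trailing_sleep(example)
print(patched)
```

This only automates the workaround described above; it does not address whatever causes the nodes to drain in the first place.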