[slurm-users] pam_slurm_adopt does not constrain memory?

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Thu Oct 25 01:13:52 MDT 2018


On 10/25/2018 07:00 AM, Christopher Samuel wrote:
> On 25/10/18 2:29 pm, Christopher Samuel wrote:
> 
>> Could explain why this isn't something we see consistently, and why 
>> we're both seeing it currently.
> 
> This seems to be a handy way to find any processes that are not properly 
> constrained by Slurm cgroups on compute nodes (at least in our 
> configuration):
> 
> ps --no-headers -eo pid,user,comm,cgroup | egrep -vw 
> 'root|freezer:/slurm.*devices:/slurm.*cpuacct,cpu:/slurm.*memory:/slurm|cpuset:/slurm.*|dbus-daemon|munged|ntpd|gmond|polkitd' 

Nice command, Chris!  I added a few system usernames from CentOS 7
(chrony, smmsp, rpcuser, rpc), as seen below.  However, defunct processes
seem to escape cgroups, for example:

# ps --no-headers -eo pid,user,comm,cgroup | egrep -vw \
'root|freezer:/slurm.*devices:/slurm.*cpuacct,cpu:/slurm.*memory:/slurm|cpuset:/slurm.*|dbus-daemon|munged|ntpd|gmond|polkitd|chrony|smmsp|rpcuser|rpc'

27312 jhwa     mpiex <defunct> -

What should we do about defunct processes and cgroups?
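
One possible way to handle this, as a sketch only: a defunct (zombie) process has already released its memory and is just waiting to be reaped by its parent, so it may be reasonable to filter such entries out of the listing by process state. The helper names drop_defunct and list_unconstrained below are hypothetical, and the extra stat column is an addition to the original command. I use grep -E here, which is equivalent to egrep:

```shell
# Hypothetical helper: drop zombie (defunct) processes from a ps listing.
# Expects lines with the columns pid,stat,user,comm,cgroup on stdin.
drop_defunct() {
    # Column 2 is the process state; it starts with "Z" for zombies.
    awk '$2 !~ /^Z/'
}

# Hypothetical wrapper: the same check as above, with a stat column added
# so that defunct processes can be skipped before the cgroup filter runs.
list_unconstrained() {
    ps --no-headers -eo pid,stat,user,comm,cgroup \
        | drop_defunct \
        | grep -Evw 'root|freezer:/slurm.*devices:/slurm.*cpuacct,cpu:/slurm.*memory:/slurm|cpuset:/slurm.*|dbus-daemon|munged|ntpd|gmond|polkitd|chrony|smmsp|rpcuser|rpc'
}
```

Calling list_unconstrained on a compute node should then print only live processes that fall outside the Slurm cgroups. Whether it is actually safe to ignore zombies this way is an assumption on my part.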

/Ole


