[slurm-users] How to deal with user running stuff in frontend node?

Jeffrey Frey frey at udel.edu
Thu Feb 15 08:29:42 MST 2018


Every cluster I've ever managed has this issue.  Once cgroup support arrived in Linux, the path we took (on CentOS 6) was to use the 'cgconfig' and 'cgred' services on the login node(s) to set up containers for regular users and quarantine them therein.  The config leaves 4 CPU cores unused by regular users (cpuset config), and lets them use up to 100% of the 16 cores they are granted while yielding cycles as other users demand them (cpu config).  It also keeps a small amount of RAM off-limits to regular users, and caps each regular user at a couple GB.

The cgrules.conf is processed on a first-match basis, so at the top we make sure root and sysadmins don't have any limits.  Support staff get the overall limits for regular users, and everyone else who isn't a daemon user, etc., gets a personal cgroup with the most stringent limits.
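
For reference, both services come from the libcgroup package on CentOS 6; roughly (package/service names assumed from a stock install, adjust for your distro):

# install the userland cgroup tools
yum install libcgroup
# build the hierarchy from /etc/cgconfig.conf and start the rules
# daemon that classifies processes per /etc/cgrules.conf
service cgconfig start
service cgred start
# persist across reboots
chkconfig cgconfig on
chkconfig cgred on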




/etc/cgconfig.conf:
mount {
	cpuset	= /cgroup/cpuset;
	cpu	= /cgroup/cpu;
	#cpuacct	= /cgroup/cpuacct;
	memory	= /cgroup/memory;
	#devices	= /cgroup/devices;
	#freezer	= /cgroup/freezer;
	#net_cls	= /cgroup/net_cls;
	#blkio	= /cgroup/blkio;
}

group regular_users {
  cpu {
    cpu.shares=100;
  }
  cpuset {
    cpuset.cpus=4-19;
    cpuset.mems=0-1;
  }
  memory {
    memory.limit_in_bytes=48G;
    memory.soft_limit_in_bytes=48G;
    memory.memsw.limit_in_bytes=60G;
  }
}

template regular_users/%U {
  cpu {
    cpu.shares=100;
  }
  cpuset {
    cpuset.cpus=4-19;
    cpuset.mems=0-1;
  }
  memory {
    memory.limit_in_bytes=4G;
    memory.soft_limit_in_bytes=2G;
    memory.memsw.limit_in_bytes=6G;
  }
}
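
(Once cgconfig is up you can sanity-check the limits directly through the mount points defined above, e.g.:

  cat /cgroup/cpuset/regular_users/cpuset.cpus
  cat /cgroup/memory/regular_users/memory.limit_in_bytes

lscgroup from the libcgroup tools will also list the per-user groups as cgred creates them from the template.)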


/etc/cgrules.conf:
#
# Include an explicit rule for root, otherwise commands with
# the setuid bit set on them will inherit the original user's
# gid and probably wind up under @everyone:
#
root		cpuset,cpu,memory	/

#
# sysadmin
#
user1		cpuset,cpu,memory	/
user2		cpuset,cpu,memory	/

#
# sysstaff
#
user3		cpuset,cpu,memory	regular_users/
user4		cpuset,cpu,memory	regular_users/

#
# workgroups:
#
@everyone		cpuset,cpu,memory		regular_users/%U/
@group1			cpuset,cpu,memory		regular_users/%U/
@group2			cpuset,cpu,memory		regular_users/%U/
  :
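
To verify cgred is doing its job, check where a regular user's login shell landed (the PID here is just an example):

  grep -E 'cpuset|cpu|memory' /proc/12345/cgroup

Each matching line should end in regular_users/<username> for anyone caught by the @everyone/@group rules.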






> On Feb 15, 2018, at 10:11 AM, Manuel Rodríguez Pascual <manuel.rodriguez.pascual at gmail.com> wrote:
> 
> Hi all, 
> 
> Although this is not strictly related to Slurm, maybe you can recommend some actions to deal with a particular user. 
> 
> On our small cluster, there are currently no limits on running applications on the frontend. This is sometimes really useful for some users, for example to have scripts monitoring the execution of jobs and making decisions based on the partial results.
> 
> However, we have this user who keeps abusing the system: when the job queue is long and there is a significant wait time, he sometimes runs his jobs on the frontend, resulting in a CPU load of 100% and delays in using it for the things it is supposed to serve (user login, monitoring and so on). 
> 
> Have you faced the same issue?  Is there any solution? I am thinking about using ulimit to limit the execution time of these jobs on the frontend to 5 minutes or so. This, however, does not look very elegant, as other users could commit the same abuse in the future, and he should also be able to run low CPU-consuming jobs for a longer period. However, I am not an experienced sysadmin, so I am completely open to suggestions or different ways of approaching this issue.
> 
> Any thoughts?
> 
> cheers, 
> 
> 
> 
> 
> Manuel


::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE  19716
Office: (302) 831-6034  Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::



