[slurm-users] What is an easy way to prevent users run programs on the master/login node.

David Schanzenbach davidls at hawaii.edu
Thu May 20 08:49:16 UTC 2021


For our login nodes (smallish, diskless VMs) we try to limit abuse by 
users through a layered approach, as enumerated below.

1. User education

Users of our cluster are required to attend a training session that is 
run by our group.  In these sessions we go over what we do and don't 
allow on the login nodes, and we stress that we will kill long-running 
processes if we see them and that repeated abuse could get a user banned 
for some duration of time.

2. Set the noexec mount option for any user controlled mountpoint (home, 
scratch, group/lab/project spaces)

This isn't a perfect solution, as noexec can be worked around if a user 
understands what noexec means.  For example, a user wouldn't be able to 
run "./foo.py", but they could still run "python foo.py".
We also understand that some users have a legitimate reason to use a 
script on the login node.  Setting noexec isn't really meant to prevent 
the use of scripts; it just makes it a little harder for a user to abuse 
the login node.
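
A minimal sketch of how one might audit that the user-controlled 
mountpoints really carry noexec.  It parses text in the /proc/mounts 
format; the function name, the sample mount table and the server name 
"nfs1" are hypothetical and only there for illustration.

```python
def mounts_missing_noexec(mounts_text, mountpoints):
    """Return the mountpoints (in input order) that lack the noexec option.

    A mountpoint that does not appear in mounts_text at all is also
    reported, since it is not confirmed to be mounted with noexec.
    """
    options_by_mountpoint = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass
        # A later entry for the same mountpoint replaces an earlier one.
        options_by_mountpoint[fields[1]] = set(fields[3].split(","))

    missing = []
    for mountpoint in mountpoints:
        options = options_by_mountpoint.get(mountpoint)
        if options is None or "noexec" not in options:
            missing.append(mountpoint)
    return missing


# Hypothetical mount table: /home has noexec, /scratch does not.
SAMPLE = (
    "nfs1:/export/home /home nfs4 rw,nosuid,nodev,noexec,relatime 0 0\n"
    "nfs1:/export/scratch /scratch nfs4 rw,nosuid,nodev,relatime 0 0\n"
)

print(mounts_missing_noexec(SAMPLE, ["/home", "/scratch"]))  # ['/scratch']
```

On a real node the first argument would be the contents of /proc/mounts.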

3. A small partition with shared nodes with low maxtime

For tasks that are typically longer running (compression/decompression, 
compilation), beyond user education, we also have a partition of 4 
nodes that limits the number of running jobs per user (2 at a time) and 
has a maxtime of 4 hours.  For most of our users, this covers 
compilation, testing and compression/decompression.  These nodes are 
also set up to be shared, so users must request the number of cores and 
the amount of memory they need, for either a batch job or an interactive 
job, to perform longer-running tasks.
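
As an illustration of what such a request looks like, here is a small 
sketch that builds an interactive srun command line and refuses a time 
request above the 4 hour maxtime.  The partition name "sandbox" and the 
helper function are hypothetical; --partition, --cpus-per-task, --mem, 
--time and --pty are standard srun options.

```python
def build_interactive_srun(partition, cores, mem_gb, minutes, max_minutes=240):
    """Build an srun argument list for an interactive shell.

    max_minutes defaults to 240 to mirror a 4 hour partition maxtime.
    """
    if cores < 1 or mem_gb < 1 or minutes < 1:
        raise ValueError("cores, mem_gb and minutes must all be at least 1")
    if minutes > max_minutes:
        raise ValueError(f"requested {minutes} min exceeds maxtime {max_minutes} min")
    hours, mins = divmod(minutes, 60)
    return [
        "srun",
        f"--partition={partition}",
        f"--cpus-per-task={cores}",
        f"--mem={mem_gb}G",
        f"--time={hours:02d}:{mins:02d}:00",  # hours:minutes:seconds
        "--pty",
        "bash",
    ]


# Hypothetical request: 2 cores, 4 GB of memory, 90 minutes.
print(" ".join(build_interactive_srun("sandbox", 2, 4, 90)))
```

Slurm enforces the partition limits itself; the check in the sketch only 
catches an oversized request before it is submitted.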

4. For our software modules, we make sure to only expose the module 
files so the module commands work, but do not expose the path to where 
the compiled software resides.

This prevents users from loading a module, such as a compiler, and 
using it to compile code on our login nodes.  If a user can't perform 
the abusive action to begin with, there isn't really a problem to deal 
with.  Users do sometimes ask us why the software loaded by a module 
does not work on the login node, at which point we re-educate the user.

5. Make sure we don't install development tools (GNU compilers or the 
JDK) on the login nodes

As we need to allow the use of scp and other transfer tools, we can't 
prevent the execution of all software in /bin.  As a result, we just try 
to minimize the software a user could potentially use to abuse the 
login node.


A layered approach of education and reducing the potential ways a user 
can abuse our login nodes has been working for us for the past couple of 
years.  If we do begin to see more login node abuse, we would probably 
try to layer on cgroups to limit memory and CPU usage.
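
One common way to apply cgroup limits on a login node is through 
systemd's per-user slices.  This is an assumption about how it could be 
done, not a setup described above.  The sketch below renders a drop-in 
that could be saved under /etc/systemd/system/user-.slice.d/ (the file 
name and the limit values are hypothetical).  CPUQuota= and MemoryMax= 
are systemd resource-control directives; MemoryMax= requires the unified 
cgroup v2 hierarchy.

```python
def render_user_slice_dropin(cpu_quota_percent, memory_max):
    """Render a systemd drop-in that caps CPU and memory for each user slice.

    cpu_quota_percent: e.g. 200 allows up to two full CPUs per user.
    memory_max: a systemd size string such as "8G".
    """
    if cpu_quota_percent <= 0:
        raise ValueError("cpu_quota_percent must be positive")
    return (
        "[Slice]\n"
        f"CPUQuota={cpu_quota_percent}%\n"
        f"MemoryMax={memory_max}\n"
    )


# Hypothetical limits: two CPUs and 8 GB of memory per logged-in user.
print(render_user_slice_dropin(200, "8G"), end="")
```

After writing such a file, "systemctl daemon-reload" is needed for 
systemd to pick it up.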


Thanks,
David

> Date: Wed, 19 May 2021 19:00:38 +0300
> From: Alan Orth <alan.orth at gmail.com>
> To: Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>, Slurm User
> Community List <slurm-users at lists.schedmd.com>
> Subject: Re: [slurm-users] What is an easy way to prevent users run
> programs on the master/login node.
>
> Regarding setting limits for users on the head node. We had this for 
> years:
>
> # CPU time in minutes
> * - cpu 30
> root - cpu unlimited
>
> But we eventually found that this was even causing long-running jobs like
> rsync/scp to fail when users were copying data to the cluster. For a while
> I blamed our network people, but then I did some tests and found that it
> was the limits that were responsible. I have removed this and other limits
> for now but I ruthlessly kill heavy processes that my users run on there.
> I will look into using cgroups on the head node.
>
> Cheers,
>
> On Sat, Apr 24, 2021 at 11:05 AM Ole Holm Nielsen <
> Ole.H.Nielsen at fysik.dtu.dk> wrote:
>
>> On 24-04-2021 04:37, Cristóbal Navarro wrote:
>>> Hi Community,
>>> I have a set of users still not so familiar with slurm, and yesterday
>>> they bypassed srun/sbatch and just ran their CPU program directly on the
>>> head/login node thinking it would still run on the compute node. I am
>>> aware that I will need to teach them some basic usage, but in the
>>> meanwhile, how have you solved this type of user-behavior problem? Is
>>> there a preferred way to restrict the master/login resources, or
>>> actions, to the regular users?
>> We restrict user limits in /etc/security/limits.conf so users can't run
>> very long or very big tasks on the login nodes:
>>
>> # Normal user limits
>> * hard cpu 20
>> * hard rss 50000000
>> * hard data 50000000
>> * soft stack 40000000
>> * hard stack 50000000
>> * hard nproc 250
>>
>> /Ole
>>
>>




