[slurm-users] What is an easy way to prevent users run programs on the master/login node.
David Schanzenbach
davidls at hawaii.edu
Thu May 20 08:49:16 UTC 2021
For our login nodes (smallish, diskless VMs) we try to limit abuse by
users through a layered approach, as enumerated below.
1. User education
Users of our cluster are required to attend a training run by our
group. In these sessions we go over what we do and don't allow on the
login nodes, and we stress that we will kill long-running processes if
we see them and that repeated abuse could get a user banned for some
period of time.
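To make "if we see them" a bit more concrete, here is a minimal sketch of how one could spot candidates, assuming a Linux login node with /proc mounted. The function names, the 600-second threshold and the min_uid cutoff of 1000 are all illustrative assumptions, not what we actually run. It only reports processes; it does not kill anything.

```python
import os


def cpu_seconds_from_stat(stat_line, ticks_per_second):
    """Return user + system CPU seconds from one /proc/<pid>/stat line."""
    # The command name (field 2) may contain spaces or parentheses,
    # so split on the last ')' before splitting the rest on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so utime (field 14) and stime
    # (field 15) are at indices 11 and 12.
    utime, stime = int(rest[11]), int(rest[12])
    return (utime + stime) / ticks_per_second


def heavy_processes(threshold_seconds=600, min_uid=1000):
    """List (pid, uid, cpu_seconds) for non-system processes over the threshold.

    threshold_seconds and min_uid are hypothetical defaults; pick values
    that match your site.
    """
    ticks = os.sysconf("SC_CLK_TCK")
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            uid = os.stat(f"/proc/{entry}").st_uid
            if uid < min_uid:
                continue
            with open(f"/proc/{entry}/stat") as f:
                seconds = cpu_seconds_from_stat(f.read(), ticks)
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            # The process exited or is not readable; skip it.
            continue
        if seconds > threshold_seconds:
            found.append((int(entry), uid, seconds))
    return found
```

Note that this measures accumulated CPU time, not wall-clock time, so a long but mostly idle rsync or scp would not be flagged by it.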
2. Set the noexec mount option for any user controlled mountpoint (home,
scratch, group/lab/project spaces)
This isn't a perfect solution, as noexec can be worked around if a user
understands what noexec means. For example, a user wouldn't be able to
run "./foo.py", but they could run "python foo.py".
We also understand that some users have a legitimate reason to use a
script on the login node. Setting noexec isn't really meant to prevent
the use of scripts; it just makes it a little harder for a user to abuse
the login node.
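A quick way to confirm the option is actually in effect is to look at the mount table. The following is a minimal sketch, assuming the text comes from /proc/mounts on a Linux node; the function name and the example paths are hypothetical.

```python
def mounts_missing_noexec(mount_table, mountpoints):
    """Return the mount points in `mountpoints` that are mounted without noexec.

    mount_table is text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    missing = []
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mountpoint, options = fields[1], fields[3].split(",")
        if mountpoint in mountpoints and "noexec" not in options:
            missing.append(mountpoint)
    return missing
```

On a login node one could pass it the contents of /proc/mounts together with a set such as {"/home", "/scratch"} (placeholder paths) and alert on anything it returns.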
3. A small partition with shared nodes with low maxtime
For tasks that are typically longer running (compression/decompression,
compilation), beyond just user education, we also have a partition with
4 nodes that limits the number of running jobs per user (2 at a time)
and has a maxtime of 4 hours. For most of our users, this covers
compilation, testing and compression/decompression. These nodes are
also set up to be shared, so users are required to request the number of
cores and the amount of memory they need, for either a batch job or an
interactive job, to perform longer running tasks.
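For illustration, an interactive request against such a partition could be assembled as in the sketch below. The partition name "sandbox" and the helper itself are hypothetical; the srun options used (--partition, --cpus-per-task, --mem, --time, --pty) are standard ones, and the 240-minute cap mirrors the 4 hour maxtime described above.

```python
import shlex

MAX_MINUTES = 240  # the partition's 4 hour maxtime, expressed in minutes


def interactive_srun_command(partition, cores, mem_gb, minutes):
    """Build an srun command line for an interactive shell on a shared node."""
    if cores < 1 or mem_gb < 1:
        raise ValueError("cores and mem_gb must be at least 1")
    if not 1 <= minutes <= MAX_MINUTES:
        raise ValueError(f"minutes must be between 1 and {MAX_MINUTES}")
    args = [
        "srun",
        f"--partition={partition}",
        f"--cpus-per-task={cores}",
        f"--mem={mem_gb}G",
        f"--time={minutes}",  # a bare number is interpreted as minutes
        "--pty",
        "bash",
    ]
    return shlex.join(args)


# "sandbox" is a placeholder partition name.
print(interactive_srun_command("sandbox", 2, 4, 60))
```

Slurm itself enforces the partition limits; the check in the helper only gives the user an earlier and friendlier error.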
4. For our software modules, we make sure to only expose the module
files so the module commands work, but do not expose the path to where
the compiled software resides.
This prevents users from loading a module, such as a compiler, and
using it to compile code on our login nodes. If a user can't perform the
abusive action to begin with, you can't really have a problem. Users do
sometimes ask us why the software loaded by a module does not work on
the login node, at which point we re-educate the user.
5. Make sure we don't install development tools (GNU compilers or
JDK) on the login nodes
As we need to allow the use of scp and other transfer tools, we can't
prevent the execution of all software in /bin. As a result, we just try
to minimize the software a user could potentially use to abuse the
login node.
A layered approach of education and reducing the potential ways a user
can abuse our login nodes has been working for us for the past couple of
years. If we do begin to see more login node abuse, we would probably
try to layer on cgroups to limit memory and CPU usage.
Thanks,
David
> Date: Wed, 19 May 2021 19:00:38 +0300
> From: Alan Orth <alan.orth at gmail.com>
> To: Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>, Slurm User
> Community List <slurm-users at lists.schedmd.com>
> Subject: Re: [slurm-users] What is an easy way to prevent users run
> programs on the master/login node.
> Message-ID:
> <CAKKdN4U460M0mNtS=B_8QsBbpWZKZP+bQnOQDvkiH0Z_B1zUAw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Regarding setting limits for users on the head node. We had this for
> years:
>
> # CPU time in minutes
> * - cpu 30
> root - cpu unlimited
>
> But we eventually found that this was even causing long-running jobs like
> rsync/scp to fail when users were copying data to the cluster. For a while
> I blamed our network people, but then I did some tests and found that it
> was the limits that were responsible. I have removed this and other limits
> for now but I ruthlessly kill heavy processes that my users run on
> there. I will look into using cgroups on the head node.
>
> Cheers,
>
> On Sat, Apr 24, 2021 at 11:05 AM Ole Holm Nielsen <
> Ole.H.Nielsen at fysik.dtu.dk> wrote:
>
>> On 24-04-2021 04:37, Cristóbal Navarro wrote:
>>> Hi Community,
>>> I have a set of users still not so familiar with slurm, and yesterday
>>> they bypassed srun/sbatch and just ran their CPU program directly on the
>>> head/login node thinking it would still run on the compute node. I am
>>> aware that I will need to teach them some basic usage, but in the
>>> meanwhile, how have you solved this type of user-behavior problem? Is
>>> there a preferred way to restrict the master/login resources, or
>>> actions, to the regular users ?
>> We restrict user limits in /etc/security/limits.conf so users can't run
>> very long or very big tasks on the login nodes:
>>
>> # Normal user limits
>> * hard cpu 20
>> * hard rss 50000000
>> * hard data 50000000
>> * soft stack 40000000
>> * hard stack 50000000
>> * hard nproc 250
>>
>> /Ole
>>
>>