[slurm-users] Single user consuming all resources of the cluster

Matteo F mfasco984 at gmail.com
Tue Feb 6 03:40:17 MST 2018


Hello there.

I've just set up a small Slurm cluster for our on-premise computation needs
(nothing too exotic, just a bunch of R scripts).

The system "works" in the sense that users are able to submit jobs, but I
have an issue with resource management: a single user can consume all
the resources of the cluster.

I will attach data from my live system so you can look at the config files
and my troubleshooting attempts, but here is a simplified practical
example:

Suppose I have 2 nodes with 10G of RAM each. User1 submits 4 jobs, each
one requiring 5G. He fills the cluster.
Then User2 comes along and submits another job, which stays queued until
one of User1's jobs completes (which may take days). This is not good.
I've tried to limit the number of running jobs using QoS ->
MaxJobsPerAccount, but this wouldn't stop a user from filling up the
cluster with fewer (but bigger) jobs.
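
To make the example concrete, here is a toy Python model of the scenario.
It is not Slurm code and does not reflect how the real scheduler works: the
schedule() function, the first-fit placement, and the job cap (modeled per
user here for simplicity) are all assumptions made up for illustration.

```python
# Toy model of the scenario above; this is NOT Slurm code.
# Node sizes, job sizes and the job-count cap are illustrative assumptions.

def schedule(nodes_gb, jobs, max_jobs_per_user=None):
    """Greedy first-fit placement in submission order.

    nodes_gb: free memory per node, in GB.
    jobs: (user, mem_gb) tuples in submission order.
    Returns (running, queued) lists of jobs.
    """
    free = list(nodes_gb)
    running, queued = [], []
    running_count = {}  # user -> number of running jobs

    for user, mem in jobs:
        # Hypothetical job-count cap: hold the job if the user is at the limit.
        if (max_jobs_per_user is not None
                and running_count.get(user, 0) >= max_jobs_per_user):
            queued.append((user, mem))
            continue
        # Place the job on the first node with enough free memory.
        for i, avail in enumerate(free):
            if avail >= mem:
                free[i] -= mem
                running.append((user, mem))
                running_count[user] = running_count.get(user, 0) + 1
                break
        else:
            # No node has enough memory left, so the job waits.
            queued.append((user, mem))
    return running, queued


# Case 1: no limit. Four 5G jobs from user1 fill both 10G nodes.
print(schedule([10, 10], [("user1", 5)] * 4 + [("user2", 5)]))

# Case 2: cap of 2 jobs, but user1 submits two 10G jobs instead.
print(schedule([10, 10], [("user1", 10)] * 2 + [("user2", 5)],
               max_jobs_per_user=2))
```

In both cases user2's job ends up in the queued list, which is the
behaviour I am trying to avoid: a cap on the number of jobs does not bound
the amount of memory a user can hold.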

How can I avoid that?

Here is a link to my config files: https://pastebin.com/iwAnBMpY

Thanks a lot.
Matteo

