[slurm-users] Slurm Multi-cluster implementation

Yair Yarom irush at cs.huji.ac.il
Mon Nov 1 10:35:36 UTC 2021


The CPU limit using ulimit is pretty straightforward with pam_limits and
/etc/security/limits.conf. On some of the login nodes we have a CPU time
limit of 10 minutes, so heavy processes will fail.
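
For reference, the limits.conf side is just a line like this (the "*" domain
is only an illustration; pam_limits.so has to be in the PAM session stack,
which it usually is by default):

# /etc/security/limits.conf -- the cpu item is in minutes
*    hard    cpu    10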

The memory was a bit more complicated (i.e. not pretty). We wanted a user
not to be able to use more than e.g. 1G for all their processes combined.
Using systemd, we added the file
/etc/systemd/system/user-.slice.d/20-memory.conf which contains:
[Slice]
MemoryLimit=1024M
MemoryAccounting=true
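
After adding the drop-in, a systemctl daemon-reload picks it up; the effect
can then be checked on a live user slice, e.g. (1234 here is just an example
uid with an active session):

systemctl daemon-reload
systemctl show user-1234.slice -p MemoryLimit -p MemoryAccounting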

But we also wanted to restrict swap usage, and we're still on cgroup v1, so
systemd didn't help there. The ugly part is a pam_exec hook that runs a
script to update the memsw limit of the cgroup for the above slice. The
script does more things, but the swap section is more or less:

if [ "x$PAM_TYPE" = 'xopen_session' ]; then
    _id=`id -u $PAM_USER`
    if [ -z "$_id" ]; then
        exit 1
    fi
    if [[ -e
/sys/fs/cgroup/memory/user.slice/user-${_id}.slice/memory.memsw.limit_in_bytes
]]; then
        swap=$((1126 * 1024 * 1024))
        echo $swap >
/sys/fs/cgroup/memory/user.slice/user-${_id}.slice/memory.memsw.limit_in_bytes
    fi
fi
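
The script itself is hooked in via pam_exec in the PAM session stack,
roughly (the script path here is just a placeholder, not our actual name):

session    optional    pam_exec.so /usr/local/sbin/login-limits.sh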


On Sun, Oct 31, 2021 at 6:36 PM Brian Andrus <toomuchit at gmail.com> wrote:

> That is interesting to me.
>
> How do you use ulimit and systemd to limit user usage on the login nodes?
> This sounds like something very useful.
>
> Brian Andrus
> On 10/31/2021 1:08 AM, Yair Yarom wrote:
>
> Hi,
>
> If it helps, this is our setup:
> 6 clusters (actually a bit more)
> 1 mysql + slurmdbd on the same host
> 6 primary slurmctld on 3 hosts (need to make sure each has a distinct
> SlurmctldPort; see the example below)
> 6 secondary slurmctld on an arbitrary node on the clusters themselves.
> 1 login node per cluster (this is a very small VM, and the users are
> limited in both CPU time (with ulimit) and memory (with systemd))
> The slurm.conf files are shared over NFS to everyone in /path/to/nfs/<cluster
> name>/slurm.conf, with a symlink to /etc for the relevant cluster per node.
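>
> For example (port numbers here are illustrative, not our real ones), two
> clusters whose primary slurmctld run on the same host differ roughly like:
>
> # clusterA/slurm.conf
> ClusterName=clusterA
> SlurmctldPort=6817
>
> # clusterB/slurm.conf
> ClusterName=clusterB
> SlurmctldPort=6818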
>
> The -M option generally works; we can submit/query jobs from a login node
> of one cluster to another. But there's a caveat to watch for when
> upgrading: slurmdbd must be upgraded first, but usually we have a
> not-so-small gap between upgrading the different clusters. During that gap
> -M stops working, because binaries of one version won't work against the
> other (I don't remember in which direction).
> We solved this by using an lmod module per cluster, which sets both the
> SLURM_CONF environment variable and the PATH to the correct slurm binaries
> (which we install in /usr/local/slurm/<version>/ so that they co-exist). So
> when -M won't work, users can use:
> module load slurm/clusterA
> squeue
> module load slurm/clusterB
> squeue
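>
> Under the hood each such module basically just does the equivalent of (with
> the cluster name and version filled in):
>
> export SLURM_CONF=/path/to/nfs/<cluster name>/slurm.conf
> export PATH=/usr/local/slurm/<version>/bin:$PATH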
>
> BR,
>
> On Thu, Oct 28, 2021 at 7:39 PM navin srivastava <navin.altair at gmail.com>
> wrote:
>
>> Thank you Tina.
>> It will really help
>>
>> Regards
>> Navin
>>
>> On Thu, Oct 28, 2021, 22:01 Tina Friedrich <tina.friedrich at it.ox.ac.uk>
>> wrote:
>>
>>> Hello,
>>>
>>> I have the database on a separate server (it runs the database and the
>>> database only). The login nodes run nothing SLURM related, they simply
>>> have the binaries installed & a SLURM config.
>>>
>>> I've never looked into having multiple databases & using
>>> AccountingStorageExternalHost (in fact I'd forgotten you could do that),
>>> so I can't comment on that (maybe someone else can); I think that works,
>>> yes, but as I said I never tested it (didn't see much point in running
>>> multiple databases if one would do the job).
>>>
>>> I actually have specific login nodes for both of my clusters, to make it
>>> easier for users (especially those with not much experience using the
>>> HPC environment); so I have one login node connecting to cluster 1 and
>>> one connecting to cluster 2.
>>>
>>> I think the relevant config entries (if I'm not mistaken) on the login
>>> nodes are probably the following.
>>>
>>> The differences in the slurm config files (that haven't got to do with
>>> topology & nodes & scheduler tuning) are:
>>>
>>> ClusterName=cluster1
>>> ControlMachine=cluster1-slurm
>>> ControlAddr=/IP_OF_SLURM_CONTROLLER/
>>>
>>> ClusterName=cluster2
>>> ControlMachine=cluster2-slurm
>>> ControlAddr=/IP_OF_SLURM_CONTROLLER/
>>>
>>> (where IP_OF_SLURM_CONTROLLER is the IP address of host cluster1-slurm,
>>> same for cluster2)
>>>
>>> And then they have common entries for the AccountingStorageHost:
>>>
>>> AccountingStorageHost=slurm-db-prod
>>> AccountingStorageBackupHost=slurm-db-prod
>>> AccountingStoragePort=7030
>>> AccountingStorageType=accounting_storage/slurmdbd
>>>
>>> (slurm-db-prod is simply the hostname of the SLURM database server)
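>>>
>>> With that in place, from either login node you can do things like (just
>>> as an example; job.sh being whatever batch script you have):
>>>
>>> squeue -M cluster2
>>> sbatch -M cluster2 job.sh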
>>>
>>> Does that help?
>>>
>>> Tina
>>>
>>> On 28/10/2021 14:59, navin srivastava wrote:
>>> > Thank you Tina.
>>> >
>>> > So if I understood correctly, the database is global to both clusters
>>> > and running on the login node?
>>> > Or is the database running on one of the master nodes and shared with
>>> > the other master node?
>>> >
>>> > But as far as I have read, the slurm database can also be kept separate
>>> > on each master, just using the parameter AccountingStorageExternalHost
>>> > so that both databases are aware of each other.
>>> >
>>> > Also, on the login node, which slurmctld does the slurm.conf file point
>>> > to? Is it possible to share a sample slurm.conf file of a login node?
>>> >
>>> > Regards
>>> > Navin.
>>> >
>>> > On Thu, Oct 28, 2021 at 7:06 PM Tina Friedrich
>>> > <tina.friedrich at it.ox.ac.uk <mailto:tina.friedrich at it.ox.ac.uk>>
>>> wrote:
>>> >
>>> >     Hi Navin,
>>> >
>>> >     Well, I have two clusters & login nodes that allow access to both.
>>> >     Will that do? I don't think a third would make any difference in setup.
>>> >
>>> >     They need to share a database. As long as they share a database, the
>>> >     clusters have 'knowledge' of each other.
>>> >
>>> >     So if you set up one database server (running slurmdbd), and then a
>>> >     SLURM controller for each cluster (running slurmctld) using that one
>>> >     central database, the '-M' option should work.
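>>> >
>>> >     Each cluster also has to exist in that shared accounting database;
>>> >     that's normally a one-off step, roughly
>>> >
>>> >     sacctmgr add cluster cluster1
>>> >     sacctmgr add cluster cluster2
>>> >
>>> >     for each of the cluster names.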
>>> >
>>> >     Tina
>>> >
>>> >     On 28/10/2021 10:54, navin srivastava wrote:
>>> >      > Hi ,
>>> >      >
>>> >      > I am looking for a stepwise guide to set up a multi-cluster
>>> >      > implementation. We wanted to set up 3 clusters and one login node
>>> >      > to run jobs using the -M cluster option.
>>> >      > Does anybody have such a setup who can share some insight into
>>> >      > how it works, and whether it is really a stable solution?
>>> >      >
>>> >      >
>>> >      > Regards
>>> >      > Navin.
>>> >
>>>
>>> --
>>> Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
>>>
>>> Research Computing and Support Services
>>> IT Services, University of Oxford
>>> http://www.arc.ox.ac.uk http://www.it.ox.ac.uk
>>>
>>>
>

-- 

  /|       |
  \/       | Yair Yarom | System Group (DevOps)
  []       | The Rachel and Selim Benin School
  [] /\    | of Computer Science and Engineering
  []//\\/  | The Hebrew University of Jerusalem
  [//  \\  | T +972-2-5494522 | F +972-2-5494522
  //    \  | irush at cs.huji.ac.il
 //        |

