[slurm-users] Slurm Multi-cluster implementation
Yair Yarom
irush at cs.huji.ac.il
Mon Nov 1 10:35:36 UTC 2021
CPU limit using ulimit is pretty straightforward with pam_limits and
/etc/security/limits.conf. On some of the login nodes we set a CPU time limit
of 10 minutes, so heavy processes will fail.
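For illustration, the ulimit part boils down to a pam_limits entry; a minimal
sketch (values are per-node, and the pam_limits line is usually already present
in the distribution's default PAM stack):

# /etc/security/limits.conf (the cpu item is CPU time in minutes)
*       hard    cpu     10

# e.g. /etc/pam.d/common-session (Debian) or /etc/pam.d/system-auth (RHEL)
session    required    pam_limits.so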
The memory limit was a bit more complicated (i.e. not pretty). We wanted a
user not to be able to use more than e.g. 1G for all their processes combined.
Using systemd, we added the file
/etc/systemd/system/user-.slice.d/20-memory.conf which contains:
[Slice]
MemoryLimit=1024M
MemoryAccounting=true
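(After adding the drop-in, something like the following can confirm the limit
is picked up for a logged-in user; UID 1234 is only an example, and on
cgroup v2 / newer systemd the setting would be MemoryMax= instead:)

systemctl daemon-reload
systemctl show user-1234.slice -p MemoryLimit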
But we also wanted to restrict swap usage, and since we're still on cgroup v1,
systemd didn't help there. The ugly part comes with a pam_exec call to a script
that updates the memsw limit of the cgroup for the above slice. The script
does more things, but the swap section is more or less:
if [ "x$PAM_TYPE" = 'xopen_session' ]; then
_id=`id -u $PAM_USER`
if [ -z "$_id" ]; then
exit 1
fi
if [[ -e
/sys/fs/cgroup/memory/user.slice/user-${_id}.slice/memory.memsw.limit_in_bytes
]]; then
swap=$((1126 * 1024 * 1024))
echo $swap >
/sys/fs/cgroup/memory/user.slice/user-${_id}.slice/memory.memsw.limit_in_bytes
fi
fi
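The script itself is hooked in with pam_exec in the session stack of the login
nodes' PAM configuration, roughly like this (the script path is only a
placeholder, not our actual filename):

session    optional    pam_exec.so /usr/local/sbin/user-memsw-limit.sh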
On Sun, Oct 31, 2021 at 6:36 PM Brian Andrus <toomuchit at gmail.com> wrote:
> That is interesting to me.
>
> How do you use ulimit and systemd to limit user usage on the login nodes?
> This sounds like something very useful.
>
> Brian Andrus
> On 10/31/2021 1:08 AM, Yair Yarom wrote:
>
> Hi,
>
> If it helps, this is our setup:
> 6 clusters (actually a bit more)
> 1 mysql + slurmdbd on the same host
> 6 primary slurmctld on 3 hosts (need to make sure each has a distinct
> SlurmctldPort)
> 6 secondary slurmctld on an arbitrary node on the clusters themselves.
> 1 login node per cluster (this is a very small VM, and the users are
> limited both to cpu time (with ulimit) and memory (with systemd))
> The slurm.conf's are shared over NFS to everyone in /path/to/nfs/<cluster
> name>/slurm.conf, with a symlink to /etc for the relevant cluster on each node.
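> For illustration, on a node belonging to clusterA that boils down to
> something like this (the exact /etc path depends on how slurm was built,
> commonly /etc/slurm/slurm.conf):
>
> ln -s /path/to/nfs/clusterA/slurm.conf /etc/slurm/slurm.conf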
>
> The -M option generally works; we can submit/query jobs from the login node
> of one cluster to another. But there's a caveat to note when upgrading:
> slurmdbd must be upgraded first, but usually we have a not-so-small gap
> between upgrading the different clusters. This causes -M to stop working,
> because binaries of one version won't work against the other (I don't
> remember in which direction).
> We solved this by using an lmod module per cluster, which sets both the
> SLURM_CONF environment variable and the PATH to the correct slurm binaries
> (which we install in /usr/local/slurm/<version>/ so that they co-exist). So
> when -M won't work, users can use:
> module load slurm/clusterA
> squeue
> module load slurm/clusterB
> squeue
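> In effect, each module just does the equivalent of (the version number here
> is only an example):
>
> export SLURM_CONF=/path/to/nfs/clusterA/slurm.conf
> export PATH=/usr/local/slurm/21.08.8/bin:$PATH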
>
> BR,
>
>
>
>
>
>
>
> On Thu, Oct 28, 2021 at 7:39 PM navin srivastava <navin.altair at gmail.com>
> wrote:
>
>> Thank you Tina.
>> It will really help
>>
>> Regards
>> Navin
>>
>> On Thu, Oct 28, 2021, 22:01 Tina Friedrich <tina.friedrich at it.ox.ac.uk>
>> wrote:
>>
>>> Hello,
>>>
>>> I have the database on a separate server (it runs the database and the
>>> database only). The login nodes run nothing SLURM related, they simply
>>> have the binaries installed & a SLURM config.
>>>
>>> I've never looked into having multiple databases & using
>>> AccountingStorageExternalHost (in fact I'd forgotten you could do that),
>>> so I can't comment on that (maybe someone else can); I think that works,
>>> yes, but as I said never tested that (didn't see much point in running
>>> multiple databases if one would do the job).
>>>
>>> I actually have specific login nodes for both of my clusters, to make it
>>> easier for users (especially those with not much experience using the
>>> HPC environment); so I have one login node connecting to cluster 1 and
>>> one connecting to cluster 2.
>>>
>>> I think the relevant config entries (if I'm not mistaken) on the login
>>> nodes are probably these.
>>>
>>> The differences in the slurm config files (that haven't got to do with
>>> topology & nodes & scheduler tuning) are
>>>
>>> ClusterName=cluster1
>>> ControlMachine=cluster1-slurm
>>> ControlAddr=/IP_OF_SLURM_CONTROLLER/
>>>
>>> ClusterName=cluster2
>>> ControlMachine=cluster2-slurm
>>> ControlAddr=/IP_OF_SLURM_CONTROLLER/
>>>
>>> (where IP_OF_SLURM_CONTROLLER is the IP address of host cluster1-slurm,
>>> same for cluster2)
>>>
>>> And then they have common entries for the accounting storage settings:
>>>
>>> AccountingStorageHost=slurm-db-prod
>>> AccountingStorageBackupHost=slurm-db-prod
>>> AccountingStoragePort=7030
>>> AccountingStorageType=accounting_storage/slurmdbd
>>>
>>> (slurm-db-prod is simply the hostname of the SLURM database server)
>>>
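>>> As a quick sanity check from either login node, this should list both
>>> clusters once they have registered with the shared slurmdbd:
>>>
>>> sacctmgr show clusters
>>>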
>>> Does that help?
>>>
>>> Tina
>>>
>>> On 28/10/2021 14:59, navin srivastava wrote:
>>> > Thank you Tina.
>>> >
>>> > So if I understood correctly, the database is global to both clusters and
>>> > running on the login node?
>>> > Or is the database running on one of the master nodes and shared with
>>> > the other master node?
>>> >
>>> > But as far as I have read, the slurm database can also be separate on
>>> > each master, just using the parameter
>>> > AccountingStorageExternalHost so that both databases are aware of each
>>> > other.
>>> >
>>> > Also, on the login node, which slurmctld does the slurm.conf file point to?
>>> > Is it possible to share a sample slurm.conf file of the login node?
>>> >
>>> > Regards
>>> > Navin.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Thu, Oct 28, 2021 at 7:06 PM Tina Friedrich
>>> > <tina.friedrich at it.ox.ac.uk> wrote:
>>> >
>>> > Hi Navin,
>>> >
>>> > Well, I have two clusters & login nodes that allow access to both. Will
>>> > that do? I don't think a third would make any difference in setup.
>>> >
>>> > They need to share a database. As long as they share a database, the
>>> > clusters have 'knowledge' of each other.
>>> >
>>> > So if you set up one database server (running slurmdbd), and then a
>>> > SLURM controller for each cluster (running slurmctld) using that one
>>> > central database, the '-M' option should work.
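>>> >
>>> > For example, from a login node (job.sh is only a placeholder):
>>> >
>>> > sbatch -M cluster2 job.sh
>>> > squeue -M cluster1,cluster2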
>>> >
>>> > Tina
>>> >
>>> > On 28/10/2021 10:54, navin srivastava wrote:
>>> > > Hi ,
>>> > >
>>> > > I am looking for a stepwise guide to set up a multi-cluster
>>> > > implementation.
>>> > > We wanted to set up 3 clusters and one login node to run jobs
>>> > > using the -M cluster option.
>>> > > Does anybody have such a setup, and can you share some insight into how
>>> > > it works and whether it is really a stable solution?
>>> > >
>>> > >
>>> > > Regards
>>> > > Navin.
>>> >
>>> > --
>>> > Tina Friedrich, Advanced Research Computing Snr HPC Systems
>>> > Administrator
>>> >
>>> > Research Computing and Support Services
>>> > IT Services, University of Oxford
>>> > http://www.arc.ox.ac.uk
>>> > http://www.it.ox.ac.uk
>>> >
>>>
>>> --
>>> Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
>>>
>>> Research Computing and Support Services
>>> IT Services, University of Oxford
>>> http://www.arc.ox.ac.uk http://www.it.ox.ac.uk
>>>
>>>
>
--
/| |
\/ | Yair Yarom | System Group (DevOps)
[] | The Rachel and Selim Benin School
[] /\ | of Computer Science and Engineering
[]//\\/ | The Hebrew University of Jerusalem
[// \\ | T +972-2-5494522 | F +972-2-5494522
// \ | irush at cs.huji.ac.il
// |