[slurm-users] Is anyone running the slurmctld and slurmdbd services from within a container?

Thu Jul 1 23:47:19 UTC 2021

Hi,

we tried it out on Google Cloud, with GPU nodes running at another provider and connected through a site-to-site VPN. The database was on a managed GCloud instance.
There are indeed a few points that you need to consider:
- Microservice: the maximalist dream "1 process = 1 container" is not possible for slurmctld/dbd if you need munge - you'll need an entrypoint script which first starts munge and then the server component. This script is executed as the user defined at image build (or in the Kubernetes deployment settings), and that user either has to be root or have permission to start the other services (see the entrypoint example at the end)
- User definitions: if you define accounts locally, it can be cumbersome to have to rebuild images on every modification. A hosted ID provider like an LDAP directory helps; it adds another process to start in the entrypoint (the SSSD part in the example below)
- Kubernetes networking: pods are in their own IP space managed by K8s, and the common way of exposing resources is through ingress controllers which work in conjunction with a load balancer bound to a static (public) IP. This obviously impacts how you define your SlurmctldHost and AccountingStorageHost entries, and it also adds inevitable communication hops. Inside a K8s cluster, services can reach each other through the internal DNS provided out of the box; we additionally deployed a cluster-wide DNS to avoid any IP-based configuration
- Persistence: containers run as ephemeral environments - you hence have to use volumes for the slurmctld state, the database's data directory if you run it yourself and, depending on your setup, the logs. In K8s this means that the deployments need PersistentVolumeClaims (the exact form depends on your infra provider)
- Logging: if you wish to stream the logs, you need to consider a sidecar setup unless you accept running the log streaming client alongside the server component (at which point you're definitely banned from calling it a microservice ;) ). We used Filebeat from the Elastic stack as sidecars to stream the logs to an ELK backend. A service mesh (like linkerd) may also be an option which provides this out of the box, but it's likely overkill
- Admin: you don't SSH into K8s pods, at least not as you usually would. You'll have to get familiar with the kubectl commands (which are not that crazy) if you want or need to look into stuff on the instance itself
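
For the admin point above, a minimal sketch of the kubectl calls that stand in for SSH. The function names and the namespace, pod and container arguments are hypothetical, and the KUBECTL override is only an illustrative convenience for previewing the command line (e.g. with echo); it is not something kubectl itself provides.

```shell
#!/bin/bash
# Thin wrappers around kubectl; set KUBECTL=echo to print the command
# instead of running it. All names passed in are placeholders.

# List the pods of a namespace together with the node they run on.
slurm_pods() {
    "${KUBECTL:-kubectl}" get pods --namespace "$1" -o wide
}

# Show the last 100 log lines of one container in a pod
# (the container has to be named when sidecars are present).
slurm_logs() {
    "${KUBECTL:-kubectl}" logs --namespace "$1" "$2" --container "$3" --tail=100
}

# Open an interactive shell in a container: the closest thing to SSH.
slurm_shell() {
    "${KUBECTL:-kubectl}" exec -it --namespace "$1" "$2" --container "$3" -- /bin/bash
}
```

With a namespace "slurm" and a pod "slurmctld-0" (both made up), `slurm_shell slurm slurmctld-0 slurmctld` would drop you into the controller container.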

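On the networking point above, a small sketch that flags SlurmctldHost and AccountingStorageHost entries still written as IPv4 literals instead of DNS names. The helper names are hypothetical, the path to slurm.conf is whatever you pass in, and the sketch assumes the keys are spelled in exactly this case (Slurm itself is more lenient).

```shell
#!/bin/bash
# Print the value(s) of a slurm.conf key, dropping trailing comments.
conf_values() {
    local key="$1" file="$2"
    sed -n "s/^[[:space:]]*${key}=\([^[:space:]#]*\).*/\1/p" "$file"
}

# True if the value is a dotted-quad IPv4 literal, either bare or in
# the "name(address)" form that SlurmctldHost accepts.
has_ipv4_literal() {
    printf '%s\n' "$1" | grep -Eq '^([^(]*\()?[0-9]+(\.[0-9]+){3}\)?$'
}

# Report every host entry that relies on an IP literal.
# Returns 1 if at least one was found, 0 otherwise.
check_hosts() {
    local file="$1" key host status=0
    for key in SlurmctldHost AccountingStorageHost; do
        for host in $(conf_values "$key" "$file"); do
            if has_ipv4_literal "$host"; then
                echo "$key uses an IP literal: $host"
                status=1
            fi
        done
    done
    return "$status"
}
```

Something like `check_hosts /etc/slurm/slurm.conf` could then run in a CI step or at the top of an entrypoint.
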
Since horizontal scaling is not applicable, the benefits are limited to those of a robust clustered setup (with at least 3 worker nodes you get automatic reallocation in case of hardware failure, etc.). This lets you avoid running backup controller/dbd instances, but it's not a panacea, and running a K8s cluster doesn't come for free, so to speak. In the end we found it easier to stay on VMs and invest our efforts in good monitoring and incident response systems (which, on the other hand, are themselves excellent candidates for a K8s deployment).

Hope that helps and don't hesitate to drop me an email if you have further questions,
Tilman

----- Example of entrypoint script for a Slurmctld container -----
#! /bin/bash

##### Configurations
configurations_mount_path="/mnt/configurations"
secrets_mount_path="/mnt/secrets"

##### Script
### Validations
[ ! -d "$configurations_mount_path" ] && echo "Configurations mount '$configurations_mount_path' not found. Aborting..." && exit 1
[ ! -d "$secrets_mount_path" ] && echo "Secrets mount '$secrets_mount_path' not found. Aborting..." && exit 1

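### Munge
# the key must not be readable by group or others, otherwise munged refuses to start
cp "$secrets_mount_path/munge.key" /etc/munge/munge.key
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
su -s /bin/bash -c munged munge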

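### SSSD
# run as root - configurations copied from mount
cp "$configurations_mount_path/sssd.conf" /etc/sssd/sssd.conf
# sssd expects sssd.conf to be owned by root and not readable by anyone else
chmod 600 /etc/sssd/sssd.conf
cp "$configurations_mount_path/ldap-certificate.pem" /etc/sssd/ldap-certificate.pem
sssd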

### Slurmctld
mkdir -p /etc/slurm /var/spool/slurm /var/run/slurm
# slurm.conf from mount, configure environment
cp "$configurations_mount_path/slurm.conf" /etc/slurm/slurm.conf
export SLURM_CONF=/etc/slurm/slurm.conf
# optional ones (Slurm looks for cgroup.conf, singular, so the copy is renamed)
[ -f "$configurations_mount_path/cgroups.conf" ] && cp "$configurations_mount_path/cgroups.conf" /etc/slurm/cgroup.conf
[ -f "$configurations_mount_path/gres.conf" ] && cp "$configurations_mount_path/gres.conf" /etc/slurm/gres.conf
[ -f "$configurations_mount_path/plugstack.conf" ] && cp "$configurations_mount_path/plugstack.conf" /etc/slurm/plugstack.conf
# pass fs ownership to slurm user (can only be done at runtime because it's only defined once sssd is running - LDAP behind)
chown -R slurm: /etc/slurm /var/spool/slurm /var/run/slurm

# exec replaces the shell, so slurmctld receives the container's stop signal directly
exec slurmctld -D
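
The script above relies on sssd already answering lookups by the time the chown to the slurm user runs. If that turns out to be racy in your setup, a retry helper along these lines could gate that step. This is only a sketch: wait_for is a hypothetical name, not part of Slurm, munge or SSSD.

```shell
#!/bin/bash
# wait_for TRIES DELAY COMMAND [ARGS...]
# Runs COMMAND up to TRIES times, sleeping DELAY seconds between attempts.
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt failed.
wait_for() {
    local tries="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$tries" ]; do
        "$@" >/dev/null 2>&1 && return 0
        # no point in sleeping after the last attempt
        [ "$i" -lt "$tries" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

In the entrypoint it would sit just before the chown, for example:
    wait_for 30 1 getent passwd slurm || { echo "slurm user not resolvable. Aborting..."; exit 1; }
This assumes nsswitch is configured to ask sssd, so that getent only succeeds once the LDAP-backed user is visible.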
