We are in the middle of implementing container support on our new HPC platform, and have decided to offer our users a suite of technologies to better support their workloads:
* Apptainer
* Podman (rootless)
* Docker (rootless)
We've already got a solution for automated entries in /etc/subuid and /etc/subgid on the head nodes (available under the GPL here: https://github.com/megatron-uk/pam_subid), which is where we intend users to build their container images. Building and running containers with Apptainer and Podman in those environments works really well - we're happy that this should cover 95% of our users' needs without granting them any special permissions (Docker is the last few percent...).
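(For reference, the entries the module manages are just the standard subordinate ID format - the usernames and ranges below are placeholders:)

    # /etc/subuid and /etc/subgid - one line per user, giving the
    # start of the subordinate UID/GID range and its size:
    jsmith:100000:65536
    jdoe:165536:65536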
If I ssh directly to a compute node, Podman also works there for running an existing image (podman container run ...).
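For example, something along these lines runs fine from an interactive shell on a compute node (image chosen purely for illustration):

    $ podman container run --rm docker.io/library/alpine:latest cat /etc/os-release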
What I'm struggling with now is running Podman under Slurm itself on our compute nodes.
It appears that Podman (in rootless mode) wants to put the majority of its runtime/state information under /run/user/$UID. This is fine on the head nodes, where interactive logins go through PAM and pam_systemd instantiates the /run/user/$UID directories, but not under sbatch/srun, which doesn't create them by default.
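The difference is easy to demonstrate (illustrative output, UID 1000 assumed):

    # Via ssh - pam_systemd has created the runtime dir:
    $ echo $XDG_RUNTIME_DIR
    /run/user/1000

    # Via srun - no PAM session, so no runtime dir:
    $ srun bash -c 'echo "${XDG_RUNTIME_DIR:-unset}"; ls -d /run/user/$(id -u)'
    unset
    ls: cannot access '/run/user/1000': No such file or directory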
I've not been able to find a single, magical setting that moves all of Podman's state out of /run/user to another location - there are three or four settings involved, and even then various bits of Podman still want to create things under there.
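For anyone interested, these are roughly the settings I've been juggling (the paths below are placeholders, not a recommendation):

    # Per-job environment:
    export XDG_RUNTIME_DIR=/tmp/$USER/run

    # ~/.config/containers/storage.conf:
    [storage]
    runroot   = "/tmp/$USER/containers/run"    # runtime state
    graphroot = "$HOME/.local/share/containers/storage"

    # ~/.config/containers/containers.conf:
    [engine]
    tmp_dir = "/tmp/$USER/libpod/tmp"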
Rather than hacking away at Podman's configuration to relocate all of its settings and state, the cleanest solution seems to be to put the regular /run/user/$UID directory in place at the point Slurm starts the job.
What's the best way to get Slurm to create this and clean up afterwards? Should it be done in a prolog/epilog wrapper (e.g. by calling loginctl directly), or is it cleaner to get Slurm to trigger the usual PAM session machinery in some manner?
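For concreteness, this is the sort of loginctl-based prolog/epilog pair I had in mind (untested sketch - in particular, the epilog would need to check that the user has no other jobs left on the node before tearing the session down):

    #!/bin/bash
    # prolog.sh - runs as root before the job starts.
    # Asking logind to "linger" the user starts user@UID.service,
    # which creates /run/user/$UID for us.
    loginctl enable-linger "$SLURM_JOB_USER"

    #!/bin/bash
    # epilog.sh - runs as root after the job ends.
    # Naive teardown; needs a "no other jobs for this user on this
    # node" check before it is safe to run.
    loginctl disable-linger "$SLURM_JOB_USER"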
John Snowdon
Senior Research Infrastructure Engineer (HPC)
Research Software Engineering
Catalyst Building, Room 2.01
Newcastle University
3 Science Square
Newcastle Helix
Newcastle upon Tyne
NE4 5TG
https://hpc.researchcomputing.ncl.ac.uk