We are in the middle of implementing extensive container support on our new HPC platform and have decided to offer our users a suite of technologies to better support their workloads:
* Apptainer
* Podman (rootless)
* Docker (rootless)
We've already got a solution for automated entries in /etc/subuid and /etc/subgid on the head nodes (available under the GPL here: https://github.com/megatron-uk/pam_subid), which is where we intend users to build their container images. Building and running containers with Apptainer and Podman in those environments works really well - we're happy that it should take care of 95% of our users' needs (Docker is the last few percent...) without giving them any special permissions.
If I ssh directly to a compute node, then Podman also works there to run an existing image (podman container run ...).
What I'm struggling with now is running Podman under Slurm itself on our compute nodes.
It appears that Podman (in rootless mode) wants to put most of its runtime/state information under /run/user/$UID. This is fine on the head nodes, where interactive logins hit the PAM modules that instantiate the /run/user/$UID directories, but not under sbatch/srun, which doesn't create them by default.
I've not been able to find a single, magical setting which will move all of the Podman state information out of /run/user to another location - there are three or four settings involved, and even then various bits of Podman still want to create things under there.
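For illustration, these are the knobs that appear to be involved (a rough sketch from my reading of the Podman docs; the exact set may differ by version, and the paths are examples):

    # Rootless Podman derives its runtime directory from XDG_RUNTIME_DIR,
    # which a PAM login session normally points at /run/user/$UID:
    export XDG_RUNTIME_DIR=/run/user/$(id -u)

    # Other settings that move pieces of state elsewhere:
    #   storage.conf    [storage]  runroot, rootless_storage_path
    #   containers.conf [engine]   tmp_dir, events_logger
    # ...and even with all of these pointed away from /run/user,
    # some parts of Podman still try to create entries there.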
Rather than hacking away at Podman to move all of its settings and state elsewhere, it seems like the cleanest solution would just be to put the regular /run/user/$UID directory in place at the point Slurm starts the job instead.
What's the best way to get Slurm to create this and clean up afterwards? Should this be in a prolog/epilog wrapper (e.g. directly calling loginctl), or is it cleaner to get Slurm to trigger the usual PAM session machinery in some manner?
John Snowdon Senior Research Infrastructure Engineer (HPC)
Research Software Engineering Catalyst Building, Room 2.01 Newcastle University 3 Science Square Newcastle Helix Newcastle upon Tyne NE4 5TG https://hpc.researchcomputing.ncl.ac.uk
For what it's worth, we found the simplest solution was just to run a prolog/epilog to create the directories and clean them up - it's only a couple of lines of bash.
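Something along these lines (a rough sketch rather than our exact scripts; it assumes slurmd runs prolog/epilog as root and makes SLURM_JOB_UID and SLURM_JOB_USER available to them):

    #!/bin/bash
    # prolog: create the user's runtime dir before the job starts
    RUNDIR="/run/user/${SLURM_JOB_UID}"
    mkdir -p "$RUNDIR"
    chown "${SLURM_JOB_UID}:$(id -g "${SLURM_JOB_USER}")" "$RUNDIR"
    chmod 700 "$RUNDIR"

    #!/bin/bash
    # epilog: remove it again, but only if the user has no other jobs
    # left on this node (a crude check; adjust to taste)
    if [ -z "$(squeue -h -w "$(hostname -s)" -u "${SLURM_JOB_USER}")" ]; then
        rm -rf "/run/user/${SLURM_JOB_UID}"
    fi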
We recently set up the same thing (Rocky 8). We edited /etc/containers/storage.conf and pointed the following variables to /tmp:
    runroot = "/tmp/containers-user-$UID/storage"
    graphroot = "/tmp/containers-user-$UID/storage"
    rootless_storage_path = "/tmp/containers-user-$UID/storage"
We also have a prune script which periodically cleans up /tmp.
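The prune can be as simple as something like this (illustrative only - ours differs, and the 7-day cutoff is an arbitrary example):

    #!/bin/bash
    # Remove per-user container storage under /tmp that hasn't been
    # modified in over 7 days; run from cron or a systemd timer as root.
    find /tmp -maxdepth 1 -type d -name 'containers-user-*' -mtime +7 \
        -exec rm -rf {} +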
I like your solution for subuid; we put together a Puppet module that does much the same thing: https://github.com/fasrc/puppet-subuid
-Paul Edmon-
Hi Michael,
We're on RHEL 9 - it's a newly commissioned system without any (real) users yet, so we have the relative freedom to make fairly substantial changes without impacting any production work for the moment.
I've tried various combinations of storage.conf settings (we already set runroot to a similar /tmp location, and graphroot points to an NFS-mounted user home so that the user's image library persists across all nodes), but I always found Podman throwing an error about creating an 'events' folder under /run/user/$UID, and I just couldn't figure out which setting controlled that (setting the Podman events backend to 'none' stopped the mkdir error, but also seemed to prevent Podman from running).
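For reference, the settings that seem to control this live in containers.conf rather than storage.conf - a hedged sketch, since behaviour varies between Podman versions and the paths below are only examples:

    # /etc/containers/containers.conf (or ~/.config/containers/containers.conf)
    [engine]
    # Where libpod keeps transient state; defaults under $XDG_RUNTIME_DIR
    tmp_dir = "/tmp/containers-user-1234/libpod-tmp"
    # Backend for the events log that wants the 'events' directory;
    # the usual values are "journald" or "file", with "none" disabling it
    events_logger = "journald"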
It sounds like the prolog/epilog solution is going to be the easiest route to resolve this.
I hadn't thought about a clean-up script for those /tmp entries, but yeah, that's clearly going to need to be put in place as well.
Thanks for the ideas!
John
For reference, we used this Puppet module for managing Podman: https://forge.puppet.com/modules/southalc/podman/readme
-Paul Edmon-
John, we ran into the same issues that you did. One thing we discovered was that Podman relies heavily on the $TMPDIR variable if it is set: in spite of our changes to storage.conf, Podman still tried to use $TMPDIR for some of its state information. Since $TMPDIR on our cluster pointed at an NFS mount, that created all sorts of issues.
We implemented solutions similar to those discussed in this thread. However, jobs that are going to run Podman had to be configured to unset $TMPDIR, since its presence kept interfering with the rest of our Podman configuration. That was easy to fix for us, as our compute jobs are created by an automated build process.
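In job-script terms it comes down to something like this (illustrative only; the image name is a placeholder):

    #!/bin/bash
    #SBATCH --job-name=podman-example
    # Podman falls back to $TMPDIR for some of its state if it is set,
    # so drop it before invoking podman.
    unset TMPDIR
    podman run --rm registry.example.org/myimage:latest ./run_workload.sh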
Roger Moye HPC Architect 713.898.0021 Mobile
QUANTLAB Financial, LLC 3 Greenway Plaza Suite 200 Houston, Texas 77046 https://www.quantlab.com/
On 9/5/25 12:55 am, John Snowdon via slurm-users wrote:
What I'm struggling with now is running Podman under Slurm itself on our compute nodes.
We found that we had to make /run/user/$UID private per job, via a script run from the job_container/tmpfs plugin, in order to stop jobs from the same user on a node using Podman (via podman-hpc) from trashing each other.
The details (including the script and config) are in our public support ticket where I was flailing around looking for how to do this with CloneNSScript here: https://support.schedmd.com/show_bug.cgi?id=23228
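For context, that plugin is configured via job_container.conf; a minimal setup looks roughly like this (an illustrative sketch only - the per-UID /run/user handling we actually use is in the script attached to the ticket above, and the paths here are examples):

    # /etc/slurm/job_container.conf
    # (requires JobContainerType=job_container/tmpfs in slurm.conf)
    AutoBasePath=true
    BasePath=/var/tmp/slurm-containers     # node-local storage for the private mounts
    # Hook script - see the ticket for the real script and which hook it hangs off
    InitScript=/etc/slurm/private_rundir.sh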
All the best, Chris
Thanks to everyone for your feedback.
We've now implemented two simple prolog/epilog scripts which call the systemd 'loginctl' tool, and this creates and cleans up the /run/user/$UID directory tree nicely.
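The core of them is roughly the following (a trimmed sketch rather than our exact scripts; it assumes systemd-logind is running on the compute nodes and that SLURM_JOB_USER is available to prolog/epilog):

    #!/bin/bash
    # prolog: enabling linger makes logind start user@UID.service,
    # which creates and mounts /run/user/$UID for the job's user.
    loginctl enable-linger "${SLURM_JOB_USER}"

    #!/bin/bash
    # epilog: once the user has no other jobs on this node, drop the
    # linger again so logind can tear /run/user/$UID back down.
    if [ -z "$(squeue -h -w "$(hostname -s)" -u "${SLURM_JOB_USER}")" ]; then
        loginctl disable-linger "${SLURM_JOB_USER}"
    fi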
Our Podman setup also places runroot in a per-user directory on local scratch, and graphroot in an NFS-shared user home directory, so it is accessible across all of our compute nodes.
This now seems to work really nicely.
We've got subuid/subgid entries auto-generated on our login nodes to allow users to create and manage images there, but we made the design decision not to allow this on compute nodes, so we're currently running without that support there.
I suspect that for 99.98% of use cases this won't be an issue (our policy is not to support network services run this way, so for most users this should be more than satisfactory). The fact is, our users don't have container support on the old platform this new system is replacing, so it's a net gain in functionality for them.
John