[slurm-users] Problem using Podman with scrun on SLURM 23.11.3
Marcus Lauer
melauer at seas.upenn.edu
Thu Jan 25 16:28:13 UTC 2024
I am getting an unusual error when trying to run Podman containers
using scrun on SLURM 23.11.3 (and 23.11.1 previously). In short, Podman
works when it is not configured to use scrun, but fails as soon as it is
configured to use scrun.
Podman gives this error:
scrun: fatal: Unable to request job allocation: Job cannot be submitted
without the current working directory specified.
The slurmctld logs show this error:
_slurm_rpc_allocate_resources: Job cannot be submitted without the current
working directory specified.
The scrun command does not appear to have a --chdir option, so the working
directory must be detected automatically, and that detection appears to be failing.
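In case it helps anyone narrow this down, my working assumption (not
something I have found documented) is that scrun inherits its working
directory from the Podman process that invokes it and submits that as the
job's working directory, so one sanity check is to invoke Podman from a
directory which definitely exists and is readable on both nodes, e.g.:

    # run from a directory that exists on both the head node and the compute node
    cd /tmp
    podman run --rm alpine true

(alpine is just a placeholder image here.)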
Has anyone encountered this or a similar error before? If so, I would
love to hear about your experience.
A bit more detail in case anyone is interested. This is a two-node
test system with one head node and one compute node, both running Rocky
Linux 8.9. I have created /etc/containers/storage.conf and containers.conf
files which are identical to the ones in the SLURM Containers Guide, except
that I set rootless_storage_path to the same value as graphroot in
storage.conf. When I remove the /etc/containers/containers.conf file, I
can run containers on the head node. The backing store is ext4, though I
have also tried NFS (which had other problems). I have tried both the vfs and
overlay storage drivers and get the same result.
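For anyone who does not have the Containers Guide handy, the relevant
pieces look roughly like this (only a sketch with placeholder paths; the
guide's examples are authoritative):

    # /etc/containers/storage.conf (sketch)
    [storage]
    driver = "overlay"                          # also tried "vfs", same result
    runroot = "/run/user/1000"                  # placeholder path
    graphroot = "/srv/containers/testuser"      # placeholder, on the local ext4 filesystem
    rootless_storage_path = "/srv/containers/testuser"   # set to match graphroot

    # /etc/containers/containers.conf (sketch)
    [containers]
    # namespace/cgroup settings as shown in the guide

    [engine]
    runtime = "slurm"                           # this is what routes Podman through scrun

    [engine.runtimes]
    slurm = ["/usr/local/bin/scrun"]            # path to scrun is site-specific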
On the SLURM side I created an oci.conf which matches the
"oci.conf example for crun using run (suggested)" from the SLURM Containers
Guide. I'm using crun because I have forced cgroup v2 on both systems
(systemd.unified_cgroup_hierarchy=1 on the kernel command line) and crun seems to
have better cgroup v2 support. I am using an scrun.lua which is similar to the
one in the scrun manual page, with some paths modified for my setup.
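In case the exact variant matters, the oci.conf I mean is roughly the
following (a sketch from the guide's "crun using run" example, not a
verbatim copy):

    # /etc/slurm/oci.conf (sketch)
    IgnoreFileConfigJson=true
    RunTimeQuery="crun --rootless=true --root=/run/user/%U/ state %n.%u.%j.%s.%t"
    RunTimeKill="crun --rootless=true --root=/run/user/%U/ kill -a %n.%u.%j.%s.%t SIGKILL"
    RunTimeDelete="crun --rootless=true --root=/run/user/%U/ delete --force %n.%u.%j.%s.%t"
    RunTimeRun="crun --rootless=true --root=/run/user/%U/ run --bundle %b %n.%u.%j.%s.%t"

and the scrun.lua is built around the stage-in/stage-out callbacks from the
man page (my staging logic is site-specific, so this is just the skeleton):

    -- /etc/slurm/scrun.lua (skeleton; real staging logic omitted)
    function slurm_scrun_stage_in(id, bundle)
        -- copy the OCI bundle to where the allocated node can see it
        return slurm.SUCCESS
    end

    function slurm_scrun_stage_out(id, bundle)
        -- copy results back after the container exits
        return slurm.SUCCESS
    end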
I should also mention that sbatch jobs run just fine and srun works
as well; the test cluster seems to be working fine in general.
--
Marcus Lauer
Systems Administrator
CETS Group, Research Support