Hi,
I'm running Ubuntu 20.04 with a clean configuration of slurmctld and slurmd
on a single node.
1) I've configured oci.conf with the defaults from the "OCI.CONF EXAMPLE FOR
RUNC USING RUN (RECOMMENDED OVER USING CREATE/START)" example.
I have a container that I can run by hand, and it exits and returns cleanly:

runc --rootless true run -b /opt/pilot_results/results.20241108-184831/step1 test
sh-4.2# exit
exit

However, when I run

srun --container=/opt/pilot_results/results.20241108-184831/step1 ls

it hangs after the ls completes, and I have to press Ctrl-C twice to get
out of it.

2) I tried the runc configuration using create/start, and it hangs at the
start step.

3) I tried the crun configuration using run. I can run the container by
hand with crun, but srun fails with:

srun --container=/opt/pilot_results/results.20241108-194914/step1 bash
bind socket to `/run/user/1008//pd-builds-bench-1.jrp.34.0.0/notify`: Address already in use
sync socket closed
srun: error: pd-builds-bench-1: task 0: Exited with exit code 1

4) I tried the crun configuration using create/start, and it repeatedly
fails with:

slurmstepd: error: _get_container_state: RunTimeQuery failed rc:256 output:error opening file `/run/user/1008//pd-builds-bench-1.jrp.51.0.0/status`: No such file or directory

I've searched the open and closed support tickets and couldn't find anything
matching any of these errors, so I'm pretty stuck at this point.
Any help would be welcome.
Thanks,
JRP