Hi,
This is systemd, not Slurm. We've also seen the directory being created and removed. As far as I understand, it has to do with the user session, which systemd cleans up. We've worked around it by adding this to the prolog:
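# Per-user runtime dir outside /run/user, so systemd does not remove it
MY_XDG_RUNTIME_DIR="/dev/shm/${USER}"
mkdir -p "$MY_XDG_RUNTIME_DIR"
# The prolog's "export ..." output sets the variable for the job
echo "export XDG_RUNTIME_DIR=$MY_XDG_RUNTIME_DIR"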
(in combination with private tmpfs per job).
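
For anyone who wants something a bit more defensive, here is a rough sketch of the same idea. It assumes the script runs as a TaskProlog, where Slurm picks up "export NAME=value" lines from standard output. The function name setup_runtime_dir, the fallback to `id -un`, and the chmod to 0700 (which the XDG Base Directory spec asks for) are my additions for illustration, not what we run verbatim:

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): create a per-user runtime dir
# under the base directory given as $1 and print the "export" line that a
# TaskProlog uses to set a variable in the task environment.
setup_runtime_dir() {
    base_dir="${1:-/dev/shm}"
    # Assumption: fall back to `id -un` if USER is not set in the prolog env.
    job_user="${USER:-$(id -un)}"
    runtime_dir="${base_dir}/${job_user}"
    # XDG Base Directory spec: the dir must be owned by the user, mode 0700.
    if mkdir -p "$runtime_dir" && chmod 700 "$runtime_dir"; then
        echo "export XDG_RUNTIME_DIR=$runtime_dir"
    else
        echo "could not prepare $runtime_dir" >&2
        return 1
    fi
}

# Ignore failures here so the script itself still exits 0; the job then
# simply keeps whatever XDG_RUNTIME_DIR it already had.
setup_runtime_dir /dev/shm || true
```

If the directory cannot be created, nothing is printed on stdout, so XDG_RUNTIME_DIR is left untouched rather than pointing at a missing path.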
Ward
On 15/05/2024 10:14, Arnuld via slurm-users wrote:
> I am using the latest slurm. It runs fine for scripts. But if I give it a container then it kills it as soon as I submit the job. Is slurm cleaning up the $XDG_RUNTIME_DIR before it should? This is the log:
>
> [2024-05-15T08:00:35.143] [90.0] debug2: _generate_patterns: StepId=90.0 TaskId=-1
> [2024-05-15T08:00:35.143] [90.0] debug3: _get_container_state: command argv[0]=/bin/sh
> [2024-05-15T08:00:35.143] [90.0] debug3: _get_container_state: command argv[1]=-c
> [2024-05-15T08:00:35.143] [90.0] debug3: _get_container_state: command argv[2]=crun --rootless=true --root=/run/user/1000/ state slurm2.acog.90.0.-1
> [2024-05-15T08:00:35.167] [90.0] debug: _get_container_state: RunTimeQuery rc:256 output:error opening file `/run/user/1000/slurm2.acog.90.0.-1/status`: No such file or directory
>
> [2024-05-15T08:00:35.167] [90.0] error: _get_container_state: RunTimeQuery failed rc:256 output:error opening file `/run/user/1000/slurm2.acog.90.0.-1/status`: No such file or directory
>
> [2024-05-15T08:00:35.167] [90.0] debug: container already dead
> [2024-05-15T08:00:35.167] [90.0] debug3: _generate_spooldir: task:0 pattern:%m/oci-job%j-%s/task-%t/ path:/var/spool/slurmd/oci-job90-0/task-0/
> [2024-05-15T08:00:35.167] [90.0] debug2: _generate_patterns: StepId=90.0 TaskId=0
> [2024-05-15T08:00:35.168] [90.0] debug3: _generate_spooldir: task:-1 pattern:%m/oci-job%j-%s/ path:/var/spool/slurmd/oci-job90-0/
> [2024-05-15T08:00:35.168] [90.0] stepd_cleanup: done with step (rc[0x100]:Unknown error 256, cleanup_rc[0x0]:No error)
> [2024-05-15T08:00:35.275] debug3: in the service_connection
> [2024-05-15T08:00:35.278] debug2: Start processing RPC: REQUEST_TERMINATE_JOB
> [2024-05-15T08:00:35.278] debug2: Processing RPC: REQUEST_TERMINATE_JOB
> [2024-05-15T08:00:35.278] debug: _rpc_terminate_job: uid = 64030 JobId=90
> [2024-05-15T08:00:35.278] debug: credential for job 90 revoked
>
>
>