[slurm-users] Cleanup of job_container/tmpfs

Niels Carl W. Hansen ncwh at cscaa.dk
Mon Mar 6 09:15:22 UTC 2023


Hi all

It seems there are still some issues with the combination of autofs and the
job_container/tmpfs plugin in Slurm 23.02.
If the required directories aren't mounted on the allocated node(s)
before the job starts, we get:

slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory: going to /tmp instead
slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory: going to /tmp instead

An easy workaround, however, is to include this line in the Slurm Prolog
on the slurmd nodes:

/usr/bin/su - $SLURM_JOB_USER -c /usr/bin/true

But perhaps there is a better way to solve the problem?
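For reference, a minimal Prolog sketch built around the one-line workaround above. Assumptions not stated in the post: the script is the file named by `Prolog=` in slurm.conf, it runs as root on each allocated node with `SLURM_JOB_USER` set, and the function name `warm_automounts` is hypothetical. The guard clauses and the always-zero return are additions: a Prolog that exits non-zero causes Slurm to drain the node, which seems too harsh a penalty for a failed mount warm-up.

```shell
#!/bin/bash
# Sketch of a Slurm Prolog wrapping the su workaround (assumption: installed
# as the script named by Prolog= in slurm.conf, run as root by slurmd).

# Hypothetical helper: log in as the job owner and run a no-op, which is
# expected to trigger the autofs mount of the user's home directory as a
# side effect. Always returns 0 so the node is never drained because of it.
warm_automounts() {
    local user="${1:-}"
    # Nothing to do outside a job context or for an unknown user.
    if [ -z "$user" ] || ! id "$user" >/dev/null 2>&1; then
        return 0
    fi
    /usr/bin/su - "$user" -c /usr/bin/true >/dev/null 2>&1 || true
    return 0
}

# The script's exit status is that of this last call, i.e. always 0.
warm_automounts "${SLURM_JOB_USER:-}"
```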

Best
Niels Carl





On 3/2/23 12:27 AM, Jason Ellul wrote:
>
> Thanks so much, Ole, for the info and link.
>
> Your documentation is extremely useful.
>
> Prior to moving to 22.05, we had been using slurm-spank-private-tmpdir
> with an epilog to clean up the folders on job completion, but we were
> hoping to move to the built-in functionality to ensure future
> compatibility and reduce complexity.
>
> We will try 23.02 and, if that does not resolve our issue, consider
> moving back to slurm-spank-private-tmpdir or auto_tmpdir.
>
> Thanks again,
>
> Jason
>
> Jason Ellul
> Head - Research Computing Facility
> Office of Cancer Research
> Peter MacCallum Cancer Centre
>
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf
> of Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>
> Date: Wednesday, 1 March 2023 at 8:29 pm
> To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
> Subject: Re: [slurm-users] Cleanup of job_container/tmpfs
>
> Hi Jason,
>
> IMHO, the job_container/tmpfs plugin does not work well in Slurm 22.05,
> but 23.02 (announced yesterday) may include some significant
> improvements. I've documented our experiences in the Wiki page
> https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#temporary-job-directories 
> This page contains links to bug reports against the job_container/tmpfs
> plugin.
>
> We're using the auto_tmpdir SPANK plugin with great success in Slurm 
> 22.05.
>
> Best regards,
> Ole
>
>
> On 01-03-2023 03:27, Jason Ellul wrote:
> > We recently moved to Slurm 22.05.8 and configured
> > job_container/tmpfs to provide private tmp folders.
> >
> > job_container.conf contains:
> >
> > AutoBasePath=true
> > BasePath=/slurm
> >
> > And in slurm.conf we have set:
> >
> > JobContainerType=job_container/tmpfs
> >
> > I can see the folders being created, and they are being used, but when
> > a job completes, the per-job root folder /slurm/<jobid> is not cleaned up.
> >
> > Example of a running job:
> >
> > [root@papr-res-compute204 ~]# ls -al /slurm/14292874
> > total 32
> > drwx------   3 root      root    34 Mar  1 13:16 .
> > drwxr-xr-x 518 root      root 16384 Mar  1 13:16 ..
> > drwx------   2 mzethoven root     6 Mar  1 13:16 .14292874
> > -r--r--r--   1 root      root     0 Mar  1 13:16 .ns
> >
> > Example after the job has completed, where /slurm/<jobid> remains:
> >
> > [root@papr-res-compute204 ~]# ls -al /slurm/14292794
> > total 32
> > drwx------   2 root root     6 Mar  1 09:33 .
> > drwxr-xr-x 518 root root 16384 Mar  1 13:16 ..
> >
> > Is this to be expected, or should the folder /slurm/<jobid> also be
> > removed?
> >
> > Do I need to create an epilog script to remove the directory that is
> > left?
>
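Regarding the epilog question in the thread, a minimal Epilog sketch that removes the leftover per-job directory. Assumptions not stated in the thread: the script is the file named by `Epilog=` in slurm.conf and runs as root on the compute node with `SLURM_JOB_ID` set; `BasePath=/slurm` matches the job_container.conf quoted above; `cleanup_job_dir` and the `BASE_PATH` override are hypothetical names introduced here. It uses `rmdir`, which only removes empty directories, so a directory the plugin has not finished tearing down (for example one still holding `.ns`) is left in place rather than forcibly deleted.

```shell
#!/bin/bash
# Sketch of a Slurm Epilog that removes the empty /slurm/<jobid> directory
# left behind by job_container/tmpfs. Assumptions: runs as root on the
# compute node, SLURM_JOB_ID is set, and BasePath=/slurm in
# job_container.conf. BASE_PATH is a hypothetical override for testing.

BASE_PATH="${BASE_PATH:-/slurm}"

# Hypothetical helper: remove $BASE_PATH/<jobid> only if it is empty.
cleanup_job_dir() {
    local job_id="${1:-}"
    # Accept only a purely numeric job ID, so the path can never become
    # $BASE_PATH/ itself or point outside it (e.g. via "..").
    case "$job_id" in
        ''|*[!0-9]*) return 0 ;;
    esac
    local dir="$BASE_PATH/$job_id"
    [ -d "$dir" ] || return 0
    # rmdir fails on a non-empty directory; ignore that so the Epilog
    # still exits 0 and the node is not drained.
    rmdir "$dir" 2>/dev/null || true
    return 0
}

cleanup_job_dir "${SLURM_JOB_ID:-}"
```

Using `rm -rf` instead would also clear non-empty leftovers, but at the risk of deleting a directory that is still in use; if the Epilog happens to run before the plugin has finished its own cleanup, this version simply leaves the directory for a later pass.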
