[slurm-users] Cleanup of job_container/tmpfs

Brian Andrus toomuchit at gmail.com
Mon Mar 6 20:06:56 UTC 2023


That looks like the user's home directory doesn't exist on the node.

If you are not using a shared home for the nodes, review your onboarding 
process to make sure it creates the user's home directory on every node.

If you are using a shared home, you should do the above and have the 
node ensure the shared filesystems are mounted before allowing jobs.
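
For the shared-home case, one place to do that check is the prolog. The
following is only a minimal sketch, not a tested recipe: it assumes the
prolog runs as root with SLURM_JOB_USER set (as in the workaround quoted
below), and the helper names home_of and home_ready are made up for
illustration. Accessing the path is what normally triggers an automount,
so the check doubles as the trigger.

```shell
#!/bin/sh
# Prolog sketch. Assumption: slurmd runs this as root on the compute node
# and exports SLURM_JOB_USER. home_of and home_ready are hypothetical helpers.

# Look up the user's home directory in the passwd database.
home_of() {
    getent passwd "$1" | cut -d: -f6
}

# Return 0 once the directory is reachable, retrying once per second.
# Stepping into "$dir/." forces a lookup inside the directory, which is
# what normally makes autofs mount it.
home_ready() {
    dir=$1
    tries=${2:-5}
    while [ "$tries" -gt 0 ]; do
        [ -d "$dir/." ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Only act when called by Slurm with a job user set.
if [ -n "${SLURM_JOB_USER:-}" ]; then
    home=$(home_of "$SLURM_JOB_USER")
    if [ -z "$home" ] || ! home_ready "$home"; then
        echo "prolog: home directory for $SLURM_JOB_USER not available" >&2
        exit 1   # non-zero prolog exit: the job does not start on this node
    fi
fi
```

As far as I know, a non-zero prolog exit drains the node and requeues the
job, so the retry count is worth tuning before using anything like this.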

-Brian Andrus

On 3/6/2023 1:15 AM, Niels Carl W. Hansen wrote:
> Hi all
>
> Seems there still are some issues with the autofs - 
> job_container/tmpfs functionality in Slurm 23.02.
> If the required directories aren't mounted on the allocated node(s) 
> before jobstart, we get:
>
> slurmstepd: error: couldn't chdir to `/users/lutest': No such file or 
> directory: going to /tmp instead
> slurmstepd: error: couldn't chdir to `/users/lutest': No such file or 
> directory: going to /tmp instead
>
> An easy workaround, however, is to include this line in the Slurm 
> prolog on the slurmd nodes:
>
> /usr/bin/su - $SLURM_JOB_USER -c /usr/bin/true
>
> -but there might exist a better way to solve the problem?
>
> Best
> Niels Carl
>
> On 3/2/23 12:27 AM, Jason Ellul wrote:
>>
>> Thanks so much Ole for the info and link,
>>
>> Your documentation is extremely useful.
>>
>> Prior to moving to 22.05 we had been using slurm-spank-private-tmpdir 
>> with an epilog to clean-up the folders on job completion, but we were 
>> hoping to move to the inbuilt functionality to ensure future 
>> compatibility and reduce complexity.
>>
>> Will try 23.02 and if that does not resolve our issue consider moving 
>> back to slurm-spank-private-tmpdir or auto_tmpdir.
>>
>> Thanks again,
>>
>> Jason
>>
>> Jason Ellul
>> Head - Research Computing Facility
>> Office of Cancer Research
>> Peter MacCallum Cancer Center
>>
>> *From: *slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf 
>> of Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>
>> *Date: *Wednesday, 1 March 2023 at 8:29 pm
>> *To: *slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
>> *Subject: *Re: [slurm-users] Cleanup of job_container/tmpfs
>>
>> Hi Jason,
>>
>> IMHO, the job_container/tmpfs is not working well in Slurm 22.05, but
>> there may be some significant improvements included in 23.02 (announced
>> yesterday).  I've documented our experiences in the Wiki page
>> https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#temporary-job-directories 
>> This page contains links to bug reports against the job_container/tmpfs
>> plugin.
>>
>> We're using the auto_tmpdir SPANK plugin with great success in Slurm 
>> 22.05.
>>
>> Best regards,
>> Ole
>>
>>
>> On 01-03-2023 03:27, Jason Ellul wrote:
>> > We have recently moved to slurm 22.05.8 and have configured
>> > job_container/tmpfs to allow private tmp folders.
>> >
>> > job_container.conf contains:
>> >
>> > AutoBasePath=true
>> > BasePath=/slurm
>> >
>> > And in slurm.conf we have set
>> >
>> > JobContainerType=job_container/tmpfs
>> >
>> > I can see the folders being created and they are being used but when a
>> > job completes the root folder is not being cleaned up.
>> >
>> > Example of running job:
>> >
>> > [root@papr-res-compute204 ~]# ls -al /slurm/14292874
>> > total 32
>> > drwx------   3 root      root    34 Mar  1 13:16 .
>> > drwxr-xr-x 518 root      root 16384 Mar  1 13:16 ..
>> > drwx------   2 mzethoven root     6 Mar  1 13:16 .14292874
>> > -r--r--r--   1 root      root     0 Mar  1 13:16 .ns
>> >
>> > Example once job completes /slurm/<jobid> remains:
>> >
>> > [root@papr-res-compute204 ~]# ls -al /slurm/14292794
>> > total 32
>> > drwx------   2 root root     6 Mar  1 09:33 .
>> > drwxr-xr-x 518 root root 16384 Mar  1 13:16 ..
>> >
>> > Is this to be expected or should the folder /slurm/<jobid> also be
>> > removed?
>> >
>> > Do I need to create an epilog script to remove the directory that
>> > is left?
>>
>>
>>
>
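
On the original cleanup question, an epilog along these lines could remove
the leftover directory. This is a sketch under assumptions: BasePath=/slurm
as in the quoted job_container.conf, SLURM_JOB_ID set in the epilog
environment, and cleanup_job_dir is a hypothetical helper name. It uses
rmdir so that only an already empty directory is removed. Whether the
epilog runs after the plugin's own teardown on a given Slurm version is
something to verify before relying on it.

```shell
#!/bin/sh
# Epilog sketch. Assumptions: BasePath=/slurm (from the quoted
# job_container.conf) and SLURM_JOB_ID exported by slurmd.
# cleanup_job_dir is a hypothetical helper, not part of Slurm.

# Remove BASE/JOBID only if it is already empty. rmdir refuses to delete
# a directory that still has content or is an active mount point.
cleanup_job_dir() {
    base=$1
    jobid=$2
    case $jobid in
        ''|*[!0-9]*) return 0 ;;   # ignore anything that is not a plain numeric id
    esac
    dir="$base/$jobid"
    [ -d "$dir" ] || return 0
    rmdir "$dir" 2>/dev/null ||
        echo "epilog: left $dir in place (not empty or busy)" >&2
    return 0                       # never fail the epilog over a leftover directory
}

# Only act when called by Slurm with a job id set.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    cleanup_job_dir /slurm "$SLURM_JOB_ID"
fi
```

The function always returns 0 on purpose, so a directory that cannot be
removed is logged rather than treated as an epilog failure.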