[slurm-users] Cleanup of job_container/tmpfs
Brian Andrus
toomuchit at gmail.com
Mon Mar 6 20:06:56 UTC 2023
It looks like the user's home directory doesn't exist on the node.
If you are not using a shared home for the nodes, review your onboarding
process to make sure it creates the home directory on every node and
handles any other issues that may arise.
If you are using a shared home, do the above as well, and have the node
verify that the shared filesystems are mounted before it accepts jobs.
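A minimal Prolog sketch of that check follows. It assumes home directories come from autofs and that a node without the job user's home should not run the job. `SLURM_JOB_USER` is the same variable used in the su workaround quoted below. The function name `home_is_available`, the `getent` lookup and the trailing-slash trick for triggering the automount are assumptions to verify on your own nodes. Per the slurm.conf documentation, a Prolog that exits non-zero gets the node drained and the job requeued.

```bash
#!/bin/bash
# Hypothetical Prolog sketch: refuse to start a job on this node unless the
# job user's home directory is reachable (e.g. the autofs mount succeeded).

# Return 0 if the home directory recorded for user $1 in the passwd
# database exists on this node, 1 otherwise.
home_is_available() {
    local user="$1" home
    home=$(getent passwd "$user" | cut -d: -f6)
    [ -n "$home" ] || return 1      # unknown user or empty home field
    # Assumption: the trailing slash makes the kernel resolve the path as a
    # directory, which triggers an autofs mount even where a plain stat()
    # of the last path component would not.
    [ -d "$home/" ]
}

# Only act when slurmd runs this script as a Prolog (SLURM_JOB_USER set).
if [ -n "${SLURM_JOB_USER:-}" ]; then
    if ! home_is_available "$SLURM_JOB_USER"; then
        echo "prolog: home of $SLURM_JOB_USER not available" >&2
        # Non-zero exit: slurmd drains the node and requeues the job.
        exit 1
    fi
fi
```

The directory check might not trigger the mount on your systems. In that case, run Niels's `su - $SLURM_JOB_USER -c /usr/bin/true` line just before the check; it accesses the home directory as the job user.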
-Brian Andrus
On 3/6/2023 1:15 AM, Niels Carl W. Hansen wrote:
> Hi all
>
> It seems there are still some issues with the combination of autofs and
> job_container/tmpfs in Slurm 23.02.
> If the required directories aren't mounted on the allocated node(s)
> before job start, we get:
>
> slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory: going to /tmp instead
> slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory: going to /tmp instead
>
> An easy workaround, however, is to include this line in the Slurm
> Prolog on the slurmd nodes:
>
> /usr/bin/su - $SLURM_JOB_USER -c /usr/bin/true
>
> but is there perhaps a better way to solve the problem?
>
> Best
> Niels Carl
>
> On 3/2/23 12:27 AM, Jason Ellul wrote:
>>
>> Thanks so much for the info and link, Ole.
>>
>> Your documentation is extremely useful.
>>
>> Prior to moving to 22.05 we had been using slurm-spank-private-tmpdir
>> with an epilog to clean up the folders on job completion, but we were
>> hoping to move to the built-in functionality to ensure future
>> compatibility and reduce complexity.
>>
>> We will try 23.02, and if that does not resolve our issue, we will
>> consider moving back to slurm-spank-private-tmpdir or auto_tmpdir.
>>
>> Thanks again,
>>
>> Jason
>>
>> Jason Ellul
>> Head - Research Computing Facility
>> Office of Cancer Research
>> Peter MacCallum Cancer Centre
>>
>> From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf
>> of Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>
>> Date: Wednesday, 1 March 2023 at 8:29 pm
>> To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
>> Subject: Re: [slurm-users] Cleanup of job_container/tmpfs
>>
>> Hi Jason,
>>
>> IMHO, job_container/tmpfs does not work well in Slurm 22.05, but 23.02
>> (announced yesterday) may include some significant improvements. I've
>> documented our experiences in the Wiki page
>> https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#temporary-job-directories
>> This page contains links to bug reports against the job_container/tmpfs
>> plugin.
>>
>> We're using the auto_tmpdir SPANK plugin with great success in Slurm
>> 22.05.
>>
>> Best regards,
>> Ole
>>
>>
>> On 01-03-2023 03:27, Jason Ellul wrote:
>> > We have recently moved to Slurm 22.05.8 and have configured
>> > job_container/tmpfs to provide private tmp folders.
>> >
>> > job_container.conf contains:
>> >
>> > AutoBasePath=true
>> > BasePath=/slurm
>> >
>> > And in slurm.conf we have set:
>> >
>> > JobContainerType=job_container/tmpfs
>> >
>> > I can see the folders being created and used, but when a job
>> > completes, the per-job root folder (/slurm/<jobid>) is not cleaned up.
>> >
>> > Example of a running job:
>> >
>> > [root@papr-res-compute204 ~]# ls -al /slurm/14292874
>> > total 32
>> > drwx------ 3 root root 34 Mar 1 13:16 .
>> > drwxr-xr-x 518 root root 16384 Mar 1 13:16 ..
>> > drwx------ 2 mzethoven root 6 Mar 1 13:16 .14292874
>> > -r--r--r-- 1 root root 0 Mar 1 13:16 .ns
>> >
>> > Example after the job completes; /slurm/<jobid> remains:
>> >
>> > [root@papr-res-compute204 ~]# ls -al /slurm/14292794
>> > total 32
>> > drwx------ 2 root root 6 Mar 1 09:33 .
>> > drwxr-xr-x 518 root root 16384 Mar 1 13:16 ..
>> >
>> > Is this expected, or should the folder /slurm/<jobid> also be removed?
>> >
>> > Do I need to create an epilog script to remove the directory that is
>> > left?
>>
>
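Here is a minimal Epilog sketch for the cleanup question quoted above. It assumes `BasePath=/slurm` as in Jason's job_container.conf. Instead of deleting `/slurm/$SLURM_JOB_ID` directly, it removes job-ID directories that are already empty and untouched for five minutes. This assumes the plugin may still be tearing down the finishing job's directory while the Epilog runs; if so, a later job's Epilog picks that directory up. The function name `sweep_stale_job_dirs` and the five-minute threshold are hypothetical choices.

```bash
#!/bin/bash
# Hypothetical Epilog sketch: sweep empty per-job directories that
# job_container/tmpfs left behind under its BasePath.

BASEPATH=/slurm   # assumption: must match BasePath in job_container.conf

# Remove direct children of $1 whose names start with a digit (job IDs),
# that are already empty and were last modified more than 5 minutes ago.
sweep_stale_job_dirs() {
    local base="$1"
    [ -d "$base" ] || return 0
    # -empty skips anything still holding .ns or the job's tmp directory;
    # -mmin +5 stays away from directories of jobs that are just starting.
    # rmdir itself refuses non-empty directories, as a second safeguard.
    find "$base" -mindepth 1 -maxdepth 1 -type d -name '[0-9]*' \
        -empty -mmin +5 -exec rmdir {} + 2>/dev/null
    # Always succeed: a failing Epilog would drain the node.
    return 0
}

# Only act when slurmd runs this script as an Epilog.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    sweep_stale_job_dirs "$BASEPATH"
fi
```

You would reference the script from slurm.conf through the `Epilog` parameter. Because it uses `rmdir` rather than `rm -rf`, it can only remove empty directories, like the leftover /slurm/14292794 shown above.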