[slurm-users] [ext] Re: Cleanup of job_container/tmpfs
Niels Carl W. Hansen
ncwh at cscaa.dk
Tue Mar 7 15:06:01 UTC 2023
That was exactly the bit I was missing. Thank you very much, Magnus!
Best
Niels Carl
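
For reference, a minimal sketch of a job_container.conf carrying the
setting Magnus describes below. Only Shared=true comes from this thread.
The BasePath value is a hypothetical location and AutoBasePath is an
assumption about a typical setup. The plugin itself also has to be
enabled in slurm.conf (JobContainerType=job_container/tmpfs), so check
the job_container.conf man page for your version before copying this.

```
# job_container.conf (sketch, not a tested configuration)
# BasePath is a hypothetical node-local directory; adjust for your site.
AutoBasePath=true
BasePath=/var/spool/slurm/containers
# From the thread: needed on 23.02 for autofs to work with the plugin.
Shared=true
```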
On 3/7/23 3:13 PM, Hagdorn, Magnus Karl Moritz wrote:
> I just upgraded Slurm to 23.02 on our test cluster to try out the new
> job_container/tmpfs stuff. I can confirm it works with autofs (hurrah!)
> but you need to set the Shared=true option in the job_container.conf
> file.
> Cheers
> magnus
>
> On Tue, 2023-03-07 at 09:19 +0100, Ole Holm Nielsen wrote:
>> Hi Brian,
>>
>> Presumably the users' home directory is NFS automounted using autofs,
>> and therefore it doesn't exist when the job starts.
>>
>> The job_container/tmpfs plugin ought to work correctly with autofs,
>> but maybe this is still broken in 23.02?
>>
>> /Ole
>>
>>
>> On 3/6/23 21:06, Brian Andrus wrote:
>>> That looks like the users' home directory doesn't exist on the node.
>>>
>>> If you are not using a shared home for the nodes, your onboarding
>>> process should be looked at to ensure it can handle any issues that
>>> may arise.
>>>
>>> If you are using a shared home, you should do the above and have the
>>> node ensure the shared filesystems are mounted before allowing jobs.
>>>
>>> -Brian Andrus
>>>
>>> On 3/6/2023 1:15 AM, Niels Carl W. Hansen wrote:
>>>> Hi all
>>>>
>>>> It seems there are still some issues with the autofs and
>>>> job_container/tmpfs functionality in Slurm 23.02.
>>>> If the required directories aren't mounted on the allocated node(s)
>>>> before the job starts, we get:
>>>>
>>>> slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory: going to /tmp instead
>>>> slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory: going to /tmp instead
>>>>
>>>> An easy workaround, however, is to include this line in the Slurm
>>>> prolog on the slurmd nodes:
>>>>
>>>> /usr/bin/su - $SLURM_JOB_USER -c /usr/bin/true
>>>>
>>>> but there might be a better way to solve the problem?
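
The prolog workaround from the original message could be wrapped up
roughly as follows. This is only a sketch: the function name
trigger_home_automount is hypothetical, and the choice to log a failure
and still return 0 is an assumption, made because a non-zero prolog exit
drains the node and requeues the job. The su line itself is the one from
the thread.

```shell
#!/bin/bash
# Prolog sketch, assuming it is run as root by slurmd on the compute node.

# Hypothetical helper: start a login shell for the job's user so that
# autofs mounts the home directory before the job step begins.
trigger_home_automount() {
    local user="$1"

    # Nothing to do if no job user was passed in.
    [ -n "$user" ] || return 0

    # The workaround line from the thread. A failure is reported on
    # stderr but is deliberately not treated as fatal (assumption).
    if ! /usr/bin/su - "$user" -c /usr/bin/true; then
        echo "prolog: could not start a login shell for $user" >&2
    fi
    return 0
}

trigger_home_automount "${SLURM_JOB_USER:-}"
```

With Shared=true set in job_container.conf on 23.02, as described at the
top of the thread, this workaround should no longer be needed.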