[slurm-users] Cleanup of job_container/tmpfs
Michael Jennings
mej at lanl.gov
Mon Mar 6 21:06:12 UTC 2023
On Monday, 06 March 2023, at 10:15:22 (+0100),
Niels Carl W. Hansen wrote:
> It seems there are still some issues with the autofs and
> job_container/tmpfs functionality in Slurm 23.02.
> If the required directories aren't mounted on the allocated node(s)
> before job start, we get:
>
> slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory: going to /tmp instead
> slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory: going to /tmp instead
>
> An easy workaround, however, is to include this line in the Slurm
> prolog on the slurmd nodes:
>
> /usr/bin/su - $SLURM_JOB_USER -c /usr/bin/true
>
> But perhaps there is a better way to solve the problem?
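For context, that workaround succeeds because the login shell forces
autofs to mount the user's home directory before slurmstepd tries to
chdir into it. A minimal Prolog sketch along those lines (the script
structure is illustrative; only the `su` line comes from the quoted
workaround):

    #!/bin/bash
    # slurmd runs the Prolog as root on each allocated node before the
    # job starts; SLURM_JOB_USER is set in its environment. Logging in
    # as the job owner triggers the autofs mount of the home directory.
    if [ -n "$SLURM_JOB_USER" ]; then
        /usr/bin/su - "$SLURM_JOB_USER" -c /usr/bin/true
    fi
    exit 0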
What we do, and what any site using NHC (Node Health Check) in (or as)
its Prolog script can do, is use the `-F` option of `check_fs_mount()`.
By changing `-f` to `-F`, you tell NHC to trigger mounting of the
filesystem rather than merely checking that it is mounted.
Here's an example from one of our clusters (names changed, of course):
check_fs_mount_rw -t "nfs" -s "nas-srv:/proj" -F "/net/nfs/projects"
Documentation for the check is at https://github.com/mej/nhc#check_fs_mount
if you're interested.
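For illustration, a minimal slurmd Prolog that simply runs NHC, so the
`-F` check above can trigger the mount before the job starts (paths are
NHC's usual defaults; adjust for your installation):

    #!/bin/bash
    # Run NHC from the Prolog. With a check_fs_mount_rw ... -F line in
    # nhc.conf, NHC triggers the autofs mount itself; a non-zero exit
    # here prevents the job from starting on this node.
    /usr/sbin/nhc -c /etc/nhc/nhc.conf || exit 1
    exit 0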
I'm not sure that's "better," but it's an option. :-)
HTH!
Michael
--
Michael E. Jennings (he/him) <mej at lanl.gov> https://hpc.lanl.gov/
HPC Platform Integration Engineer - Platforms Design Team - HPC Design Group
Ultra-Scale Research Center (USRC), 4200 W Jemez #301-25 +1 (505) 606-0605
Los Alamos National Laboratory, P.O. Box 1663, Los Alamos, NM 87545-0001