[slurm-users] Cleanup of job_container/tmpfs
Michael Jennings
mej at lanl.gov
Mon Mar 6 21:06:12 UTC 2023
On Monday, 06 March 2023, at 10:15:22 (+0100),
Niels Carl W. Hansen wrote:
> It seems there are still some issues with the autofs and
> job_container/tmpfs functionality in Slurm 23.02.
> If the required directories aren't mounted on the allocated node(s)
> before job start, we get:
>
> slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory: going to /tmp instead
> slurmstepd: error: couldn't chdir to `/users/lutest': No such file or directory: going to /tmp instead
>
> An easy workaround, however, is to include this line in the Slurm
> prolog on the slurmd nodes:
>
> /usr/bin/su - $SLURM_JOB_USER -c /usr/bin/true
>
> But perhaps there is a better way to solve the problem?
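For context, that workaround succeeds because the login shell forces
autofs to mount the user's home directory before slurmstepd tries to
chdir into it. A minimal Prolog sketch along those lines (the script
structure is illustrative; only the `su` line comes from the quoted
workaround):

    #!/bin/bash
    # slurmd runs the Prolog as root on each allocated node before the
    # job starts; SLURM_JOB_USER is set in its environment. Logging in
    # as the job owner triggers the autofs mount of the home directory.
    if [ -n "$SLURM_JOB_USER" ]; then
        /usr/bin/su - "$SLURM_JOB_USER" -c /usr/bin/true
    fi
    exit 0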
What we do, and what any site using NHC (Node Health Check) in (or as)
its Prolog script can do, is use the `-F` option of `check_fs_mount()`.
By changing `-f` to `-F`, you tell NHC to trigger mounting of the
filesystem rather than merely checking that it is mounted.
Here's an example from one of our clusters (names changed, of course):
check_fs_mount_rw -t "nfs" -s "nas-srv:/proj" -F "/net/nfs/projects"
Documentation for the check is at https://github.com/mej/nhc#check_fs_mount
if you're interested.
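For illustration, a minimal slurmd Prolog that simply runs NHC, so the
`-F` check above can trigger the mount before the job starts (paths are
NHC's usual defaults; adjust for your installation):

    #!/bin/bash
    # Run NHC from the Prolog. With a check_fs_mount_rw ... -F line in
    # nhc.conf, NHC triggers the autofs mount itself; a non-zero exit
    # here prevents the job from starting on this node.
    /usr/sbin/nhc -c /etc/nhc/nhc.conf || exit 1
    exit 0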
I'm not sure that's "better," but it's an option. :-)
HTH!
Michael
--
Michael E. Jennings (he/him) <mej at lanl.gov> https://hpc.lanl.gov/
HPC Platform Integration Engineer - Platforms Design Team - HPC Design Group
Ultra-Scale Research Center (USRC), 4200 W Jemez #301-25 +1 (505) 606-0605
Los Alamos National Laboratory, P.O. Box 1663, Los Alamos, NM 87545-0001