[slurm-users] sbatch sending the working directory from the controller to the node
Dean Schulze
dean.w.schulze at gmail.com
Tue Jan 21 19:27:12 UTC 2020
I run this sbatch script from the controller:
=======================
#!/bin/bash
#SBATCH --job-name=test_job
#SBATCH --mail-type=NONE # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --ntasks=1
#SBATCH --mem=1gb
#SBATCH --time=00:05:00 # Time limit hrs:min:sec
#SBATCH --output=test_job_%j.log # Standard output and error log
pwd; hostname; date
=======================
The node gets the directory that sbatch was executed from on the controller
and tries to write the output file to that directory, which doesn't exist
on the node. The node slurmd.log shows this error:
[2020-01-21T11:25:36.389] [7.batch] error: Could not open stdout file
/home/dean/src/slurm.example.scripts/serial_test_7.log: No such file or
directory
If I change --output in the sbatch script to an absolute path in a directory
that exists on the node:
--output=/home/nodeuser/serial_test_%j.log
the output file is written to that directory, but the file contains this
error, which shows that the slurm node is trying to execute the job in the
directory that sbatch was run from on the controller:
slurmstepd: error: couldn't chdir to
`/home/dean/src/slurm.example.scripts': No such file or directory: going to
/tmp instead
The sbatch docs say nothing about why the node gets the pwd from the
controller. Why would slurm send a directory to a node that may not exist
on the node and expect it to use it?
What's the right way to specify the --output directory in an sbatch script?
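For reference, here is a minimal sketch of one possible approach, not a confirmed fix for this cluster. sbatch has a -D/--chdir option that sets the job's working directory, and when it is not given the job inherits the directory sbatch was submitted from, which would explain both errors above. The sketch assumes /home/nodeuser (the directory from the --output test above) exists on every node that can run the job. With --chdir set, a relative --output path should be resolved against that directory.

```shell
#!/bin/bash
#SBATCH --job-name=test_job
#SBATCH --ntasks=1
#SBATCH --mem=1gb
#SBATCH --time=00:05:00
# Assumption: /home/nodeuser exists on every compute node.
#SBATCH --chdir=/home/nodeuser
# Relative path, so the log should land in the --chdir directory.
#SBATCH --output=test_job_%j.log

# SLURM_JOB_ID is set by Slurm inside a job; fall back to "none" when the
# script is run by hand outside of Slurm.
job_id="${SLURM_JOB_ID:-none}"
work_dir="$(pwd)"
echo "job ${job_id} running in ${work_dir}"

pwd; hostname; date
```

The same option can be passed at submit time instead of in the script, for example `sbatch --chdir=/home/nodeuser test_job.sh` (the script name here is hypothetical). A shared filesystem mounted at the same path on the controller and the nodes would avoid the mismatch altogether, but that is a site configuration choice rather than something sbatch does for you.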
Thanks.