[slurm-users] sbatch sending the working directory from the controller to the node

Dean Schulze dean.w.schulze at gmail.com
Tue Jan 21 20:25:43 UTC 2020


So there is a --chdir option for sbatch too.  This implies that the same
path has to exist on all nodes.  Something to keep in mind when creating
a Slurm cluster.
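
For example, a minimal sketch (the paths here are hypothetical; it
assumes /tmp exists on the compute nodes, which it normally does):

=======================
#!/bin/bash
#SBATCH --job-name=chdir_test
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --chdir=/tmp                      # working directory, resolved on the node; must exist there
#SBATCH --output=/tmp/chdir_test_%j.log   # absolute path, also resolved on the node

pwd; hostname; date
=======================

Note that #SBATCH lines are parsed by sbatch itself, not by the shell,
so variables like $HOME are not expanded in them; use literal paths
(or filename patterns such as %j) instead.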

On Tue, Jan 21, 2020 at 12:58 PM William Brown <william at signalbox.org.uk>
wrote:

> The srun man page says:
>
>
>
> When initiating remote processes srun will propagate the current
> working directory, unless --chdir=<path> is specified, in which case
> path will become the working directory for the remote processes.
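>
> For example (a hypothetical session; it assumes /tmp exists on the
> compute node):
>
>     $ srun --chdir=/tmp pwd
>     /tmp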
>
>
>
> William
>
>
>
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Dean Schulze
> Sent: 21 January 2020 19:27
> To: Slurm User Community List <slurm-users at lists.schedmd.com>
> Subject: [slurm-users] sbatch sending the working directory from the
> controller to the node
>
>
>
> I run this sbatch script from the controller:
>
> =======================
> #!/bin/bash
> #SBATCH --job-name=test_job
> #SBATCH --mail-type=NONE    # Mail events (NONE, BEGIN, END, FAIL, ALL)
> #SBATCH --ntasks=1
> #SBATCH --mem=1gb
> #SBATCH --time=00:05:00     # Time limit hrs:min:sec
> #SBATCH --output=test_job_%j.log   # Standard output and error log
>
> pwd; hostname; date
> =======================
>
>
> The node inherits the directory that sbatch was executed from on the
> controller and tries to write the output file to that directory, which
> doesn't exist on the node.  The node's slurmd.log shows this error:
>
> [2020-01-21T11:25:36.389] [7.batch] error: Could not open stdout file
> /home/dean/src/slurm.example.scripts/serial_test_7.log: No such file or
> directory
>
>
> If I change the sbatch script's --output to a fully qualified path in a
> directory that exists on the node
>
>     --output=/home/nodeuser/serial_test_%j.log
>
> the output file is written there, but it contains this error, showing
> that the node is still trying to execute the job in the directory that
> sbatch was run from on the controller:
>
> slurmstepd: error: couldn't chdir to
> `/home/dean/src/slurm.example.scripts': No such file or directory: going to
> /tmp instead
>
>
> The sbatch docs say nothing about why the node inherits the working
> directory from the controller.  Why would Slurm send a node a directory
> that may not exist there and expect it to use it?
>
>
>
> What's the right way to specify the --output directory in an sbatch script?
>
>
>
> Thanks.
>
>
>