[slurm-users] Problem with sbatch
Daniel Torregrosa
daniel.torregrosa at insight-centre.org
Mon Jul 8 16:01:39 UTC 2019
Hi all,
I am currently testing Slurm (slurm-wlm 17.11.2 on a newly installed and
updated Ubuntu Server LTS). I managed to get it working in a very simple
configuration with 1 master node and 2 compute nodes. All three nodes have
the same users (namely root, slurm and test). The slurm user runs slurmctld
and slurmd on the corresponding nodes (i.e. SlurmUser=slurm and
SlurmdUser=slurm), and test is the only user that can log in.
Commands such as `salloc` and `srun` work perfectly, but `sbatch` fails. In
`squeue`, the job shows "(launch failed requeued held)". The log on the
corresponding compute node reports
"error: chown(/var/spool/slurmd/d/jobxxxxx): Operation not permitted",
and the line just before it reads "Launching batch job xx for UID 1000"
(test), or UID 0 (root) when I run `sudo sbatch`.
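The error message itself can be reproduced outside Slurm. A minimal sketch, assuming (from the log line, not from the Slurm source) that slurmd is trying to give the job directory to the job's UID: on Linux, a process without root privileges (CAP_CHOWN) may not change a file's owner to a different user, so chown(2) fails with EPERM, printed as "Operation not permitted". The helper name `try_give_away` is hypothetical.

```shell
#!/bin/bash
# Sketch: try to hand a scratch directory to another UID, which is what the
# log line suggests slurmd attempts for the job directory (an assumption).
# try_give_away is a hypothetical helper name.
try_give_away() {
    local target_uid="$1"
    local dir err
    dir=$(mktemp -d) || return 1
    # Without CAP_CHOWN (i.e. not root), chown to a different UID fails
    # with EPERM ("Operation not permitted"); chown to your own UID works.
    if err=$(chown "$target_uid" "$dir" 2>&1); then
        echo "chown to uid $target_uid: ok"
    else
        echo "chown to uid $target_uid: failed: $err"
    fi
    rmdir "$dir"
}

# Run as the unprivileged slurm user, this should fail the same way:
#   sudo -u slurm bash -c "$(declare -f try_give_away); try_give_away 1000"
```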
The batch file looks like this:
#! /bin/bash
#SBATCH -J myjob
hostname
I suspect the problem is that `srun` and `salloc` tasks run as the
SlurmdUser (slurm; `srun whoami` returns slurm), which owns
/var/spool/slurmd, whereas sbatch tasks run as the user who submitted the
job (test).
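To check this suspicion on a compute node, a small sketch that prints which account slurmd is running as and who owns the spool directory. It assumes GNU coreutils `stat` and procps `ps` (as shipped with Ubuntu) and uses the /var/spool/slurmd path from the log above as its default; the helper names `path_owner`, `proc_user` and `check_spool` are hypothetical.

```shell
#!/bin/bash
# Sketch: compare the account slurmd runs as with the owner of its spool
# directory. Helper names are hypothetical; assumes GNU stat and procps ps.

# Print the owner (user name) of a path.
path_owner() {
    stat -c '%U' -- "$1"
}

# Print the distinct user names of processes with the given command name.
proc_user() {
    ps -o user= -C "$1" 2>/dev/null | sort -u
}

# Report both facts; the spool path defaults to /var/spool/slurmd.
check_spool() {
    local spool="${1:-/var/spool/slurmd}"
    local daemon_user
    daemon_user=$(proc_user slurmd)
    echo "slurmd runs as: ${daemon_user:-<not running>}"
    echo "$spool owned by: $(path_owner "$spool")"
}

# Usage on a compute node:
#   check_spool
```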
Should I chmod /var/spool/slurmd so that any user can write there, or is
this a configuration problem? I feel like I am missing something critical
here.
Thanks a lot.
Daniel