[slurm-users] Problem with sbatch

Michael Gutteridge michael.gutteridge at gmail.com
Mon Jul 8 17:49:51 UTC 2019


Hi

I can't find the reference here, but if I recall correctly the preferred
user for slurmd is actually root.  It is the default.

> I assume this can be fixed by modifying the configuration so
> "SlurmdUser=root", but does this imply that anything run with `srun` will
> be actually executed by root? This seems dangerous.

As far as safety goes, I think you're OK.  The sbatch/srun/salloc processes
set the user ID appropriately for the phase of the job being run. Source:
I've been running it this way for a while.

My apologies if running as a non-root user is a requirement for your
environment.
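
For illustration, here is a minimal sketch of why the chown in the quoted
log fails when slurmd runs as a non-root user. It assumes a Linux system
with the standard coreutils; `try_chown` and the scratch file are
hypothetical stand-ins, not part of Slurm, and the scratch file merely plays
the role of the job spool directory.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): try to hand a file to another
# owner and report whether the kernel allowed it.
try_chown() {
    local target=$1 owner=$2
    if chown "$owner" "$target" 2>/dev/null; then
        echo "permitted"
    else
        echo "not permitted"
    fi
}

scratch=$(mktemp)                 # stand-in for a job spool directory
try_chown "$scratch" "$(id -u)"   # own uid: allowed for the file's owner
try_chown "$scratch" 0            # a different uid: normally requires root
rm -f "$scratch"
```

Run as an ordinary user, the second call should be refused, which is the
same "Operation not permitted" that slurmd hits when it runs as "slurm". On
a cluster with SlurmdUser=root, `srun id -un` is a quick way to confirm that
job steps still run as the submitting user rather than as root.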

Michael


On Mon, Jul 8, 2019 at 9:39 AM Daniel Torregrosa <
daniel.torregrosa at insight-centre.org> wrote:

> You are right. The critical part I was missing is that chown does not work
> without sudo.
>
> I assume this can be fixed by modifying the configuration so
> "SlurmdUser=root", but does this imply that anything run with `srun` will
> be actually executed by root? This seems dangerous.
>
> Thanks a lot.
>
> On Mon, 8 Jul 2019 at 17:28, Jeffrey Frey <frey at udel.edu> wrote:
>
>> Does user "slurm" have the capability of reowning files/directories to an
>> arbitrary uid/gid?  Probably not -- that's something "root" can do, though.
>>
>> > On Jul 8, 2019, at 12:01 PM, Daniel Torregrosa <
>> > daniel.torregrosa at insight-centre.org> wrote:
>> >
>> > Hi all,
>> >
>> > I am currently testing slurm (slurm-wlm 17.11.2 from a newly installed
>> > and updated Ubuntu server LTS). I managed to make it work on a very simple
>> > 1 master node and 2 compute nodes configuration. All three nodes have the
>> > same users (namely root, slurm and test), with slurm running both slurmctld
>> > and slurmd on the corresponding node (i.e. SlurmUser=slurm and
>> > SlurmdUser=slurm), and test as the only loggable user.
>> >
>> > Commands such as `salloc` and `srun` work perfectly, but `sbatch`
>> > fails. In `squeue`, I get "(launch failed requeued help)". When I check the
>> > corresponding compute node log, I get "error:
>> > chown(/var/spool/slurmd/d/jobxxxxx): Operation not permitted". The previous
>> > line has "Launching batch job xx for UID 1000" (test) or 0 (root) if
>> > running `sudo sbatch`.
>> >
>> > Batch file looks like
>> >
>> > #! /bin/bash
>> > #SBATCH -J myjob
>> >
>> > hostname
>> >
>> > I suspect that the problem is that `srun` and `salloc` are being run by
>> > SlurmdUser (slurm, i.e. `srun whoami` returns slurm), who owns
>> > /var/spool/slurmd, but sbatch tasks are being run by the user issuing the
>> > command (test).
>> >
>> > Should I chmod /var/spool/slurmd so any user can write there, or do I
>> > have a configuration problem? I feel like I am missing something critical
>> > here.
>> >
>> > Thanks a lot.
>> > Daniel
>>
>>
>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::
>> Jeffrey T. Frey, Ph.D.
>> Systems Programmer V / HPC Management
>> Network & Systems Services / College of Engineering
>> University of Delaware, Newark DE  19716
>> Office: (302) 831-6034  Mobile: (302) 419-4976
>> ::::::::::::::::::::::::::::::::::::::::::::::::::::::