[slurm-users] slurmrestd service broken by 22.05.07 update
Brian Andrus
toomuchit at gmail.com
Thu Dec 29 15:46:55 UTC 2022
I dug up my old stuff for getting it started and see that I just
disabled the unix socket completely. I was never able to get the socket
to work, for the reasons you are seeing, so I ran slurmrestd in listening
mode on a TCP port instead. There are comments in the service file about
it. To do so, I changed the 'ExecStart' line in the systemd service file to:
    ExecStart=/usr/sbin/slurmrestd $SLURMRESTD_OPTIONS
Then I created /etc/default/slurmrestd and added:
    SLURM_JWT=daemon
    SLURMRESTD_LISTEN=0.0.0.0:8081
    SLURMRESTD_DEBUG=4
    SLURMRESTD_OPTIONS="-f /etc/slurm/slurm.conf"
You can change those as needed. This made it listen on port 8081 only
(no unix socket, and not port 6820).
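
As a sketch only (this is not what was done above): the same ExecStart
change could be kept out of the packaged unit file by using a systemd
drop-in. The function name write_slurmrestd_dropin and the file name
override.conf are hypothetical, and this assumes the packaged unit already
reads /etc/default/slurmrestd as its environment file, as the setup above
implies.

```shell
# Hypothetical helper: write a drop-in that replaces the packaged ExecStart.
# The target directory is a parameter so the result can be inspected in a
# scratch directory before touching /etc.
write_slurmrestd_dropin() {
    dir="${1:-/etc/systemd/system/slurmrestd.service.d}"
    mkdir -p "$dir" || return 1
    # Quoted heredoc delimiter: $SLURMRESTD_OPTIONS is written literally
    # and expanded later by systemd, not by this shell.
    cat > "$dir/override.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the packaged command before a new one is set.
ExecStart=
ExecStart=/usr/sbin/slurmrestd $SLURMRESTD_OPTIONS
EOF
}

# After writing to the real location (as root), reload and restart:
#   write_slurmrestd_dropin
#   systemctl daemon-reload && systemctl restart slurmrestd
```

With a drop-in like this the packaged service file stays untouched, so a
package update would not silently revert the change.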
I was then able to just use curl on port 8081 to test things.
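
A test along those lines might look like the sketch below. It assumes
auth/jwt is configured so that `scontrol token` works for the calling
user, and that the REST plugin version for this release is v0.0.38; adjust
both to your site. The helper names and the REST_* variables are
hypothetical.

```shell
# Assumed defaults; override them in the environment as needed.
REST_HOST="${REST_HOST:-localhost}"
REST_PORT="${REST_PORT:-8081}"
REST_API="${REST_API:-v0.0.38}"   # assumed plugin version, check your build

# Print the URL of an endpoint under /slurm/<version>/.
slurmrestd_url() {
    printf 'http://%s:%s/slurm/%s/%s\n' \
        "$REST_HOST" "$REST_PORT" "$REST_API" "$1"
}

# Query the ping endpoint using JWT headers.
slurmrestd_ping() {
    # "scontrol token" prints "SLURM_JWT=<token>"; strip the prefix.
    token=$(scontrol token | sed 's/^SLURM_JWT=//')
    curl -s \
        -H "X-SLURM-USER-NAME: $(id -un)" \
        -H "X-SLURM-USER-TOKEN: $token" \
        "$(slurmrestd_url ping)"
}

# Usage, on a host that can reach slurmrestd:
#   slurmrestd_ping
```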
Hope that helps.
Brian Andrus
On 12/29/2022 6:49 AM, Chris Stackpole wrote:
> Greetings,
>
> Thanks for responding!
>
> On 12/28/22 20:35, Brian Andrus wrote:
>> I suspect if you delete /var/lib/slurmrestd.socket and then start
>> slurmrestd, it will create it as the user you need it to be.
>>
>> Or just change the owner of it to the slurmrestd owner.
>
>
> No go on that, because /var/lib requires root to create
> /var/lib/slurmrestd.socket. That is what I meant by "has to write
> into a root-only directory to create the unix socket".
> Here, I'll show what happens with me.
> Spun up a virtual machine with nothing changed on a fresh compile of
> 22.05.07.
>
> # rm -rf /var/lib/slurmrestd.socket
> # systemctl start slurmrestd
> # systemctl status slurmrestd
> <snip>
> Active: failed (Result: exit-code) since Thu 2022-12-29 08:39:45 CST;
> 54s ago
> <snip>
>
> # journalctl -xe
> <snip>
> Dec 29 08:39:45 testslurmvm.cluster slurmrestd[114317]: fatal:
> _create_socket: [unix:/var/lib/slurmrestd.socket] Unable to bind UNIX
> socket: Permission denied
> Dec 29 08:39:45 testslurmvm.cluster systemd[1]: slurmrestd.service:
> Main process exited, code=exited, status=1/FAILURE
>
> Now what about giving ownership to the user?
>
> # touch /var/lib/slurmrestd.socket
> # systemctl start slurmrestd
> # systemctl status slurmrestd
> <snip>
> Active: failed (Result: exit-code) since Thu 2022-12-29 08:45:37 CST;
> 1min 2s ago
> <snip>
> # journalctl -xe
> <snip>
> Dec 29 08:45:37 testslurmvm.cluster slurmrestd[114402]: error: Error
> unlink(/var/lib/slurmrestd.socket): Permission denied
> Dec 29 08:45:37 testslurmvm.cluster slurmrestd[114402]: fatal:
> _create_socket: [unix:/var/lib/slurmrestd.socket] Unable to bind UNIX
> socket: Address already in use
>
> Again, it doesn't have permissions to modify those files nor create
> files inside that directory.
>
> On 12/28/22 20:35, Brian Andrus wrote:
> > I have been running slurmrestd as a separate user for some time.
>
> Under 22.05.07? Because that's what broke things for me. And I think
> that it's this change:
>
> | -- slurmrestd - switch users earlier on startup to avoid sockets being
> | made as root.
>
> I'm not saying it's a bad change either - but I don't see any
> documentation on the proper way to handle it and I don't feel like
> editing the service file is the proper way to handle it.
>
> Thanks!
>