[slurm-users] slurmrestd service broken by 22.05.07 update

Brian Andrus toomuchit at gmail.com
Thu Dec 29 15:46:55 UTC 2022


I dug up my old notes for getting it started and see that I just 
disabled the unix socket completely. I was never able to get the socket 
to work, for the reasons you are seeing, so I had slurmrestd listen on a 
TCP port instead. There are comments in the service file about this. To 
do it, I changed the 'ExecStart' line in the systemd service file to be:

    ExecStart=/usr/sbin/slurmrestd $SLURMRESTD_OPTIONS
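
If you would rather not touch the packaged unit file, the same change 
can probably be carried in a systemd drop-in. This is only a sketch: the 
helper name slurmrestd_dropin and the file name override.conf are made 
up for illustration, and I am assuming slurmrestd lives in /usr/sbin as 
above.

```shell
#!/bin/sh
# Print a systemd drop-in that replaces ExecStart for slurmrestd.
# The empty "ExecStart=" line clears the value inherited from the
# packaged unit before the new one is set.
slurmrestd_dropin() {
    cat <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/sbin/slurmrestd $SLURMRESTD_OPTIONS
EOF
}

# As root, something like (paths are the usual systemd drop-in location):
#   mkdir -p /etc/systemd/system/slurmrestd.service.d
#   slurmrestd_dropin > /etc/systemd/system/slurmrestd.service.d/override.conf
#   systemctl daemon-reload && systemctl restart slurmrestd
```

'systemctl edit slurmrestd' should end up creating the same kind of 
file interactively.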

Then I created /etc/default/slurmrestd and added:

    SLURM_JWT=daemon
    SLURMRESTD_LISTEN=0.0.0.0:8081
    SLURMRESTD_DEBUG=4
    SLURMRESTD_OPTIONS="-f /etc/slurm/slurm.conf"

You can change those as needed. This made it listen on port 8081 only 
(no unix socket, and not the default port 6820).

I was then able to just use curl on port 8081 to test things.
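
For reference, a curl test over TCP might look roughly like the sketch 
below. It assumes JWT auth is set up so that 'scontrol token' works for 
your user, and that the v0.0.38 API plugin from 22.05 is loaded. The 
variable names REST_HOST, REST_PORT and REST_API and both helper 
functions are hypothetical, so adjust them to your site.

```shell
#!/bin/sh
# Assumed values: host, port and API version will differ per site.
REST_HOST="${REST_HOST:-localhost}"
REST_PORT="${REST_PORT:-8081}"
REST_API="${REST_API:-v0.0.38}"

# Build the URL for an endpoint under /slurm/<version>/.
slurmrest_url() {
    printf 'http://%s:%s/slurm/%s/%s\n' \
        "$REST_HOST" "$REST_PORT" "$REST_API" "$1"
}

# Query an endpoint with a JWT for the current user.
# 'scontrol token' prints a line of the form SLURM_JWT=<token>.
slurmrest_get() {
    token=$(scontrol token | sed -n 's/^SLURM_JWT=//p')
    curl -s \
        -H "X-SLURM-USER-NAME: $(id -un)" \
        -H "X-SLURM-USER-TOKEN: ${token}" \
        "$(slurmrest_url "$1")"
}

# Example (needs a running slurmrestd and slurmctld):
#   slurmrest_get ping
```

The ping endpoint is a cheap first check, since it only reports whether 
slurmrestd can reach the controller.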

Hope that helps.

Brian Andrus

On 12/29/2022 6:49 AM, Chris Stackpole wrote:
> Greetings,
>
> Thanks for responding!
>
> On 12/28/22 20:35, Brian Andrus wrote:
>> I suspect if you delete /var/lib/slurmrestd.socket and then start 
>> slurmrestd, it will create it as the user you need it to be.
>>
>> Or just change the owner of it to the slurmrestd owner.
>
>
> No go on that, because /var/lib requires root to create 
> /var/lib/slurmrestd.socket. That is what I meant by "has to write 
> into a root-only directory to create the unix socket".
> Here, I'll show what happens with me.
> Spun up a virtual machine with nothing changed on a fresh compile of 
> 22.05.07.
>
> # rm -rf /var/lib/slurmrestd.socket
> # systemctl start slurmrestd
> # systemctl status slurmrestd
> <snip>
> Active: failed (Result: exit-code) since Thu 2022-12-29 08:39:45 CST; 
> 54s ago
> <snip>
>
> # journalctl -xe
> <snip>
> Dec 29 08:39:45 testslurmvm.cluster slurmrestd[114317]: fatal: 
> _create_socket: [unix:/var/lib/slurmrestd.socket] Unable to bind UNIX 
> socket: Permission denied
> Dec 29 08:39:45 testslurmvm.cluster systemd[1]: slurmrestd.service: 
> Main process exited, code=exited, status=1/FAILURE
>
> Now what about giving ownership to the user?
>
> # touch /var/lib/slurmrestd.socket
> # systemctl start slurmrestd
> # systemctl status slurmrestd
> <snip>
> Active: failed (Result: exit-code) since Thu 2022-12-29 08:45:37 CST; 
> 1min 2s ago
> <snip>
> # journalctl -xe
> <snip>
> Dec 29 08:45:37 testslurmvm.cluster slurmrestd[114402]: error: Error 
> unlink(/var/lib/slurmrestd.socket): Permission denied
> Dec 29 08:45:37 testslurmvm.cluster slurmrestd[114402]: fatal: 
> _create_socket: [unix:/var/lib/slurmrestd.socket] Unable to bind UNIX 
> socket: Address already in use
>
> Again, it doesn't have permissions to modify those files nor create 
> files inside that directory.
>
> On 12/28/22 20:35, Brian Andrus wrote:
> > I have been running slurmrestd as a separate user for some time.
>
> Under 22.05.07? Because that's what broke things for me. And I think 
> that it's this change:
>
> | -- slurmrestd - switch users earlier on startup to avoid sockets being
> | made as root.
>
> I'm not saying it's a bad change either - but I don't see any 
> documentation on the proper way to handle it and I don't feel like 
> editing the service file is the proper way to handle it.
>
> Thanks!
>