[slurm-users] slurmrestd service broken by 22.05.07 update

Timo Rothenpieler timo.rothenpieler at uni-bremen.de
Thu Dec 29 16:31:56 UTC 2022


Ideally, the systemd service would already specify User=/Group=, and
then also specify RuntimeDirectory=slurmrestd.
systemd then pre-creates a slurmrestd directory in /run, owned by that
user, for the service to put its runtime files (like sockets) into,
avoiding any permission issues.

Having a service put its files directly into top-level dirs like /run
or /var/lib is bound to cause issues like this.
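A minimal sketch of that suggestion as a systemd drop-in, written by a small shell function. It assumes a dedicated unprivileged account named slurmrestd (a hypothetical name; use whatever account you created) and the /usr/sbin/slurmrestd path from the packaged unit. The function name is also hypothetical. The socket moves into /run/slurmrestd, which systemd creates with the right owner before the daemon starts.

```shell
#!/bin/sh
# Sketch: write a drop-in that runs slurmrestd as an unprivileged user and
# lets systemd create its socket directory. Assumptions (not from the
# thread): the account "slurmrestd" exists (hypothetical name) and the
# binary is /usr/sbin/slurmrestd.
write_runtime_dir_dropin() {
    dropin_dir=$1   # normally /etc/systemd/system/slurmrestd.service.d
    mkdir -p "$dropin_dir" || return 1
    # Quoted 'EOF': $SLURMRESTD_OPTIONS is written literally, so systemd
    # (not this shell) expands it from the service's environment.
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
User=slurmrestd
Group=slurmrestd
# systemd creates /run/slurmrestd, owned by User=/Group=, before starting
# the daemon and removes it again when the service stops.
RuntimeDirectory=slurmrestd
# An empty ExecStart= clears the packaged command line before replacing it.
ExecStart=
# Socket now lives under /run/slurmrestd; drop 0.0.0.0:6820 if TCP is unwanted.
ExecStart=/usr/sbin/slurmrestd $SLURMRESTD_OPTIONS unix:/run/slurmrestd/slurmrestd.socket 0.0.0.0:6820
EOF
}
```

Run it as root with /etc/systemd/system/slurmrestd.service.d as the argument, then run systemctl daemon-reload and restart slurmrestd. Because the override lives in its own file, the packaged unit stays untouched.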

On 29.12.2022 16:53, Chris Stackpole wrote:
> Thanks Brian!
> 
> I also discovered that I can edit the service file to remove the unix
> socket. That doesn't seem to impact the things I'm working with anyway.
> But it still seems strange to me that the design requires editing the
> service file. At the very least, this should be configurable, like the
> user information. But again, I've not found any official documentation
> on how the devs expect us to configure this.
> 
> Thanks!
> 
> On 12/29/22 09:46, Brian Andrus wrote:
>> I dug up my old stuff for getting it started and see that I just
>> disabled the unix socket completely. I was never able to get it to
>> work for the reasons you are seeing, so I ran it in TCP listening
>> mode only. There are comments in the service file about it, but to do
>> so, I changed the 'ExecStart' line in the systemd service file to be:
>>
>>     ExecStart=/usr/sbin/slurmrestd $SLURMRESTD_OPTIONS
>>
>> Then I created /etc/default/slurmrestd and added:
>>
>>     SLURM_JWT=daemon
>>     SLURMRESTD_LISTEN=0.0.0.0:8081
>>     SLURMRESTD_DEBUG=4
>>     SLURMRESTD_OPTIONS="-f /etc/slurm/slurm.conf"
>>
>> You can change those as needed. This made it listen on port 8081 only
>> (no unix socket and not port 6820).
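Instead of editing the packaged unit, the same ExecStart change can probably go into a systemd drop-in override. A sketch, assuming the /usr/sbin/slurmrestd path and the /etc/default/slurmrestd file described above; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: apply the TCP-only ExecStart change through a drop-in instead of
# editing the packaged slurmrestd.service. Hypothetical function name; the
# ExecStart line and /etc/default/slurmrestd path come from the message.
write_tcp_only_dropin() {
    dropin_dir=$1   # normally /etc/systemd/system/slurmrestd.service.d
    mkdir -p "$dropin_dir" || return 1
    # Quoted 'EOF': $SLURMRESTD_OPTIONS is left for systemd to expand. The
    # unbraced form is split by systemd into separate arguments (-f PATH).
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
# Optional ("-") and harmless if the packaged unit already reads this file.
EnvironmentFile=-/etc/default/slurmrestd
# An empty ExecStart= drops the packaged command line (socket plus port
# 6820); the listen address then comes from SLURMRESTD_LISTEN.
ExecStart=
ExecStart=/usr/sbin/slurmrestd $SLURMRESTD_OPTIONS
EOF
}
```

Run it as root with /etc/systemd/system/slurmrestd.service.d as the argument, then run systemctl daemon-reload and restart the service; systemctl edit slurmrestd creates the same kind of override interactively.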
>>
>> I was then able to just use curl on port 8081 to test things.
>>
>> Hope that helps.
>>
>> Brian Andrus
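For the curl test on port 8081, a sketch of what such a request could look like. It assumes JWT authentication is configured (AuthAltTypes=auth/jwt in slurm.conf, matching SLURM_JWT=daemon above), that a token was exported with export $(scontrol token), and that the v0.0.38 OpenAPI plugin shipped with 22.05 is loaded. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: ping slurmrestd over TCP with a JWT. Assumptions: auth/jwt is
# enabled, SLURM_JWT holds a token from "scontrol token", and the
# openapi/v0.0.38 plugin is loaded. ping_slurmrestd is a hypothetical name.
ping_slurmrestd() {
    base_url=${1:-http://localhost:8081}
    if [ -z "$SLURM_JWT" ]; then
        echo "SLURM_JWT is not set; try: export \$(scontrol token)" >&2
        return 1
    fi
    # -f: return non-zero on HTTP errors such as 401 Unauthorized.
    curl -sSf --max-time 10 \
        -H "X-SLURM-USER-NAME: $(id -un)" \
        -H "X-SLURM-USER-TOKEN: $SLURM_JWT" \
        "$base_url/slurm/v0.0.38/ping"
}
```

For example, export $(scontrol token) and then ping_slurmrestd http://localhost:8081; a JSON reply means both the listener and the token work.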
>>
>> On 12/29/2022 6:49 AM, Chris Stackpole wrote:
>>> Greetings,
>>>
>>> Thanks for responding!
>>>
>>> On 12/28/22 20:35, Brian Andrus wrote:
>>>> I suspect if you delete /var/lib/slurmrestd.socket and then start 
>>>> slurmrestd, it will create it as the user you need it to be.
>>>>
>>>> Or just change the owner of it to the slurmrestd owner.
>>>
>>>
>>> No go on that, because creating /var/lib/slurmrestd.socket requires
>>> root. That is what I meant by "has to write into a root-only
>>> directory to create the unix socket".
>>> Here's what happens for me.
>>> I spun up a virtual machine with an unmodified, fresh compile of
>>> 22.05.07.
>>>
>>> # rm -rf /var/lib/slurmrestd.socket
>>> # systemctl start slurmrestd
>>> # systemctl status slurmrestd
>>> <snip>
>>> Active: failed (Result: exit-code) since Thu 2022-12-29 08:39:45 CST; 54s ago
>>> <snip>
>>>
>>> # journalctl -xe
>>> <snip>
>>> Dec 29 08:39:45 testslurmvm.cluster slurmrestd[114317]: fatal: _create_socket: [unix:/var/lib/slurmrestd.socket] Unable to bind UNIX socket: Permission denied
>>> Dec 29 08:39:45 testslurmvm.cluster systemd[1]: slurmrestd.service: Main process exited, code=exited, status=1/FAILURE
>>>
>>> Now what about giving ownership to the user?
>>>
>>> # touch /var/lib/slurmrestd.socket
>>> # systemctl start slurmrestd
>>> # systemctl status slurmrestd
>>> <snip>
>>> Active: failed (Result: exit-code) since Thu 2022-12-29 08:45:37 CST; 1min 2s ago
>>> <snip>
>>> # journalctl -xe
>>> <snip>
>>> Dec 29 08:45:37 testslurmvm.cluster slurmrestd[114402]: error: Error unlink(/var/lib/slurmrestd.socket): Permission denied
>>> Dec 29 08:45:37 testslurmvm.cluster slurmrestd[114402]: fatal: _create_socket: [unix:/var/lib/slurmrestd.socket] Unable to bind UNIX socket: Address already in use
>>>
>>> Again, it doesn't have permission to modify that file or to create
>>> files inside that directory.
>>>
>>> On 12/28/22 20:35, Brian Andrus wrote:
>>> > I have been running slurmrestd as a separate user for some time.
>>>
>>> Under 22.05.07? Because that's what broke things for me. And I think 
>>> that it's this change:
>>>
>>> | -- slurmrestd - switch users earlier on startup to avoid sockets being
>>> | made as root.
>>>
>>> I'm not saying it's a bad change either, but I don't see any
>>> documentation on the proper way to handle it, and I don't feel like
>>> editing the service file is the right approach.
>>>
>>> Thanks!
>>>
> 
