[slurm-users] slurmrestd service broken by 22.05.07 update

Chris Stackpole cstackpole at advancedclustering.com
Thu Dec 29 15:53:17 UTC 2022


Thanks Brian!

I also discovered that I can edit the service file to remove the unix 
socket, which doesn't seem to impact anything I'm working with. Still, it 
seems strange to me that editing the service file is required. At the 
very least, this should be a configurable item, like the user 
information. But again, I've not found any official documentation on how 
the devs expect us to configure this.
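
One way to avoid touching the packaged unit file is a systemd drop-in, which overrides only the ExecStart line. The following is a minimal sketch, not an official SchedMD recommendation. It assumes the unit is named slurmrestd.service, that the binary lives at /usr/sbin/slurmrestd, and that /etc/default/slurmrestd holds the environment settings quoted below in this thread. The scratch directory and the file name override.conf are illustrative choices. The script only stages the file for review and does not write to /etc.

```shell
#!/bin/sh
# Stage a systemd drop-in for slurmrestd in a scratch directory so it can be
# reviewed before it is installed. Nothing here touches /etc.
set -eu

stage_dir=$(mktemp -d)
dropin="$stage_dir/override.conf"

cat > "$dropin" <<'EOF'
[Service]
# Optional environment file; the leading "-" tells systemd to ignore it if missing.
EnvironmentFile=-/etc/default/slurmrestd
# An empty ExecStart= clears the packaged command before the new one is set.
ExecStart=
ExecStart=/usr/sbin/slurmrestd $SLURMRESTD_OPTIONS
EOF

echo "Staged drop-in at $dropin:"
cat "$dropin"

# To install (as root), assuming the unit is named slurmrestd.service:
#   install -D -m 0644 "$dropin" /etc/systemd/system/slurmrestd.service.d/override.conf
#   systemctl daemon-reload && systemctl restart slurmrestd
```

Because the drop-in lives under /etc/systemd/system, a package update that replaces the shipped unit file should leave the override in place.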

Thanks!

On 12/29/22 09:46, Brian Andrus wrote:
> I dug up my old stuff for getting it started and see that I just 
> disabled the unix socket completely. I was never able to get it to work 
> for the reasons you are seeing, so I enabled it in listening mode. There 
> are comments in the service file about it, but to do so, I changed the 
> 'ExecStart' line in the systemd service file to be:
> 
> ExecStart=/usr/sbin/slurmrestd $SLURMRESTD_OPTIONS
> 
> Then I created /etc/default/slurmrestd and added:
> 
>     SLURM_JWT=daemon
>     SLURMRESTD_LISTEN=0.0.0.0:8081
>     SLURMRESTD_DEBUG=4
>     SLURMRESTD_OPTIONS="-f /etc/slurm/slurm.conf"
> 
> You can change those as needed. This made it listen on port 8081 only 
> (no socket and not 6820)
> 
> I was then able to just use curl on port 8081 to test things.
> 
> Hope that helps.
> 
> Brian Andrus
> 
> On 12/29/2022 6:49 AM, Chris Stackpole wrote:
>> Greetings,
>>
>> Thanks for responding!
>>
>> On 12/28/22 20:35, Brian Andrus wrote:
>>> I suspect if you delete /var/lib/slurmrestd.socket and then start 
>>> slurmrestd, it will create it as the user you need it to be.
>>>
>>> Or just change the owner of it to the slurmrestd owner.
>>
>>
>> No go on that, because creating /var/lib/slurmrestd.socket requires 
>> root access to /var/lib. That is what I meant by "has to write into a 
>> root-only directory to create the unix socket".
>> Here, I'll show what happens for me.
>> I spun up a virtual machine with nothing changed on a fresh compile of 
>> 22.05.07.
>>
>> # rm -rf /var/lib/slurmrestd.socket
>> # systemctl start slurmrestd
>> # systemctl status slurmrestd
>> <snip>
>> Active: failed (Result: exit-code) since Thu 2022-12-29 08:39:45 CST; 
>> 54s ago
>> <snip>
>>
>> # journalctl -xe
>> <snip>
>> Dec 29 08:39:45 testslurmvm.cluster slurmrestd[114317]: fatal: 
>> _create_socket: [unix:/var/lib/slurmrestd.socket] Unable to bind UNIX 
>> socket: Permission denied
>> Dec 29 08:39:45 testslurmvm.cluster systemd[1]: slurmrestd.service: 
>> Main process exited, code=exited, status=1/FAILURE
>>
>> Now what about giving ownership to the user?
>>
>> # touch /var/lib/slurmrestd.socket
>> # systemctl start slurmrestd
>> # systemctl status slurmrestd
>> <snip>
>> Active: failed (Result: exit-code) since Thu 2022-12-29 08:45:37 CST; 
>> 1min 2s ago
>> <snip>
>> # journalctl -xe
>> <snip>
>> Dec 29 08:45:37 testslurmvm.cluster slurmrestd[114402]: error: Error 
>> unlink(/var/lib/slurmrestd.socket): Permission denied
>> Dec 29 08:45:37 testslurmvm.cluster slurmrestd[114402]: fatal: 
>> _create_socket: [unix:/var/lib/slurmrestd.socket] Unable to bind UNIX 
>> socket: Address already in use
>>
>> Again, it doesn't have permissions to modify those files nor create 
>> files inside that directory.
>>
>> On 12/28/22 20:35, Brian Andrus wrote:
>> > I have been running slurmrestd as a separate user for some time.
>>
>> Under 22.05.07? Because that's what broke things for me. And I think 
>> that it's this change:
>>
>> | -- slurmrestd - switch users earlier on startup to avoid sockets being
>> | made as root.
>>
>> I'm not saying it's a bad change either - but I don't see any 
>> documentation on the proper way to handle it and I don't feel like 
>> editing the service file is the proper way to handle it.
>>
>> Thanks!
>>


