[slurm-users] slurmrestd service broken by 22.05.07 update

Chris Stackpole cstackpole at advancedclustering.com
Thu Dec 29 14:49:58 UTC 2022


Greetings,

Thanks for responding!

On 12/28/22 20:35, Brian Andrus wrote:
> I suspect if you delete /var/lib/slurmrestd.socket and then start 
> slurmrestd, it will create it as the user you need it to be.
> 
> Or just change the owner of it to the slurmrestd owner.


No go on that. Creating /var/lib/slurmrestd.socket requires root, because
only root can write to /var/lib. That is what I meant by "has to write
into a root-only directory to create the unix socket".
Here, I'll show what happens for me.
I spun up a virtual machine with a fresh compile of 22.05.07 and nothing
else changed.

# rm -rf /var/lib/slurmrestd.socket
# systemctl start slurmrestd
# systemctl status slurmrestd
<snip>
Active: failed (Result: exit-code) since Thu 2022-12-29 08:39:45 CST; 54s ago
<snip>

# journalctl -xe
<snip>
Dec 29 08:39:45 testslurmvm.cluster slurmrestd[114317]: fatal: _create_socket: [unix:/var/lib/slurmrestd.socket] Unable to bind UNIX socket: Permission denied
Dec 29 08:39:45 testslurmvm.cluster systemd[1]: slurmrestd.service: Main process exited, code=exited, status=1/FAILURE

Now what about giving ownership to the user?

# touch /var/lib/slurmrestd.socket
# systemctl start slurmrestd
# systemctl status slurmrestd
<snip>
Active: failed (Result: exit-code) since Thu 2022-12-29 08:45:37 CST; 1min 2s ago
<snip>
# journalctl -xe
<snip>
Dec 29 08:45:37 testslurmvm.cluster slurmrestd[114402]: error: Error unlink(/var/lib/slurmrestd.socket): Permission denied
Dec 29 08:45:37 testslurmvm.cluster slurmrestd[114402]: fatal: _create_socket: [unix:/var/lib/slurmrestd.socket] Unable to bind UNIX socket: Address already in use

Again, it doesn't have permission to modify those files or to create
files inside that directory.
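
For anyone who wants to see the second failure in isolation, here is a
minimal sketch in Python. It is not slurmrestd's code. The helper names
try_bind and demo are made up for illustration, and it uses a temporary
directory instead of /var/lib so it needs no root. It only shows that
binding a UNIX socket to a path that already exists fails with "Address
already in use", which matches the log above once the unlink has been
refused.

```python
import errno
import os
import socket
import tempfile


def try_bind(path):
    """Bind a UNIX socket at path; return None on success, else the errno."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


def demo():
    """Return (errno with a stale file in place, errno after removing it)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "slurmrestd.socket")
        # Same effect as `touch`: a plain file now occupies the socket path.
        open(path, "w").close()
        stale = try_bind(path)
        # Removing the file first is what lets the bind go through.
        os.unlink(path)
        clean = try_bind(path)
        return stale, clean


if __name__ == "__main__":
    stale, clean = demo()
    print("with stale file:", errno.errorcode.get(stale, stale))
    print("after unlink:   ", "ok" if clean is None else errno.errorcode[clean])
```

Note that removing a file needs write permission on the directory that
contains it, not on the file itself, so changing the owner of a
pre-created socket file would not by itself let an unprivileged user
unlink it from /var/lib.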

On 12/28/22 20:35, Brian Andrus wrote:
> I have been running slurmrestd as a separate user for some time.

Under 22.05.07? Because that's what broke things for me, and I think
it's this change:

| -- slurmrestd - switch users earlier on startup to avoid sockets being
| made as root.

I'm not saying it's a bad change, but I don't see any documentation on
the proper way to handle it, and editing the service file doesn't feel
like the right fix.

Thanks!


