[slurm-users] Problem with permisions. CentOS 7.8

Tue Jun 2 09:58:35 UTC 2020

Hi!

I did a fresh installation with the EPEL repo enabled, installed munge from it, and it worked. Using the slurm user for munge was definitely a problem, but that is the setup we have on CentOS 6. I've learnt my lesson for future installations; thanks to everyone!

Now I have a follow-up question, if you don't mind. I am trying to start Slurm, and it fails:

[root@roos21 ~]# systemctl status slurm.service
● slurm.service - LSB: slurm daemon management
   Loaded: loaded (/etc/rc.d/init.d/slurm; bad; vendor preset: disabled)
   Active: failed (Result: protocol) since Tue 2020-06-02 11:45:33 CEST; 3min 33s ago
     Docs: man:systemd-sysv-generator(8)

Jun 02 11:45:33 roos21.organ.su.se systemd[1]: Starting LSB: slurm daemon management...
Jun 02 11:45:33 roos21.organ.su.se slurm[18223]: starting slurmd: [  OK  ]
Jun 02 11:45:33 roos21.organ.su.se systemd[1]: Can't open PID file /var/run/slurmctld.pid (yet?) after start: No such file or directory
Jun 02 11:45:33 roos21.organ.su.se systemd[1]: Failed to start LSB: slurm daemon management.
Jun 02 11:45:33 roos21.organ.su.se systemd[1]: Unit slurm.service entered failed state.
Jun 02 11:45:33 roos21.organ.su.se systemd[1]: slurm.service failed.

The thing is that this is a compute node, not the master node, so slurmctld is not installed. Why do I get this error?

Many thanks, and my apologies for such a simple question. I am new to this.

Best,

Ferran

________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Renata Maria Dart <renata at slac.stanford.edu>
Sent: Friday, May 29, 2020 6:33:58 PM
To: Ole.H.Nielsen at fysik.dtu.dk; Slurm User Community List
Subject: Re: [slurm-users] Problem with permisions. CentOS 7.8

Hi, I don't know if this might be your problem, but I ran into an issue
on CentOS 7.8 where /var/run/munge was not being created at boot time
because I didn't have the munge user in the local password file.  I
have the munge user in AD, and once the system is up I can start munge
successfully, but AD wasn't available early enough during boot for the
munge startup to see it.  I added these lines to the munge systemd unit
file:

PermissionsStartOnly=true
ExecStartPre=-/usr/bin/mkdir -m 0755 -p /var/run/munge
ExecStartPre=-/usr/bin/chown -R munge:munge /var/run/munge

and my system now starts munge up fine during a reboot.
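A minimal sketch, assuming you would rather not edit the packaged unit file: the same three directives can go into a systemd drop-in under /etc/systemd/system/munge.service.d/, which is not overwritten when the munge RPM is updated. The helper name write_munge_dropin and the file name runtime-dir.conf are hypothetical choices for illustration.

```shell
# write_munge_dropin [DIR]: hypothetical helper that writes the three
# directives above into a systemd drop-in file rather than the packaged unit.
# The file name runtime-dir.conf is an arbitrary choice.
write_munge_dropin() {
  local dir="${1:-/etc/systemd/system/munge.service.d}"
  mkdir -p "$dir" || return 1
  # Quoted 'EOF' so the directives are written literally, with no expansion
  cat > "$dir/runtime-dir.conf" <<'EOF'
[Service]
PermissionsStartOnly=true
ExecStartPre=-/usr/bin/mkdir -m 0755 -p /var/run/munge
ExecStartPre=-/usr/bin/chown -R munge:munge /var/run/munge
EOF
}
```

As root: `write_munge_dropin && systemctl daemon-reload && systemctl restart munge`. PermissionsStartOnly=true lets the ExecStartPre commands run with full privileges even when the unit starts munged under a non-root User=, and the leading `-` tells systemd to ignore a failure of that command.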

Renata

On Fri, 29 May 2020, Ole Holm Nielsen wrote:

> Hi Ferran,
>
> When you have a CentOS 7 system with the EPEL repo enabled, and you have
> installed the munge RPM from EPEL, then things should be working correctly.
>
> Since systemctl tells you that the Munge service didn't start correctly, it
> seems to me that you have a problem in the general configuration of your CentOS
> 7 system.  You should check /var/log/messages and "journalctl -xe" for munge
> errors.  It is really hard for other people to guess what may be wrong in your
> system.
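A rough sketch of collecting the logs Ole mentions in one pass. It uses journalctl -u munge.service instead of -xe to limit output to the munge unit, and it also reads munged's own log, assumed to be /var/log/munge/munged.log (the usual EPEL default, not stated in this thread). The function name and the MESSAGES_LOG / MUNGED_LOG overrides are hypothetical.

```shell
# show_munge_logs [N]: print the last N munge-related lines from the journal,
# /var/log/messages and munged's own log. Default paths are assumptions.
show_munge_logs() {
  local n="${1:-20}"
  local messages="${MESSAGES_LOG:-/var/log/messages}"
  local munged_log="${MUNGED_LOG:-/var/log/munge/munged.log}"

  # Restrict the journal to the munge unit
  if command -v journalctl >/dev/null 2>&1; then
    echo "== journalctl -u munge.service =="
    journalctl -u munge.service --no-pager -n "$n"
  fi

  # /var/log/messages holds every service, so keep only lines mentioning munge
  if [ -r "$messages" ]; then
    echo "== $messages (munge lines) =="
    grep -i munge "$messages" | tail -n "$n"
  fi

  # munged's own log: every line there is relevant
  if [ -r "$munged_log" ]; then
    echo "== $munged_log =="
    tail -n "$n" "$munged_log"
  fi
}
```

Run it as root, e.g. `show_munge_logs 50`, since both log files are normally readable only by root or the munge user.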
>
> My 2 cents worth: Maybe you could make a fresh CentOS 7.8 installation on a
> test system and install the Munge service (and nothing else) according to
> instructions in https://wiki.fysik.dtu.dk/niflheim/Slurm_installation.  This
> *really* has got to work!
>
> /Ole
>
>
> On 29-05-2020 10:23, Ferran Planas Padros wrote:
>> Hello everyone,
>>
>>
>> Here is everything I've done.
>>
>>
>> - About Ole's answer:
>>
>> Yes, we use the slurm user to run munge. Following your comment, I
>> changed the ownership of the munge files and tried to start munge as the
>> munge user. However, that also failed.
>>
>> Also, I first installed munge from another repository. After seeing your
>> suggestion, I uninstalled it and reinstalled it from EPEL. Same result.
>>
>> - About SELinux: it is disabled.
>>
>> - The output of ps -ef | grep munge is:
>>
>>
>> root      5340  5153  0 10:18 pts/0    00:00:00 grep --color=auto munge
>>
>>
>> - The output of munge -n is:
>>
>>
>> Failed to access "/var/run/munge/munge.socket.2": No such file or directory
>>
>>
>> - Same for unmunge
>>
>>
>> - Output for sudo systemctl status --full munge
>>
>>
>> ● munge.service - MUNGE authentication service
>>    Loaded: loaded (/usr/lib/systemd/system/munge.service; enabled; vendor preset: disabled)
>>    Active: failed (Result: exit-code) since Fri 2020-05-29 10:15:52 CEST; 4min 18s ago
>>      Docs: man:munged(8)
>>   Process: 5333 ExecStart=/usr/sbin/munged (code=exited, status=1/FAILURE)
>>
>> May 29 10:15:52 roos21.organ.su.se systemd[1]: Starting MUNGE authentication service...
>> May 29 10:15:52 roos21.organ.su.se systemd[1]: munge.service: control process exited, code=exited status=1
>> May 29 10:15:52 roos21.organ.su.se systemd[1]: Failed to start MUNGE authentication service.
>> May 29 10:15:52 roos21.organ.su.se systemd[1]: Unit munge.service entered failed state.
>> May 29 10:15:52 roos21.organ.su.se systemd[1]: munge.service failed.
>>
>>
>> - Regarding NTP, I get this message:
>>
>>
>> Unable to talk to NTP daemon. Is it running?
>>
>>
>> It is the same message I get on the nodes that DO work. All nodes have their
>> time and date in sync with the central node.
>>
>>
>> ------------------------------------------------------------------------
>> From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Ole
>> Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>
>> Sent: Friday, May 29, 2020 9:56:10 AM
>> To: slurm-users at lists.schedmd.com
>> Subject: Re: [slurm-users] Problem with permisions. CentOS 7.8
>> On 29-05-2020 08:46, Sudeep Narayan Banerjee wrote:
>>> Also check:
>>> a) whether NTP has been set up and is communicating with the master node
>>> b) whether the iptables rules need to be flushed (list them with iptables -L)
>>> c) set SELinux to disabled; to check it:
>>> getenforce
>>> vim /etc/sysconfig/selinux
>>> (change SELINUX=enforcing to SELINUX=disabled, save the file and reboot)
>>
>> There is no reason to disable SELinux for running the Munge service.
>> It's a pretty bad idea to lower the security just for the sake of
>> convenience!
>>
>> /Ole
>>
>>
>>> On Fri, May 29, 2020 at 12:08 PM Sudeep Narayan Banerjee
>>> <snbanerjee at iitgn.ac.in> wrote:
>>>
>>>      I have not checked this on CentOS 7.8.
>>>      a) If the /var/run/munge folder does not exist, please double-check
>>>      whether munge has been installed at all.
>>>      b) As root (or with sudo), run:
>>>      ps -ef | grep munge
>>>      kill -9 <PID>   # <PID> is the process ID of munged, if it is running at all
>>>
>>>      which munged
>>>      /etc/init.d/munge start
>>>
>>>      Please let me know the output of:
>>>
>>>      $ munge -n
>>>
>>>      $ munge -n | unmunge
>>>
>>>      $ sudo systemctl status --full munge
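The munge -n | unmunge check Sudeep asks for can be wrapped as a simple pass/fail test. This sketch assumes munge and unmunge are on PATH and that unmunge prints a line beginning with STATUS: followed by Success when the credential decodes correctly; munge_selftest is a hypothetical name.

```shell
# munge_selftest: encode a credential with munge, decode it with unmunge and
# succeed only if unmunge reports a Success status.
munge_selftest() {
  # munge's own errors (e.g. a missing munge.socket.2) still reach stderr
  if munge -n | unmunge 2>&1 | grep -q '^STATUS: *Success'; then
    echo "munge round trip OK"
    return 0
  fi
  echo "munge -n | unmunge did not report success; is munged running?" >&2
  return 1
}
```

With munged down, as in Ferran's output, munge fails to reach /var/run/munge/munge.socket.2 and the function returns non-zero, so it can also serve as a quick health check in node scripts.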
>>>
>>>      Thanks & Regards,
>>>      Sudeep Narayan Banerjee
>>>      System Analyst | Scientist B
>>>      Indian Institute of Technology Gandhinagar
>>>      Gujarat, INDIA
>>>
>>>
>>>      On Fri, May 29, 2020 at 11:55 AM Bjørn-Helge Mevik
>>>      <b.h.mevik at usit.uio.no> wrote:
>>>
>>>          Ferran Planas Padros <ferran.padros at su.se>
>>>          writes:
>>>
>>>           > I ran the command as the slurm user, and the /var/log/munge
>>>           > folder does belong to slurm.
>>>
>>>          For security reasons, I strongly advise that you run munged as a
>>>          separate user, which is unprivileged and not used for anything else.
>>>
>>>          -- 
>>>          Regards,
>>>          Bjørn-Helge Mevik, dr. scient,
>>>          Department for Research Computing, University of Oslo
>>>
>
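Following Bjørn-Helge's advice to run munged as its own unprivileged user, one way to confirm the ownership Ferran changed is to list every munge path that is missing or not owned by that user. The default path list reflects the usual EPEL munge layout and is an assumption; check_munge_owner is a hypothetical helper name.

```shell
# check_munge_owner [USER] [PATH...]: report munge paths that are missing or
# not owned by USER (default: munge). Default paths are assumed EPEL locations.
check_munge_owner() {
  local want="${1:-munge}"
  [ "$#" -gt 0 ] && shift
  [ "$#" -gt 0 ] || set -- /etc/munge /etc/munge/munge.key /var/lib/munge /var/log/munge /var/run/munge
  local rc=0 p owner
  for p in "$@"; do
    # stat -c '%U' (GNU coreutils) prints the owning user name
    if ! owner=$(stat -c '%U' "$p" 2>/dev/null); then
      echo "missing: $p"
      rc=1
    elif [ "$owner" != "$want" ]; then
      echo "owned by $owner, expected $want: $p"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_munge_owner munge` prints nothing and exits 0 when everything belongs to munge; a non-zero exit means at least one path is missing or still owned by another user such as slurm.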