<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ks_c_5601-1987">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<p><br>
</p>
<div dir="ltr">
<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif, Helvetica, EmojiFont, 'Apple Color Emoji', 'Segoe UI Emoji', NotoColorEmoji, 'Segoe UI Symbol', 'Android Emoji', EmojiSymbols;">
<p>Hi!</p>
<p><br>
</p>
<p>I did a fresh installation, enabled the EPEL repo, installed munge from it, and it worked. Using the slurm user for munge was definitely a problem, but that is the setup we have on CentOS 6. I've learnt my lesson for future installations. Thanks
to everyone!</p>
<p><br>
</p>
<p>Now I have a follow-up question, if you don't mind. I am trying to start Slurm, and the service fails:</p>
<p><br>
</p>
<p></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;">
<span class="s1" style="font-variant-ligatures: no-common-ligatures;">[root@roos21 ~]# systemctl status slurm.service</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;">
<span class="s2" style="font-variant-ligatures: no-common-ligatures; color: rgb(195, 55, 32);"><b>●</b></span><span class="s1" style="font-variant-ligatures: no-common-ligatures;"> slurm.service - LSB: slurm daemon management</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;">
<span class="s1" style="font-variant-ligatures: no-common-ligatures;"><span class="Apple-converted-space">
</span>Loaded: loaded (/etc/rc.d/init.d/slurm; bad; vendor preset: disabled)</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;">
<span class="s1" style="font-variant-ligatures: no-common-ligatures;"><span class="Apple-converted-space">
</span>Active: </span><span class="s2" style="font-variant-ligatures: no-common-ligatures; color: rgb(195, 55, 32);"><b>failed</b></span><span class="s1" style="font-variant-ligatures: no-common-ligatures;"> (Result: protocol) since Tue 2020-06-02 11:45:33
CEST; 3min 33s ago</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;">
<span class="s1" style="font-variant-ligatures: no-common-ligatures;"><span class="Apple-converted-space">
</span>Docs: man:systemd-sysv-generator(8)</span></p>
<p class="p2" style="margin-right: 0px; margin-left: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;">
<span class="s1" style="font-variant-ligatures: no-common-ligatures;"></span><br>
</p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;">
<span class="s1" style="font-variant-ligatures: no-common-ligatures;">Jun 02 11:45:33 roos21.organ.su.se systemd[1]: Starting LSB: slurm daemon management...</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;">
<span class="s1" style="font-variant-ligatures: no-common-ligatures;">Jun 02 11:45:33 roos21.organ.su.se slurm[18223]: starting slurmd: [<span class="Apple-converted-space">
</span>OK<span class="Apple-converted-space"> </span>]</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;">
<span class="s1" style="font-variant-ligatures: no-common-ligatures;">Jun 02 11:45:33 roos21.organ.su.se systemd[1]: Can't open PID file /var/run/slurmctld.pid (yet?) after start: No such file or directory</span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;">
<span class="s1" style="font-variant-ligatures: no-common-ligatures;">Jun 02 11:45:33 roos21.organ.su.se systemd[1]:
</span><span class="s2" style="font-variant-ligatures: no-common-ligatures; color: rgb(195, 55, 32);"><b>Failed to start LSB: slurm daemon management.</b></span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;">
<span class="s1" style="font-variant-ligatures: no-common-ligatures;">Jun 02 11:45:33 roos21.organ.su.se systemd[1]:
<b>Unit slurm.service entered failed state.</b></span></p>
<p class="p1" style="margin-right: 0px; margin-left: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;">
<span class="s1" style="font-variant-ligatures: no-common-ligatures;">Jun 02 11:45:33 roos21.organ.su.se systemd[1]:
<b>slurm.service failed.</b></span></p>
<br>
<p></p>
<p><br>
</p>
<p>The thing is that this is a compute node, not the master node, so slurmctld is not installed. Why does systemd look for /var/run/slurmctld.pid here?</p>
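<p>In case it helps with the diagnosis, here is a small sketch of what I can check on the node. It assumes the unit is generated by systemd-sysv-generator from the header comments of /etc/rc.d/init.d/slurm (the path in the Loaded: line above), so it lists any "pidfile:" headers in that script and reports which Slurm daemons are on the PATH. The helper name list_pidfiles is just something I made up for this sketch.</p>
<pre>
```shell
#!/bin/sh
# Hypothetical helper: print the "pidfile:" header lines of a SysV init script.
# Assumption: the generated unit takes its PID file from these headers.
list_pidfiles() {
    grep -E '^#[[:space:]]*pidfile:' "$1"
}

# Path as shown in the "Loaded:" line of the status output; adjust if needed.
init_script=/etc/rc.d/init.d/slurm
if [ -r "$init_script" ]; then
    list_pidfiles "$init_script"
fi

# Report which Slurm daemons are actually installed on this node.
for daemon in slurmd slurmctld; do
    if command -v "$daemon" >/dev/null 2>/dev/null; then
        echo "$daemon: installed"
    else
        echo "$daemon: not found"
    fi
done
```
</pre>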
<p><br>
</p>
<p>Many thanks, and my apologies for these rather simple questions. I am new to this.</p>
<p><br>
</p>
<p>Best,</p>
<p>Ferran</p>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt; on behalf of Renata Maria Dart &lt;renata@slac.stanford.edu&gt;<br>
<b>Sent:</b> Friday, May 29, 2020 6:33:58 PM<br>
<b>To:</b> Ole.H.Nielsen@fysik.dtu.dk; Slurm User Community List<br>
<b>Subject:</b> Re: [slurm-users] Problem with permisions. CentOS 7.8</font>
<div> </div>
</div>
</div>
<font size="2"><span style="font-size:10pt">
<div class="PlainText">Hi, don't know if this might be your problem but I ran into an issue<br>
on centos 7.8 where /var/run/munge was not being created at boottime<br>
because I didn't have the munge user in the local password file. I<br>
have the munge user in AD and once the system is up I can start munge<br>
successfully, but AD wasn't available early enough during boot for the<br>
munge startup to see it. I added these lines to the munge systemd unit<br>
file:<br>
<br>
PermissionsStartOnly=true<br>
ExecStartPre=-/usr/bin/mkdir -m 0755 -p /var/run/munge<br>
ExecStartPre=-/usr/bin/chown -R munge:munge /var/run/munge<br>
<br>
and my system now starts munge up fine during a reboot.<br>
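<br>
If you would rather not edit the packaged unit file, the same lines could<br>
probably go into a drop-in instead. A rough sketch, assuming the usual<br>
drop-in directory /etc/systemd/system/munge.service.d and a file name of<br>
override.conf (both are assumptions here, as is the helper name<br>
write_munge_dropin):<br>
<pre>
```shell
#!/bin/sh
# Hypothetical helper: write the three lines above as a systemd drop-in
# rather than editing the packaged munge.service directly.
# $1 is the drop-in directory, e.g. /etc/systemd/system/munge.service.d
write_munge_dropin() {
    mkdir -p "$1" || return 1
    printf '%s\n' \
        '[Service]' \
        'PermissionsStartOnly=true' \
        'ExecStartPre=-/usr/bin/mkdir -m 0755 -p /var/run/munge' \
        'ExecStartPre=-/usr/bin/chown -R munge:munge /var/run/munge' \
        > "$1/override.conf"
}

# Usage (as root), followed by a reload so systemd picks up the change:
#   write_munge_dropin /etc/systemd/system/munge.service.d
#   systemctl daemon-reload
#   systemctl restart munge
```
</pre>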
<br>
Renata<br>
<br>
On Fri, 29 May 2020, Ole Holm Nielsen wrote:<br>
<br>
> Hi Ferran,<br>
><br>
> When you have a CentOS 7 system with the EPEL repo enabled, and you have<br>
> installed the munge RPM from EPEL, then things should be working correctly.<br>
><br>
> Since systemctl tells you that Munge service didn't start correctly, then it<br>
> seems to me that you have a problem in the general configuration of your CentOS<br>
> 7 system. You should check /var/log/messages and "journalctl -xe" for munge<br>
> errors. It is really hard for other people to guess what may be wrong in your<br>
> system.<br>
><br>
> My 2 cents worth: Maybe you could make a fresh CentOS 7.8 installation on a<br>
> test system and install the Munge service (and nothing else) according to<br>
> instructions in <a href="https://wiki.fysik.dtu.dk/niflheim/Slurm_installation" id="LPlnk352298" previewremoved="true">
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation</a>. This<br>
> *really* has got to work!<br>
><br>
> /Ole<br>
><br>
><br>
> On 29-05-2020 10:23, Ferran Planas Padros wrote:<br>
>> Hello everyone,<br>
>><br>
>><br>
>> Here is everything I've done.<br>
>><br>
>><br>
>> - About Ole's answer:<br>
>><br>
>> Yes, we have slurm as the user to control munge. Following your comment, I<br>
>> have changed the ownership of the munge files and tried to start munge as<br>
>> munge user. However, it also failed.<br>
>><br>
>> Also, I first installed munge from a repository. I've seen your suggestion of<br>
>> installing from EPEL. So I uninstalled and installed again. Same result<br>
>><br>
>> - About SELinux: It is disabled.<br>
>><br>
>> - The output of ps -ef | grep munge is:<br>
>><br>
>><br>
>> root 5340 5153 0 10:18 pts/0 00:00:00 grep --color=auto *munge*<br>
>><br>
>><br>
>> - The output of munge -n is:<br>
>><br>
>><br>
>> Failed to access "/var/run/munge/munge.socket.2": No such file or directory<br>
>><br>
>><br>
>> - Same for unmunge<br>
>><br>
>><br>
>> - Output for sudo systemctl status --full munge<br>
>><br>
>><br>
>> *●* munge.service - MUNGE authentication service<br>
>><br>
>> Loaded: loaded (/usr/lib/systemd/system/munge.service; enabled; vendor preset:<br>
>> disabled)<br>
>><br>
>> Active: *failed* (Result: exit-code) since Fri 2020-05-29 10:15:52 CEST; 4min<br>
>> 18s ago<br>
>><br>
>> Docs: man:munged(8)<br>
>><br>
>> Process: 5333 ExecStart=/usr/sbin/munged *(code=exited, status=1/FAILURE)*<br>
>><br>
>><br>
>> May 29 10:15:52 roos21.organ.su.se systemd[1]: Starting MUNGE authentication<br>
>> service...<br>
>><br>
>> May 29 10:15:52 roos21.organ.su.se systemd[1]: *munge.service: control process<br>
>> exited, code=exited status=1*<br>
>><br>
>> May 29 10:15:52 roos21.organ.su.se systemd[1]: *Failed to start MUNGE<br>
>> authentication service.*<br>
>><br>
>> May 29 10:15:52 roos21.organ.su.se systemd[1]: *Unit munge.service entered<br>
>> failed state.*<br>
>><br>
>> May 29 10:15:52 roos21.organ.su.se systemd[1]: *munge.service failed.*<br>
>><br>
>><br>
>> - Regarding NTP, I get this message:<br>
>><br>
>><br>
>> Unable to talk to NTP daemon. Is it running?<br>
>><br>
>><br>
>> It is the same message I get on the nodes that DO work. All nodes are in sync<br>
>> in time and date with the central node.<br>
>><br>
>><br>
>> ------------------------------------------------------------------------<br>
>> *From:* slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt; on behalf of Ole<br>
>> Holm Nielsen &lt;Ole.H.Nielsen@fysik.dtu.dk&gt;<br>
>> *Sent:* Friday, May 29, 2020 9:56:10 AM<br>
>> *To:* slurm-users@lists.schedmd.com<br>
>> *Subject:* Re: [slurm-users] Problem with permisions. CentOS 7.8<br>
>> On 29-05-2020 08:46, Sudeep Narayan Banerjee wrote:<br>
>>> also check:<br>
>>> a) whether NTP has been setup and communicating with master node<br>
>>> b) iptables may be flushed (iptables -L)<br>
>>> c) SELinux set to disabled; to check:<br>
>>> getenforce<br>
>>> vim /etc/sysconfig/selinux<br>
>>> (change SELINUX=enforcing to SELINUX=disabled and save the file and reboot)<br>
>><br>
>> There is no reason to disable SELinux for running the Munge service.<br>
>> It's a pretty bad idea to lower the security just for the sake of<br>
>> convenience!<br>
>><br>
>> /Ole<br>
>><br>
>><br>
>>> On Fri, May 29, 2020 at 12:08 PM Sudeep Narayan Banerjee<br>
>>> &lt;snbanerjee@iitgn.ac.in &lt;<a href="mailto:snbanerjee@iitgn.ac.in">mailto:snbanerjee@iitgn.ac.in</a>&gt;&gt; wrote:<br>
>>><br>
>>> I have not checked on the CentOS7.8<br>
>>> a) if /var/run/munge folder does not exist then please double check<br>
>>> whether munge has been installed or not<br>
>>> b) as root or a sudo user, run:<br>
>>> ps -ef | grep munge<br>
>>> kill -9 &lt;PID&gt; //where PID is the Process ID for munge (if the<br>
>>> process is running at all); else<br>
>>><br>
>>> which munged<br>
>>> /etc/init.d/munge start<br>
>>><br>
>>> please let me know the output of:<br>
>>><br>
>>> |$ munge -n|<br>
>>><br>
>>> |$ munge -n | unmunge|<br>
>>><br>
>>> |$ sudo systemctl status --full munge<br>
>>><br>
>>> |<br>
>>><br>
>>> Thanks & Regards,<br>
>>> Sudeep Narayan Banerjee<br>
>>> System Analyst | Scientist B<br>
>>> Indian Institute of Technology Gandhinagar<br>
>>> Gujarat, INDIA<br>
>>><br>
>>><br>
>>> On Fri, May 29, 2020 at 11:55 AM Bjørn-Helge Mevik<br>
>>> &lt;b.h.mevik@usit.uio.no &lt;<a href="mailto:b.h.mevik@usit.uio.no">mailto:b.h.mevik@usit.uio.no</a>&gt;&gt; wrote:<br>
>>><br>
>>> Ferran Planas Padros &lt;ferran.padros@su.se<br>
>>> &lt;<a href="mailto:ferran.padros@su.se">mailto:ferran.padros@su.se</a>&gt;&gt; writes:<br>
>>><br>
>>> > I run the command as slurm user, and the /var/log/munge<br>
>>> folder does belong to slurm.<br>
>>><br>
>>> For security reasons, I strongly advise that you run munged as a<br>
>>> separate user, which is unprivileged and not used for anything else.<br>
>>><br>
>>> -- Regards,<br>
>>> Bjørn-Helge Mevik, dr. scient,<br>
>>> Department for Research Computing, University of Oslo<br>
>>><br>
><br>
</div>
</span></font></div>
</body>
</html>