<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
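<p class="MsoNormal">What does your ‘slurmctld.service’ look like? The failure to resolve ‘localhost’ at boot, together with the fact that a manual restart works, suggests slurmctld is starting before name resolution is ready, so you might want to add dependencies to its ‘After=’ line.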
<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">For example, we use ‘After=network.target munge.service’ (<a href="https://github.com/NVIDIA/nephele-packages/blob/30bc321c311398cc7a86485bc88930e4b6790fb4/slurm/debian/PACKAGE-control.slurmctld.service#L3">see here</a>).
<o:p></o:p></p>
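<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">As a rough sketch, untested on Fedora 33: your status output already shows a drop-in (‘override.conf’), so you could add the ordering there, for example with ‘systemctl edit slurmctld.service’. The variant below orders slurmctld after ‘network-online.target’ rather than ‘network.target’, because ‘network.target’ does not guarantee that the network is actually configured. Whether this is what your ‘localhost’ lookup is waiting on is an assumption worth checking.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="font-size:7.5pt;font-family:&quot;Courier New&quot;"># /etc/systemd/system/slurmctld.service.d/override.conf (add to any existing settings)<br>
[Unit]<br>
# Pull in network-online.target and wait for it (sketch; adjust to your setup)<br>
Wants=network-online.target<br>
After=network-online.target munge.service</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">On Fedora, ‘network-online.target’ is normally reached through ‘NetworkManager-wait-online.service’, so that service needs to be enabled for this to have any effect. If you edit the file by hand rather than with ‘systemctl edit’, run ‘systemctl daemon-reload’ afterwards; ‘systemctl cat slurmctld.service’ will then show the unit together with its drop-ins.<o:p></o:p></p>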
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt;
<b>On Behalf Of </b>Alpha Experiment<br>
<b>Sent:</b> Monday, December 14, 2020 4:20 PM<br>
<b>To:</b> slurm-users@lists.schedmd.com<br>
<b>Subject:</b> [slurm-users] slurmctld daemon error<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">Hi, <o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I am trying to run Slurm on Fedora 33. On boot, the slurmd daemon starts correctly; however, the slurmctld daemon always fails.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:7.5pt;font-family:&quot;Courier New&quot;">[admin@localhost ~]$ systemctl status slurmd.service
<br>
● slurmd.service - Slurm node daemon<br>
Loaded: loaded (/etc/systemd/system/slurmd.service; enabled; vendor preset: disabled)<br>
Active: active (running) since Mon 2020-12-14 16:02:18 PST; 11min ago<br>
Main PID: 2363 (slurmd)<br>
Tasks: 2<br>
Memory: 3.4M<br>
CPU: 211ms<br>
CGroup: /system.slice/slurmd.service<br>
└─2363 /usr/local/sbin/slurmd -D<br>
Dec 14 16:02:18 localhost.localdomain systemd[1]: Started Slurm node daemon.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:7.5pt;font-family:&quot;Courier New&quot;">[admin@localhost ~]$ systemctl status slurmctld.service
<br>
● slurmctld.service - Slurm controller daemon<br>
Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)<br>
Drop-In: /etc/systemd/system/slurmctld.service.d<br>
└─override.conf<br>
Active: failed (Result: exit-code) since Mon 2020-12-14 16:02:12 PST; 11min ago<br>
Process: 1972 ExecStart=/usr/local/sbin/slurmctld -D $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)<br>
Main PID: 1972 (code=exited, status=1/FAILURE)<br>
CPU: 21ms<br>
Dec 14 16:02:12 localhost.localdomain systemd[1]: Started Slurm controller daemon.<br>
Dec 14 16:02:12 localhost.localdomain systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE<br>
Dec 14 16:02:12 localhost.localdomain systemd[1]: slurmctld.service: Failed with result 'exit-code'.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">The slurmctld log is as follows:<o:p></o:p></p>
</div>
<p class="MsoNormal"><span style="font-size:7.5pt;font-family:&quot;Courier New&quot;">[2020-12-14T16:02:12.731] slurmctld version 20.11.1 started on cluster cluster<br>
[2020-12-14T16:02:12.739] No memory enforcing mechanism configured.<br>
[2020-12-14T16:02:12.772] error: get_addr_info: getaddrinfo() failed: Name or service not known<br>
[2020-12-14T16:02:12.772] error: slurm_set_addr: Unable to resolve "localhost"<br>
[2020-12-14T16:02:12.772] error: slurm_get_port: Address family '0' not supported<br>
[2020-12-14T16:02:12.772] error: _set_slurmd_addr: failure on localhost<br>
[2020-12-14T16:02:12.772] Recovered state of 1 nodes<br>
[2020-12-14T16:02:12.772] Recovered information about 0 jobs<br>
[2020-12-14T16:02:12.772] select/cons_tres: part_data_create_array: select/cons_tres: preparing for 1 partitions<br>
[2020-12-14T16:02:12.779] Recovered state of 0 reservations<br>
[2020-12-14T16:02:12.779] read_slurm_conf: backup_controller not specified<br>
[2020-12-14T16:02:12.779] select/cons_tres: select_p_reconfigure: select/cons_tres: reconfigure<br>
[2020-12-14T16:02:12.779] select/cons_tres: part_data_create_array: select/cons_tres: preparing for 1 partitions<br>
[2020-12-14T16:02:12.779] Running as primary controller<br>
[2020-12-14T16:02:12.780] No parameter for mcs plugin, default values set<br>
[2020-12-14T16:02:12.780] mcs: MCSParameters = (null). ondemand set.<br>
[2020-12-14T16:02:12.780] error: get_addr_info: getaddrinfo() failed: Name or service not known<br>
[2020-12-14T16:02:12.780] error: slurm_set_addr: Unable to resolve "(null)"<br>
[2020-12-14T16:02:12.780] error: slurm_set_port: attempting to set port without address family<br>
[2020-12-14T16:02:12.782] error: Error creating slurm stream socket: Address family not supported by protocol</span><o:p></o:p></p>
<div>
<p class="MsoNormal"><span style="font-size:7.5pt;font-family:&quot;Courier New&quot;">[2020-12-14T16:02:12.782] fatal: slurm_init_msg_engine_port error Address family not supported by protocol </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Strangely, the daemon works fine when I restart it manually. After running<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:7.5pt;font-family:&quot;Courier New&quot;">systemctl restart slurmctld.service</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">the service status is<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:7.5pt;font-family:&quot;Courier New&quot;">[admin@localhost ~]$ systemctl status slurmctld.service </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:7.5pt;font-family:&quot;Courier New&quot;">● slurmctld.service - Slurm controller daemon<br>
Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)<br>
Drop-In: /etc/systemd/system/slurmctld.service.d<br>
└─override.conf<br>
Active: active (running) since Mon 2020-12-14 16:14:24 PST; 3s ago<br>
Main PID: 2815 (slurmctld)<br>
Tasks: 7<br>
Memory: 1.9M<br>
CPU: 15ms<br>
CGroup: /system.slice/slurmctld.service<br>
└─2815 /usr/local/sbin/slurmctld -D<br>
Dec 14 16:14:24 localhost.localdomain systemd[1]: Started Slurm controller daemon.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Could anyone point me towards how to fix this? I expect it's just an issue with my configuration file, which I've copied below for reference.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:7.5pt;font-family:&quot;Courier New&quot;"># slurm.conf file generated by configurator easy.html.<br>
# Put this file on all nodes of your cluster.<br>
# See the slurm.conf man page for more information.<br>
#<br>
#SlurmctldHost=localhost<br>
ControlMachine=localhost<br>
#<br>
#MailProg=/bin/mail<br>
MpiDefault=none<br>
#MpiParams=ports=#-#<br>
ProctrackType=proctrack/cgroup<br>
ReturnToService=1<br>
SlurmctldPidFile=/home/slurm/run/slurmctld.pid<br>
#SlurmctldPort=6817<br>
SlurmdPidFile=/home/slurm/run/slurmd.pid<br>
#SlurmdPort=6818<br>
SlurmdSpoolDir=/var/spool/slurm/slurmd/<br>
SlurmUser=slurm<br>
#SlurmdUser=root<br>
StateSaveLocation=/home/slurm/spool/<br>
SwitchType=switch/none<br>
TaskPlugin=task/affinity<br>
#<br>
#<br>
# TIMERS<br>
#KillWait=30<br>
#MinJobAge=300<br>
#SlurmctldTimeout=120<br>
#SlurmdTimeout=300<br>
#<br>
#<br>
# SCHEDULING<br>
SchedulerType=sched/backfill<br>
SelectType=select/cons_tres<br>
SelectTypeParameters=CR_Core<br>
#<br>
#<br>
# LOGGING AND ACCOUNTING<br>
AccountingStorageType=accounting_storage/none<br>
ClusterName=cluster<br>
#JobAcctGatherFrequency=30<br>
JobAcctGatherType=jobacct_gather/none<br>
#SlurmctldDebug=info<br>
SlurmctldLogFile=/home/slurm/log/slurmctld.log<br>
#SlurmdDebug=info<br>
#SlurmdLogFile=<br>
#<br>
#<br>
# COMPUTE NODES<br>
NodeName=localhost CPUs=128 RealMemory=257682 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN<br>
PartitionName=full Nodes=localhost Default=YES MaxTime=INFINITE State=UP</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Thanks!<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">-John<o:p></o:p></p>
</div>
</div>
</div>
</div>
</body>
</html>