<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta content="text/html; charset=utf-8">
</head>
<body>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12.0pt; line-height:1.3; color:#1F497D">
<div>Elisabetta-<br>
<br>
Start by focusing on slurmctld; slurmd will not be happy without it.<br>
Start it manually in the foreground, as in<br>
/usr/sbin/slurmctld -D -vvv<br>
<br>
This assumes slurm.conf is in the default location.<br>
Pardon brevity; on my phone<br>
Jenny Williams<br>
<br>
</div>
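<div>A minimal sketch of the foreground start described above, assuming a POSIX shell. The helper name slurmctld_fg_cmd and the placeholder path /path/to/slurm.conf are hypothetical; -D (stay in the foreground), -v (raise verbosity, repeatable) and -f (read an alternate configuration file) are slurmctld options. The helper only prints the command line, so it can be reviewed before being run by hand.</div>

```shell
#!/bin/sh
# Print the command line for running slurmctld in the foreground.
# -D keeps the daemon attached to the terminal; each -v raises log verbosity.
# An optional first argument is passed through -f as a non-default slurm.conf.
slurmctld_fg_cmd() {
    conf="${1:-}"
    if [ -n "$conf" ]; then
        printf '%s\n' "/usr/sbin/slurmctld -D -vvv -f $conf"
    else
        printf '%s\n' "/usr/sbin/slurmctld -D -vvv"
    fi
}

# Default config location: print the plain foreground command.
slurmctld_fg_cmd

# Hypothetical non-default location: the same command with -f appended.
slurmctld_fg_cmd /path/to/slurm.conf
```

<div>Stopping the systemd unit first (systemctl stop slurmctld.service) avoids two controllers competing, and the printed command would typically be run as root or as the configured Slurm user.</div>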
<div><br>
</div>
</div>
<hr style="border:none; height:1px; color:#E1E1E1; background-color:#E1E1E1">
<div style="border:none; padding:3.0pt 0cm 0cm 0cm"><span style="font-size:11.0pt; font-family:Calibri,Arial,Helvetica,sans-serif"><b>From:</b> Elisabetta Falivene <e.falivene@ilabroma.com><br>
<b>Sent:</b> Monday, January 15, 2018 7:14 AM<br>
<b>To:</b> Slurm User Community List<br>
<b>Subject:</b> [slurm-users] Slurm not starting<br>
</span></div>
<br type="attribution">
<div>
<div dir="ltr">I upgraded a cluster of 8 nodes (all up, running and reachable) from wheezy to jessie with a normal dist-upgrade, which also took Slurm from <span style="font-family:Helvetica; font-size:12px">2.3.4</span><span style="font-family:Helvetica; font-size:12px"> to
14.03.9</span>. After overcoming some problems booting the kernel (thank you very much to Gennaro Oliva, btw), the system now runs correctly with kernel 3.16.0.4, but Slurm isn't starting. I tried restarting the services, but they fail to start.
<div>
<div>
<div><br>
</div>
<div>The error messages are not helping me much in working out what is going on. What should I check to find out what is failing?</div>
<div><br>
</div>
<div>Thank you </div>
<div>Elisabetta</div>
<div><br>
</div>
<div>PS: Here are some tests I did</div>
<div><br>
</div>
<div>Running </div>
<div><b>sinfo</b></div>
<div><br>
</div>
<div>returns</div>
<div><br>
</div>
<div><b>PARTITION AVAIL TIMELIMIT NODES STATE NODELIST</b></div>
<div><b>batch* up infinite 8 unk* node[01-08]</b></div>
<div><br>
</div>
<div><br>
</div>
<div>Running </div>
<div>
<div><b>systemctl status slurmctld.service</b></div>
<div><br>
</div>
<div>returns </div>
<div><br>
</div>
<div><b>slurmctld.service - Slurm controller daemon</b></div>
<div><b> Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled)</b></div>
<div><b> Active: failed (Result: timeout) since Mon 2018-01-15 13:03:39 CET; 41s ago</b></div>
<div><b> Process: 2098 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)</b></div>
<div><b><br>
</b></div>
<div><b> slurmctld[2100]: cons_res: select_p_reconfigure</b></div>
<div><b> slurmctld[2100]: cons_res: select_p_node_init</b></div>
<div><b> slurmctld[2100]: cons_res: preparing for 1 partitions</b></div>
<div><b> slurmctld[2100]: Running as primary controller</b></div>
<div><b> slurmctld[2100]: SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=4,partition_job_depth=0</b></div>
<div><b> slurmctld.service start operation timed out. Terminating.</b></div>
<div><b>Terminate signal (SIGINT or SIGTERM) received</b></div>
<div><b> slurmctld[2100]: Saving all slurm state</b></div>
<div><b> Failed to start Slurm controller daemon.</b></div>
<div><b> Unit slurmctld.service entered failed state.</b></div>
</div>
<div><b><br>
</b></div>
<div>and running</div>
<div><b><br>
</b></div>
<div>
<div><b>/etc/init.d/slurmd status</b></div>
<div><b><br>
</b></div>
<div>returns</div>
<div><b><br>
</b></div>
<div><b>slurmd.service - Slurm node daemon</b></div>
<div><b> Loaded: loaded (/lib/systemd/system/slurmd.service; enabled)</b></div>
<div><b> Active: failed (Result: exit-code) since Mon 2018-01-15 12:44:52 CET; 21min ago</b></div>
<div><b> Process: 729 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=1/FAILURE)</b></div>
<div><b><br>
</b></div>
<div><b>slurmd.service: control process exited, code=exited status=1</b></div>
<div><b>systemd[1]: Failed to start Slurm node daemon.</b></div>
<div><b>Unit slurmd.service entered failed state.</b><br>
</div>
<div style="font-weight:bold"><br>
</div>
</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
</body>
</html>