<div dir="ltr"><span style="font-size:12.8px">> Anyway, I suggest upgrading the operating system to stretch and fixing your</span><br style="font-size:12.8px"><span style="font-size:12.8px">> configuration under a more recent version of Slurm.</span><br><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">I think I'll get to that soon :)</span></div><div><span style="font-size:12.8px">b</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">2018-01-15 14:08 GMT+01:00 Gennaro Oliva <span dir="ltr"><<a href="mailto:oliva.g@na.icar.cnr.it" target="_blank">oliva.g@na.icar.cnr.it</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Ciao Elisabetta,<br>
<span class=""><br>
On Mon, Jan 15, 2018 at 01:13:27PM +0100, Elisabetta Falivene wrote:<br>
> The error messages are not helping me much in figuring out what is going on. What<br>
> should I check to find out what is failing?<br>
<br>
</span>Check slurmctld.log and slurmd.log; you can find them under<br>
/var/log/slurm-llnl<br>
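<br>
As a rough sketch of that check, the Python snippet below prints the lines from the end of each log that mention "error" or "fatal". The keyword list, the 200-line window, and the helper names are assumptions made for illustration; only the directory and the two file names come from this message.<br>

```python
from pathlib import Path

# Directory and file names as given in the message; everything else is illustrative.
LOG_DIR = Path("/var/log/slurm-llnl")
LOG_NAMES = ("slurmctld.log", "slurmd.log")
KEYWORDS = ("error", "fatal")  # assumption: substrings worth flagging


def suspicious_lines(lines, keywords=KEYWORDS):
    """Return the lines that contain any keyword, compared case-insensitively."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append(line.rstrip("\n"))
    return hits


def scan_logs(log_dir=LOG_DIR, names=LOG_NAMES, tail=200):
    """Print flagged lines from the last `tail` lines of each log that exists."""
    for name in names:
        path = Path(log_dir) / name
        if not path.is_file():
            print(f"{path}: not found")
            continue
        recent = path.read_text(errors="replace").splitlines()[-tail:]
        for hit in suspicious_lines(recent):
            print(f"{name}: {hit}")


if __name__ == "__main__":
    scan_logs()
```

Reading these files may require root privileges, and slurmd.log normally lives on the compute nodes rather than on the controller.<br>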
<br>
> *PARTITION AVAIL TIMELIMIT NODES STATE NODELIST*<br>
> *batch* up infinite 8 unk* node[01-08]*<br>
><br>
><br>
> Running<br>
> *systemctl status slurmctld.service*<br>
><br>
> returns<br>
><br>
> *slurmctld.service - Slurm controller daemon*<br>
> * Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled)*<br>
> * Active: failed (Result: timeout) since Mon 2018-01-15 13:03:39 CET; 41s<br>
> ago*<br>
> * Process: 2098 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS<br>
> (code=exited, status=0/SUCCESS)*<br>
><br>
> * slurmctld[2100]: cons_res: select_p_reconfigure*<br>
> * slurmctld[2100]: cons_res: select_p_node_init*<br>
> * slurmctld[2100]: cons_res: preparing for 1 partitions*<br>
> * slurmctld[2100]: Running as primary controller*<br>
> * slurmctld[2100]:<br>
> SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=4,partition_job_depth=0*<br>
> * slurmctld.service start operation timed out. Terminating.*<br>
> *Terminate signal (SIGINT or SIGTERM) received*<br>
> * slurmctld[2100]: Saving all slurm state*<br>
> * Failed to start Slurm controller daemon.*<br>
> * Unit slurmctld.service entered failed state.*<br>
<br>
Do you have a backup controller?<br>
Check your slurm.conf under:<br>
/etc/slurm-llnl<br>
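<br>
One possible way to answer the backup-controller question is to list the controller-related settings in slurm.conf. The sketch below assumes the older parameter names ControlMachine, ControlAddr, BackupController and BackupAddr, and assumes the file is called slurm.conf inside the directory above; the function name is made up for this example.<br>

```python
from pathlib import Path

# Assumption: the configuration file is named slurm.conf in the directory above.
CONF_PATH = Path("/etc/slurm-llnl/slurm.conf")

# Assumption: the installation uses these older controller parameter names.
WANTED = {"controlmachine", "controladdr", "backupcontroller", "backupaddr"}


def read_controller_settings(conf_text):
    """Collect controller-related Key=Value settings, keyed by lowercase name."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip().lower()
        if key in WANTED:
            found[key] = value.strip()
    return found


if __name__ == "__main__":
    if CONF_PATH.is_file():
        settings = read_controller_settings(CONF_PATH.read_text(errors="replace"))
        for key, value in sorted(settings.items()):
            print(f"{key} = {value}")
        if "backupcontroller" not in settings:
            print("no BackupController entry found")
    else:
        print(f"{CONF_PATH}: not found")
```

If no BackupController entry shows up, only the primary controller is configured.<br>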
<br>
Anyway, I suggest upgrading the operating system to stretch and fixing your<br>
configuration under a more recent version of Slurm.<br>
Best regards<br>
<span class="HOEnZb"><font color="#888888">--<br>
Gennaro Oliva<br>
<br>
</font></span></blockquote></div><br></div>