[slurm-users] Slurm not starting

Gennaro Oliva oliva.g at na.icar.cnr.it
Mon Jan 15 06:08:55 MST 2018


Ciao Elisabetta,

On Mon, Jan 15, 2018 at 01:13:27PM +0100, Elisabetta Falivene wrote:
> The error messages are not helping me much in working out what is going on.
> What should I check to find out what is failing?

Check slurmctld.log and slurmd.log; you can find them under
/var/log/slurm-llnl
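
As a rough sketch of that check, something like the following could pull
the most recent error-like lines out of both logs. The function name
slurm_log_errors and the grep pattern 'error|fatal' are my own assumptions,
not anything Slurm provides; only the log directory comes from the path
above.

```shell
# Hypothetical helper: print the last error-like lines from the Slurm logs.
# Default directory is the Debian path mentioned above; pass another one
# as the first argument if your logs live elsewhere.
slurm_log_errors() {
    log_dir="${1:-/var/log/slurm-llnl}"
    for name in slurmctld.log slurmd.log; do
        file="$log_dir/$name"
        if [ -r "$file" ]; then
            echo "== $file =="
            # Assumed pattern: case-insensitive match on "error" or "fatal".
            grep -iE 'error|fatal' "$file" | tail -n 20
        else
            echo "== $file: not readable =="
        fi
    done
}
```

Run it as root on the controller for slurmctld.log and on a compute node
for slurmd.log, since slurmd runs on the compute nodes.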

> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> batch*       up   infinite      8   unk* node[01-08]
> 
> 
> Running
> systemctl status slurmctld.service
> 
> returns
> 
> slurmctld.service - Slurm controller daemon
>    Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled)
>    Active: failed (Result: timeout) since Mon 2018-01-15 13:03:39 CET; 41s ago
>   Process: 2098 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
> 
> slurmctld[2100]: cons_res: select_p_reconfigure
> slurmctld[2100]: cons_res: select_p_node_init
> slurmctld[2100]: cons_res: preparing for 1 partitions
> slurmctld[2100]: Running as primary controller
> slurmctld[2100]: SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=4,partition_job_depth=0
> slurmctld.service start operation timed out. Terminating.
> Terminate signal (SIGINT or SIGTERM) received
> slurmctld[2100]: Saving all slurm state
> Failed to start Slurm controller daemon.
> Unit slurmctld.service entered failed state.

Do you have a backup controller configured?
Check your slurm.conf, which is under
/etc/slurm-llnl
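
A minimal sketch of how to look for that, assuming the file is named
slurm.conf in the directory above. The function name
slurm_controller_settings is hypothetical; the keys it searches for
(ControlMachine, ControlAddr, BackupController, BackupAddr, and
SlurmctldHost in newer releases) are the slurm.conf parameters that name
the primary and backup controllers.

```shell
# Hypothetical helper: list the uncommented controller-related settings
# in slurm.conf. Pass a different path as the first argument if needed.
slurm_controller_settings() {
    conf="${1:-/etc/slurm-llnl/slurm.conf}"
    # slurm.conf keys are case-insensitive, hence -i; lines starting
    # with '#' are comments and do not match the anchored pattern.
    grep -iE '^[[:space:]]*(ControlMachine|ControlAddr|BackupController|BackupAddr|SlurmctldHost)[[:space:]]*=' "$conf" \
        || echo "no controller settings found in $conf"
}
```

If a BackupController line shows up, that host is configured as the backup
controller.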

In any case, I suggest upgrading the operating system to Debian stretch and
fixing your configuration under a more recent version of Slurm.
Best regards,
-- 
Gennaro Oliva


