[slurm-users] Slurm not starting
    Gennaro Oliva 
    oliva.g at na.icar.cnr.it
       
    Mon Jan 15 06:08:55 MST 2018
    
    
  
Hi Elisabetta,
On Mon, Jan 15, 2018 at 01:13:27PM +0100, Elisabetta Falivene wrote:
> The error messages are not helping me much to work out what is going on.
> What should I check to find out what is failing?
Check slurmctld.log and slurmd.log. You can find them under:
/var/log/slurm-llnl
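As a rough sketch of that check, something like the following prints the
tail of both logs. The function name show_slurm_logs and the default of 50
lines are only illustrative assumptions; the directory is the one mentioned
above.

```shell
#!/bin/sh
# show_slurm_logs is a hypothetical helper name, not part of Slurm.
# $1: log directory (defaults to the path mentioned in this thread)
# $2: number of trailing lines to show (assumed default: 50)
show_slurm_logs() {
    logdir="${1:-/var/log/slurm-llnl}"
    lines="${2:-50}"
    for name in slurmctld.log slurmd.log; do
        file="$logdir/$name"
        if [ -r "$file" ]; then
            echo "== $file =="
            tail -n "$lines" "$file"
        else
            # slurmd.log normally lives on the compute nodes, so it may be
            # missing on the controller host.
            echo "== $file: not readable =="
        fi
    done
}
```

Calling show_slurm_logs with no arguments on the controller, right after a
failed start, should show the last messages slurmctld wrote before systemd
terminated it.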
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> batch*       up   infinite      8   unk* node[01-08]
> 
> 
> Running
> systemctl status slurmctld.service
> 
> returns
> 
> slurmctld.service - Slurm controller daemon
>    Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled)
>    Active: failed (Result: timeout) since Mon 2018-01-15 13:03:39 CET; 41s ago
>   Process: 2098 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
> 
> slurmctld[2100]: cons_res: select_p_reconfigure
> slurmctld[2100]: cons_res: select_p_node_init
> slurmctld[2100]: cons_res: preparing for 1 partitions
> slurmctld[2100]: Running as primary controller
> slurmctld[2100]: SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=4,partition_job_depth=0
> slurmctld.service start operation timed out. Terminating.
> Terminate signal (SIGINT or SIGTERM) received
> slurmctld[2100]: Saving all slurm state
> Failed to start Slurm controller daemon.
> Unit slurmctld.service entered failed state.
Do you have a backup controller?
Check your slurm.conf under:
/etc/slurm-llnl
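One way to answer the backup controller question, sketched under some
assumptions: the file is /etc/slurm-llnl/slurm.conf, and the controllers are
declared with the ControlMachine/BackupController keys (newer Slurm releases
use SlurmctldHost instead, so the pattern looks for that too). The function
name show_controllers is hypothetical.

```shell
#!/bin/sh
# show_controllers is a hypothetical helper name, not part of Slurm.
# $1: path to slurm.conf (assumed default: /etc/slurm-llnl/slurm.conf)
# Prints the uncommented lines that define the primary and backup controller.
show_controllers() {
    conf="${1:-/etc/slurm-llnl/slurm.conf}"
    grep -Ei '^[[:space:]]*(ControlMachine|ControlAddr|BackupController|BackupAddr|SlurmctldHost)=' "$conf"
}
```

If the output contains no BackupController (or second SlurmctldHost) line,
no backup controller is configured.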
In any case, I suggest upgrading the operating system to Debian stretch and
fixing your configuration under a more recent version of Slurm.
Best regards
-- 
Gennaro Oliva
    
    