[slurm-users] Slurm not starting
Gennaro Oliva
oliva.g at na.icar.cnr.it
Mon Jan 15 06:08:55 MST 2018
Ciao Elisabetta,
On Mon, Jan 15, 2018 at 01:13:27PM +0100, Elisabetta Falivene wrote:
> Error messages are not much helping me in guessing what is going on. What
> should I check to get what is failing?
Check slurmctld.log and slurmd.log; you can find them under
/var/log/slurm-llnl
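
A minimal sketch of how those logs might be inspected, assuming the Debian
default directory mentioned above. The helper name show_recent_errors, the
LOG_DIR variable, and the search pattern are illustrative and not part of Slurm.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent error-like lines from a Slurm log.
# LOG_DIR defaults to the Debian packaging path; override it if yours differs.
LOG_DIR="${LOG_DIR:-/var/log/slurm-llnl}"

show_recent_errors() {
    # $1 = log file name, $2 = maximum number of lines to print (default 20)
    file="$LOG_DIR/$1"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # Keep only lines mentioning "error" or "fatal" (case-insensitive),
    # then show the last few of them.
    grep -i -E 'error|fatal' "$file" | tail -n "${2:-20}"
}

show_recent_errors slurmctld.log || true
show_recent_errors slurmd.log || true
```

Run it as root (or a user allowed to read the logs) on the controller for
slurmctld.log and on a compute node for slurmd.log.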
> PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
> batch     up    infinite  8     unk*  node[01-08]
>
>
> Running
> systemctl status slurmctld.service
>
> returns
>
> slurmctld.service - Slurm controller daemon
>    Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled)
>    Active: failed (Result: timeout) since Mon 2018-01-15 13:03:39 CET; 41s ago
>   Process: 2098 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
>
> slurmctld[2100]: cons_res: select_p_reconfigure
> slurmctld[2100]: cons_res: select_p_node_init
> slurmctld[2100]: cons_res: preparing for 1 partitions
> slurmctld[2100]: Running as primary controller
> slurmctld[2100]: SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=4,partition_job_depth=0
> slurmctld.service start operation timed out. Terminating.
> Terminate signal (SIGINT or SIGTERM) received
> slurmctld[2100]: Saving all slurm state
> Failed to start Slurm controller daemon.
> Unit slurmctld.service entered failed state.
Do you have a backup controller?
Check your slurm.conf under:
/etc/slurm-llnl
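
As a rough sketch of that check, the following lists the controller-related
entries in slurm.conf. The file location is assumed from the directory above,
and the helper name show_controllers and the SLURM_CONF variable are
illustrative; ControlMachine, ControlAddr, BackupController and BackupAddr are
the parameter names used by Slurm releases of this era.

```shell
#!/bin/sh
# Path assumed from the Debian packaging directory; override with SLURM_CONF.
SLURM_CONF="${SLURM_CONF:-/etc/slurm-llnl/slurm.conf}"

show_controllers() {
    # $1 = path to slurm.conf
    # Print uncommented primary/backup controller settings
    # (slurm.conf parameter names are case-insensitive, hence -i).
    grep -i -E '^[[:space:]]*(ControlMachine|ControlAddr|BackupController|BackupAddr)=' "$1"
}

if [ -r "$SLURM_CONF" ]; then
    show_controllers "$SLURM_CONF" || echo "no controller entries found in $SLURM_CONF"
else
    echo "cannot read $SLURM_CONF" >&2
fi
```

If a BackupController line shows up, it is probably worth looking at the
slurmctld.log on that host as well.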
In any case, I suggest upgrading the operating system to Debian stretch and
fixing your configuration under a more recent version of Slurm.
Best regards
--
Gennaro Oliva