[slurm-users] Slurm not starting

Douglas Jacobsen dmjacobsen at lbl.gov
Mon Jan 15 07:58:27 MST 2018


The fact that sinfo is responding shows that at least slurmctld is
running.  slurmd, on the other hand, is not.  Please also send the output
of the slurmd log, or of running "slurmd -Dvvv" on one of the nodes.
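
For reference, a minimal sketch of how that could be checked. It assumes
the default sinfo column layout shown in the quoted output below (STATE in
column 5, NODELIST in column 6); the helper name unresponsive_nodes is made
up for illustration and is not part of Slurm. A trailing "*" on the node
state means the nodes are not responding to the controller.

```shell
# Hypothetical helper: list node sets whose STATE ends in "*"
# (nodes not responding). Reads default sinfo-style output on stdin
# and skips the header line.
unresponsive_nodes() {
    awk 'NR > 1 && $5 ~ /\*$/ { print $6, $5 }'
}

# On the controller (needs a reachable slurmctld):
#   sinfo | unresponsive_nodes
# On an affected compute node, as root, run slurmd in the foreground
# with verbose logging:
#   slurmd -Dvvv
# or look at the log location mentioned in this thread:
#   tail -n 50 /var/log/slurm-llnl/slurmd.log
```

If the helper prints anything, the listed nodes are the ones where the
slurmd output is worth collecting.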

On Jan 15, 2018 06:42, "Elisabetta Falivene" <e.falivene at ilabroma.com>
wrote:

> > Anyway I suggest to update the operating system to stretch and fix your
> > configuration under a more recent version of slurm.
>
> I think I'll soon arrive to that :)
> b
>
> 2018-01-15 14:08 GMT+01:00 Gennaro Oliva <oliva.g at na.icar.cnr.it>:
>
>> Ciao Elisabetta,
>>
>> On Mon, Jan 15, 2018 at 01:13:27PM +0100, Elisabetta Falivene wrote:
>> > Error messages are not much helping me in guessing what is going on.
>> > What should I check to get what is failing?
>>
>> check slurmctld.log and slurmd.log, you can find them under
>> /var/log/slurm-llnl
>>
>> > *PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST*
>> > *batch*       up   infinite      8   unk* node[01-08]*
>> >
>> >
>> > Running
>> > *systemctl status slurmctld.service*
>> >
>> > returns
>> >
>> > *slurmctld.service - Slurm controller daemon*
>> > *   Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled)*
>> > *   Active: failed (Result: timeout) since Mon 2018-01-15 13:03:39 CET; 41s ago*
>> > *  Process: 2098 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)*
>> >
>> > * slurmctld[2100]: cons_res: select_p_reconfigure*
>> > * slurmctld[2100]: cons_res: select_p_node_init*
>> > * slurmctld[2100]: cons_res: preparing for 1 partitions*
>> > * slurmctld[2100]: Running as primary controller*
>> > * slurmctld[2100]: SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=4,partition_job_depth=0*
>> > * slurmctld.service start operation timed out. Terminating.*
>> > *Terminate signal (SIGINT or SIGTERM) received*
>> > * slurmctld[2100]: Saving all slurm state*
>> > * Failed to start Slurm controller daemon.*
>> > * Unit slurmctld.service entered failed state.*
>>
>> Do you have a backup controller?
>> Check your slurm.conf under:
>> /etc/slurm-llnl
>>
>> Anyway I suggest to update the operating system to stretch and fix your
>> configuration under a more recent version of slurm.
>> Best regards
>> --
>> Gennaro Oliva
>>
>>
>