[slurm-users] Slurm not starting

Elisabetta Falivene e.falivene at ilabroma.com
Mon Jan 15 08:30:00 MST 2018


slurmd -Dvvv says

slurmd: fatal: Unable to determine this slurmd's NodeName
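slurmd reports this fatal error when it cannot match the machine's short hostname to any node defined in slurm.conf. It checks a NodeName entry, or that entry's NodeHostname if one is set. The Python sketch below roughly approximates that lookup so you can see which hostname is being compared against which entries. It is a simplified illustration and not Slurm's own code. It assumes the Debian path /etc/slurm-llnl/slurm.conf, and `expand_hostlist`, `node_names` and `find_node_name` are hypothetical helpers that only understand a single bracketed range such as node[01-08], not Slurm's full hostlist syntax.

```python
import os
import re
import socket


def expand_hostlist(expr):
    """Expand a simple hostlist such as 'node[01-08]' or 'node[1,3-4]'.

    Simplified assumption: only one bracketed group is supported,
    unlike Slurm's full hostlist syntax.
    """
    m = re.fullmatch(r"([^\[\]]*)\[([^\]]+)\](.*)", expr)
    if not m:
        return [expr]
    prefix, ranges, suffix = m.groups()
    names = []
    for part in ranges.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            width = len(lo)  # preserve zero padding, e.g. 01..08
            for i in range(int(lo), int(hi) + 1):
                names.append(f"{prefix}{i:0{width}d}{suffix}")
        else:
            names.append(f"{prefix}{part}{suffix}")
    return names


def node_names(conf_text):
    """Map each host declared on a NodeName= line to its NodeName."""
    mapping = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = {}
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                fields[key.lower()] = value  # slurm.conf keys are case-insensitive
        name_expr = fields.get("nodename")
        if not name_expr or name_expr.upper() == "DEFAULT":
            continue
        names = expand_hostlist(name_expr)
        # NodeHostname defaults to NodeName when it is not given.
        hosts = expand_hostlist(fields.get("nodehostname", name_expr))
        for host, name in zip(hosts, names):
            mapping[host] = name
    return mapping


def find_node_name(conf_text, hostname):
    """Return the NodeName whose host matches `hostname`, or None."""
    short = hostname.split(".", 1)[0]  # compare the short hostname
    return node_names(conf_text).get(short)


if __name__ == "__main__":
    path = "/etc/slurm-llnl/slurm.conf"  # assumed Debian location
    if os.path.exists(path):
        with open(path) as fh:
            host = socket.gethostname()
            match = find_node_name(fh.read(), host)
        print(match or f"no NodeName entry matches {host!r}")
```

If the node's own hostname maps to nothing here, the hostname and the NodeName line in slurm.conf most likely disagree.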

b

2018-01-15 15:58 GMT+01:00 Douglas Jacobsen <dmjacobsen at lbl.gov>:

> The fact that sinfo is responding shows that at least slurmctld is
> running. Slurmd, on the other hand, is not. Please also send the output of
> the slurmd log, or of running "slurmd -Dvvv".
>
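When a log is long, it can help to pull out only the lines carrying Slurm's `error:` and `fatal:` severity prefixes, the same form as the `slurmd: fatal: ...` message quoted in this thread. The Python sketch below does that. It assumes the Debian log directory /var/log/slurm-llnl (mentioned later in the thread) and the file names slurmd.log and slurmctld.log. Adjust these to SlurmdLogFile and SlurmctldLogFile in slurm.conf if they differ. The regular expression is a rough heuristic, not a full log parser.

```python
import os
import re

# Heuristic: match Slurm-style "error:" / "fatal:" severity prefixes.
PROBLEM = re.compile(r"\b(error|fatal):", re.IGNORECASE)


def problem_lines(lines, limit=20):
    """Return the last `limit` lines that look like errors or fatal messages."""
    hits = [line.rstrip("\n") for line in lines if PROBLEM.search(line)]
    return hits[-limit:]


if __name__ == "__main__":
    # Assumed default location for Debian's slurm-llnl packages.
    for name in ("slurmd.log", "slurmctld.log"):
        path = os.path.join("/var/log/slurm-llnl", name)
        if os.path.exists(path):
            with open(path, errors="replace") as fh:
                print(f"== {path}")
                for line in problem_lines(fh):
                    print(line)
```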

>
> On Jan 15, 2018 06:42, "Elisabetta Falivene" <e.falivene at ilabroma.com>
> wrote:
>
>> > Anyway, I suggest upgrading the operating system to Debian Stretch and fixing your
>> > configuration under a more recent version of Slurm.
>>
>> I think I'll get to that soon :)
>> b
>>
>> 2018-01-15 14:08 GMT+01:00 Gennaro Oliva <oliva.g at na.icar.cnr.it>:
>>
>>> Ciao Elisabetta,
>>>
>>> On Mon, Jan 15, 2018 at 01:13:27PM +0100, Elisabetta Falivene wrote:
>>> > The error messages are not helping me much in figuring out what is going on.
>>> > What should I check to find out what is failing?
>>>
>>> Check slurmctld.log and slurmd.log; you can find them under
>>> /var/log/slurm-llnl.
>>>
>>> > PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
>>> > batch        up   infinite      8   unk* node[01-08]
>>> >
>>> >
>>> > Running
>>> > systemctl status slurmctld.service
>>> >
>>> > returns
>>> >
>>> > slurmctld.service - Slurm controller daemon
>>> >    Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled)
>>> >    Active: failed (Result: timeout) since Mon 2018-01-15 13:03:39 CET; 41s ago
>>> >   Process: 2098 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
>>> >
>>> > slurmctld[2100]: cons_res: select_p_reconfigure
>>> > slurmctld[2100]: cons_res: select_p_node_init
>>> > slurmctld[2100]: cons_res: preparing for 1 partitions
>>> > slurmctld[2100]: Running as primary controller
>>> > slurmctld[2100]: SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=4,partition_job_depth=0
>>> > slurmctld.service start operation timed out. Terminating.
>>> > Terminate signal (SIGINT or SIGTERM) received
>>> > slurmctld[2100]: Saving all slurm state
>>> > Failed to start Slurm controller daemon.
>>> > Unit slurmctld.service entered failed state.
>>>
>>> Do you have a backup controller?
>>> Check your slurm.conf under:
>>> /etc/slurm-llnl
>>>
>>> Anyway, I suggest upgrading the operating system to Debian Stretch and fixing your
>>> configuration under a more recent version of Slurm.
>>> Best regards
>>> --
>>> Gennaro Oliva
>>>
>>>
>>
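Gennaro's question about a backup controller can be answered from slurm.conf itself. Slurm releases of this period name the controllers with ControlMachine and BackupController, and newer releases replace them with one or more SlurmctldHost lines. The Python sketch below collects those keys. The path /etc/slurm-llnl/slurm.conf is the Debian location given in the thread. `controller_settings` and `has_backup` are hypothetical helpers, and the simplified parser ignores Include directives and quoted values.

```python
import os

# slurm.conf keys are case-insensitive, so compare in lower case.
CONTROLLER_KEYS = ("controlmachine", "backupcontroller", "slurmctldhost")


def controller_settings(conf_text):
    """Collect controller-related key=value pairs from slurm.conf text."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        for token in line.split():
            if "=" not in token:
                continue
            key, value = token.split("=", 1)
            if key.lower() in CONTROLLER_KEYS:
                # SlurmctldHost may repeat: the first is primary, the rest backups.
                found.setdefault(key.lower(), []).append(value)
    return found


def has_backup(settings):
    """True if a backup controller appears to be configured."""
    return "backupcontroller" in settings or len(settings.get("slurmctldhost", [])) > 1


if __name__ == "__main__":
    path = "/etc/slurm-llnl/slurm.conf"  # Debian location, as given in the thread
    if os.path.exists(path):
        with open(path) as fh:
            settings = controller_settings(fh.read())
        print(settings)
        print("backup controller configured" if has_backup(settings)
              else "no backup controller configured")
```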