[slurm-users] Slurm not starting
Carlos Fenoy
minibit at gmail.com
Mon Jan 15 08:43:11 MST 2018
Are you trying to start slurmd on the head node or on a compute node?
Can you provide your slurm.conf file?
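
As far as I know, that fatal error typically shows up when the machine's
hostname does not match any NodeName (or NodeHostname) entry in slurm.conf.
Here is a minimal Python sketch of that comparison. The function names and the
sample configuration are hypothetical, made up for illustration. It only
handles simple bracket ranges such as node[01-08], not Slurm's full hostlist
syntax, and it ignores NodeHostname.

```python
import re
import socket

# Hypothetical sample, loosely modelled on the sinfo output in this thread.
SAMPLE_CONF = """\
# illustrative fragment only, not the poster's real slurm.conf
NodeName=node[01-08] State=UNKNOWN
PartitionName=batch Nodes=node[01-08] Default=YES
"""


def expand_hostlist(expr):
    """Expand a simple host expression such as 'node[01-08]'.

    Supports a single bracket group holding comma-separated numbers
    and ranges; anything else is returned unchanged.
    """
    m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if not m:
        return [expr]
    prefix, body, suffix = m.groups()
    names = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo
        width = len(lo)  # keep zero padding, e.g. 01 -> 02
        for i in range(int(lo), int(hi) + 1):
            names.append(f"{prefix}{i:0{width}d}{suffix}")
    return names


def node_names(conf_text):
    """Collect every node name declared on NodeName= lines."""
    names = []
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        m = re.match(r"(?i)NodeName=(\S+)", line)
        if m and m.group(1).upper() != "DEFAULT":
            # Split on commas that are outside brackets.
            for expr in re.split(r",(?![^\[]*\])", m.group(1)):
                names.extend(expand_hostlist(expr))
    return names


def host_is_configured(conf_text, hostname=None):
    """Return True if the short hostname appears among the NodeName entries."""
    short = (hostname or socket.gethostname()).split(".")[0]
    return short in node_names(conf_text)


if __name__ == "__main__":
    print(node_names(SAMPLE_CONF))
    print(host_is_configured(SAMPLE_CONF))
```

If the check comes back False on the machine where slurmd refuses to start,
comparing the output of "hostname -s" with the NodeName lines in slurm.conf is
probably the first thing to look at.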
Regards,
Carlos
On Mon, Jan 15, 2018 at 4:30 PM, Elisabetta Falivene <e.falivene at ilabroma.com> wrote:
> slurmd -Dvvv says
>
> slurmd: fatal: Unable to determine this slurmd's NodeName
>
> b
>
> 2018-01-15 15:58 GMT+01:00 Douglas Jacobsen <dmjacobsen at lbl.gov>:
>
>> The fact that sinfo is responding shows that at least slurmctld is
>> running. Slurmd, on the other hand, is not. Please also get the output of
>> the slurmd log or of running "slurmd -Dvvv".
>>
>> On Jan 15, 2018 06:42, "Elisabetta Falivene" <e.falivene at ilabroma.com> wrote:
>>
>>> > Anyway, I suggest updating the operating system to stretch and fixing
>>> > your configuration under a more recent version of Slurm.
>>>
>>> I think I'll get to that soon :)
>>> b
>>>
>>> 2018-01-15 14:08 GMT+01:00 Gennaro Oliva <oliva.g at na.icar.cnr.it>:
>>>
>>>> Ciao Elisabetta,
>>>>
>>>> On Mon, Jan 15, 2018 at 01:13:27PM +0100, Elisabetta Falivene wrote:
>>>> > The error messages are not helping me much in guessing what is going on.
>>>> > What should I check to find out what is failing?
>>>>
>>>> Check slurmctld.log and slurmd.log; you can find them under
>>>> /var/log/slurm-llnl
>>>>
>>>> > PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
>>>> > batch*    up    infinite      8 unk*  node[01-08]
>>>> >
>>>> >
>>>> > Running
>>>> > systemctl status slurmctld.service
>>>> >
>>>> > returns
>>>> >
>>>> > slurmctld.service - Slurm controller daemon
>>>> >    Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled)
>>>> >    Active: failed (Result: timeout) since Mon 2018-01-15 13:03:39 CET; 41s ago
>>>> >   Process: 2098 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
>>>> >
>>>> > slurmctld[2100]: cons_res: select_p_reconfigure
>>>> > slurmctld[2100]: cons_res: select_p_node_init
>>>> > slurmctld[2100]: cons_res: preparing for 1 partitions
>>>> > slurmctld[2100]: Running as primary controller
>>>> > slurmctld[2100]: SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=4,partition_job_depth=0
>>>> > slurmctld.service start operation timed out. Terminating.
>>>> > Terminate signal (SIGINT or SIGTERM) received
>>>> > slurmctld[2100]: Saving all slurm state
>>>> > Failed to start Slurm controller daemon.
>>>> > Unit slurmctld.service entered failed state.
>>>>
>>>> Do you have a backup controller?
>>>> Check your slurm.conf under:
>>>> /etc/slurm-llnl
>>>>
>>>> Anyway, I suggest updating the operating system to stretch and fixing
>>>> your configuration under a more recent version of Slurm.
>>>> Best regards
>>>> --
>>>> Gennaro Oliva
>>>>
>>>>
>>>
>
--
Carles Fenoy