[slurm-users] unable to start slurmd process.
navin srivastava
navin.altair at gmail.com
Thu Jun 11 13:25:48 UTC 2020
Sorry Andy, I forgot to add this.
First I tried slurmd -Dvvv, and it printed nothing beyond these two lines:
slurmd: debug: Log file re-opened
slurmd: debug: Munge authentication plugin loaded
After that I waited for 10-20 minutes, but there was no further output, and
finally I pressed Ctrl+C.
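One way to avoid waiting 10-20 minutes on a hang like this is to bound the foreground run and save whatever it prints. This is only a minimal sketch, assuming GNU coreutils timeout is installed; the function name run_bounded and the log path are made up for illustration.

```shell
# Hypothetical helper: run a command for at most $1 seconds and keep
# everything it prints (stdout and stderr) in the file named by $2.
run_bounded() {
    secs=$1
    log=$2
    shift 2
    # timeout(1) exits with status 124 when it had to stop the command.
    timeout "$secs" "$@" >"$log" 2>&1
}

# Example (not run here): give slurmd 60 seconds in the foreground.
# run_bounded 60 /tmp/slurmd-debug.log /usr/sbin/slurmd -D -vvv
# echo "exit status: $?"; cat /tmp/slurmd-debug.log
```

An exit status of 124 would mean the daemon was still running (or hanging) when the limit expired; the last lines in the saved log then show how far startup got.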
My doubt is about these lines in the slurm.conf file:
ControlMachine=deda1x1466
ControlAddr=192.168.150.253
The hostname deda1x1466 belongs to a different interface with a different
IP, which the compute node is unable to ping, but the ControlAddr IP is
pingable. Could this be one of the reasons?
But other nodes have the same config and there I am able to start slurmd,
so I am a bit confused.
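A quick way to test this doubt on the failing node is to read the controller settings straight out of slurm.conf and see whether the name resolves. This is only a sketch: the helper name conf_value is hypothetical, and the path /etc/slurm/slurm.conf in the example is an assumption (the thread does not say where the file lives).

```shell
# Hypothetical helper: print the value of a "Key=Value" setting from a
# slurm.conf-style file (first match, key compared case-insensitively,
# comment lines skipped).
conf_value() {
    key=$1
    file=$2
    awk -F= -v k="$key" '
        /^[ \t]*#/ { next }
        {
            name = $1
            gsub(/[ \t]/, "", name)
            if (tolower(name) == tolower(k)) { print $2; exit }
        }
    ' "$file"
}

# Example (not run here; the config path is an assumption):
# ctl=$(conf_value ControlMachine /etc/slurm/slurm.conf)
# getent hosts "$ctl" || echo "cannot resolve $ctl"
# ping -c 1 "$(conf_value ControlAddr /etc/slurm/slurm.conf)"
# scontrol ping
```

Running the same checks on a node where slurmd does start would show whether name resolution really differs between the two.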
Regards
Navin.
On Thu, Jun 11, 2020 at 6:44 PM Riebs, Andy <andy.riebs at hpe.com> wrote:
> If you omitted the “-D” that I suggested, then the daemon would have
> detached and logged nothing on the screen. In this case, you can still go
> to the slurmd log (use “scontrol show config | grep -i log” if you’re not
> sure where the logs are stored).
>
>
>
> *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] *On
> Behalf Of *navin srivastava
> *Sent:* Thursday, June 11, 2020 9:01 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] unable to start slurmd process.
>
>
>
> I tried running it in debug mode, but there too it does not write
> anything.
>
> I waited for about 5-10 minutes.
>
>
>
> deda1x1452:/etc/sysconfig # /usr/sbin/slurmd -v -v
>
> No output on terminal.
>
>
>
> The OS is SLES12-SP4. All firewall services are disabled.
>
>
>
> The recent change is the hostname: earlier the nodes used local hostnames
> (node1, node2, etc.), but we have moved to DNS-based hostnames, which
> begin with deda.
>
>
>
> NodeName=node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=node[1-12]
> Sockets=2 CoresPerSocket=10 State=UNKNOWN
>
> Other than this it is fine. After that change I started the slurmd
> process on the node several times and it worked fine, but today I am
> seeing this issue.
>
>
>
> Regards
>
> Navin.
>
> On Thu, Jun 11, 2020 at 6:06 PM Riebs, Andy <andy.riebs at hpe.com> wrote:
>
> Navin,
>
>
>
> As you can see, systemd provides very little service-specific information.
> For slurm, you really need to go to the slurm logs to find out what
> happened.
>
>
>
> Hint: A quick way to identify problems like this with slurmd and slurmctld
> is to run them with the “-Dvvv” option, causing them to log to your window,
> and usually causing the problem to become immediately obvious.
>
>
>
> For example,
>
>
>
> # /usr/local/slurm/sbin/slurmd -Dvvvv
>
>
>
> Just hit ^C when you’re done, if necessary. Of course, if it doesn’t fail
> when you run it this way, it’s time to look elsewhere.
>
>
>
> Andy
>
>
>
> *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] *On
> Behalf Of *navin srivastava
> *Sent:* Thursday, June 11, 2020 8:25 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* [slurm-users] unable to start slurmd process.
>
>
>
> Hi Team,
>
>
>
> When I try to start the slurmd process, I get the error below.
>
>
>
> 2020-06-11T13:11:58.652711+02:00 oled3 systemd[1]: Starting Slurm node
> daemon...
> 2020-06-11T13:13:28.683840+02:00 oled3 systemd[1]: slurmd.service: Start
> operation timed out. Terminating.
> 2020-06-11T13:13:28.684479+02:00 oled3 systemd[1]: Failed to start Slurm
> node daemon.
> 2020-06-11T13:13:28.684759+02:00 oled3 systemd[1]: slurmd.service: Unit
> entered failed state.
> 2020-06-11T13:13:28.684917+02:00 oled3 systemd[1]: slurmd.service: Failed
> with result 'timeout'.
> 2020-06-11T13:15:01.437172+02:00 oled3 cron[8094]:
> pam_unix(crond:session): session opened for user root by (uid=0)
>
>
>
> Slurm version is 17.11.8
>
>
>
> The server and Slurm have been running for a long time and we have not
> made any changes, but today when I start slurmd it gives this error
> message.
>
> Any idea what could be wrong here?
>
>
>
> Regards
>
> Navin.
>