[slurm-users] How to fix “slurmd.service: Can't open PID file” error

Noki Lee noki.lee21 at gmail.com
Wed Jun 19 02:55:08 UTC 2019


Hi, slurm-users and mercan.

I tried what you said.

noki at noki-System-Product-Name:~$ sudo chown -R noki:root /var/spool/slurm-llnl/
noki at noki-System-Product-Name:/var/spool/slurm-llnl$ ls -l
total 92
-rw------- 1 noki root 198 Jun 19 11:36 assoc_mgr_state
-rw------- 1 noki root 198 Jun 18 20:31 assoc_mgr_state.old
-rw------- 1 noki root  10 Jun 19 11:36 assoc_usage
-rw------- 1 noki root  10 Jun 18 20:31 assoc_usage.old
-rw-r--r-- 1 noki root   5 Jun 11 21:15 clustername
-rw------- 1 noki root  15 Jun 19 11:36 fed_mgr_state
-rw------- 1 noki root  15 Jun 18 20:31 fed_mgr_state.old
-rw------- 1 noki root  35 Jun 19 11:36 job_state
-rw------- 1 noki root  35 Jun 18 20:31 job_state.old
-rw------- 1 noki root  38 Jun 19 11:36 last_config_lite
-rw------- 1 noki root  38 Jun 19  2019 last_config_lite.old
-rw------- 1 noki root 109 Jun 19 11:36 layouts_state_base
-rw------- 1 noki root 109 Jun 18 20:31 layouts_state_base.old
-rw------- 1 noki root 194 Jun 19 11:36 node_state
-rw------- 1 noki root 194 Jun 18 20:31 node_state.old
-rw------- 1 noki root 142 Jun 19 11:36 part_state
-rw------- 1 noki root 142 Jun 18 20:31 part_state.old
-rw------- 1 noki root  10 Jun 19 11:36 qos_usage
-rw------- 1 noki root  10 Jun 18 20:31 qos_usage.old
-rw------- 1 noki root  35 Jun 19 11:36 resv_state
-rw------- 1 noki root  35 Jun 18 20:31 resv_state.old
-rw------- 1 noki root  31 Jun 19 11:36 trigger_state
-rw------- 1 noki root  31 Jun 18 20:31 trigger_state.old

Whether or not I restart both slurmd and slurmctld, slurmctld is fine but
slurmd still shows the same issue.
Below are the owners and groups after restarting both slurmd and slurmctld:

noki at noki-System-Product-Name:~$ sudo chown -R noki:root /var/spool/slurm-llnl/
noki at noki-System-Product-Name:/var/spool/slurm-llnl$ ls -l
total 92
-rw------- 1 noki noki 198 Jun 19 11:40 assoc_mgr_state
-rw------- 1 noki root 198 Jun 19 11:36 assoc_mgr_state.old
-rw------- 1 noki noki  10 Jun 19 11:40 assoc_usage
-rw------- 1 noki root  10 Jun 19 11:36 assoc_usage.old
-rw-r--r-- 1 noki root   5 Jun 11 21:15 clustername
-rw------- 1 noki noki  15 Jun 19 11:40 fed_mgr_state
-rw------- 1 noki root  15 Jun 19 11:36 fed_mgr_state.old
-rw------- 1 noki noki  35 Jun 19 11:40 job_state
-rw------- 1 noki root  35 Jun 19 11:36 job_state.old
-rw------- 1 noki noki  38 Jun 19 11:40 last_config_lite
-rw------- 1 noki root  38 Jun 19 11:36 last_config_lite.old
-rw------- 1 noki noki 109 Jun 19 11:40 layouts_state_base
-rw------- 1 noki root 109 Jun 19 11:36 layouts_state_base.old
-rw------- 1 noki noki 194 Jun 19 11:40 node_state
-rw------- 1 noki root 194 Jun 19 11:36 node_state.old
-rw------- 1 noki noki 142 Jun 19 11:40 part_state
-rw------- 1 noki root 142 Jun 19 11:36 part_state.old
-rw------- 1 noki noki  10 Jun 19 11:40 qos_usage
-rw------- 1 noki root  10 Jun 19 11:36 qos_usage.old
-rw------- 1 noki noki  35 Jun 19 11:40 resv_state
-rw------- 1 noki root  35 Jun 19 11:36 resv_state.old
-rw------- 1 noki noki  31 Jun 19 11:40 trigger_state
-rw------- 1 noki root  31 Jun 19 11:36 trigger_state.old

Do you think I need to change the permissions with chmod?
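
Before changing any modes, it may help to list who owns the PID directory and
files and what their current modes are. The following is only a minimal
sketch, assuming GNU stat is available; report_owner is a made-up helper name
(not part of Slurm), and the user and paths passed to it are whatever your own
slurm.conf names.

```shell
#!/bin/sh
# report_owner is a hypothetical helper, not a Slurm tool.
# Usage: report_owner EXPECTED_USER PATH...
# Prints "owner:group mode path" for every existing path and returns
# non-zero if a path is missing or is not owned by EXPECTED_USER.
report_owner() {
    expected_user=$1
    shift
    rc=0
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            rc=1
            continue
        fi
        # GNU stat format: %U owner name, %G group name, %a octal mode, %n name
        stat -c '%U:%G %a %n' "$path"
        owner=$(stat -c '%U' "$path")
        if [ "$owner" != "$expected_user" ]; then
            echo "not owned by $expected_user: $path"
            rc=1
        fi
    done
    return $rc
}
```

With the paths from the slurm.conf quoted below, a call could look like
"report_owner noki /var/run/slurm-llnl /var/run/slurm-llnl/slurmctld.pid".
The helper only reports; it does not change ownership or modes.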

Regards,

On Tue, Jun 18, 2019 at 9:27 PM mercan <ahmet.mercan at uhem.itu.edu.tr> wrote:

> Hi;
>
> I did not notice
>
> SlurmUser=noki
>
> line. The owner of the /var/run/slurm-llnl directory and the
> slurmctld.pid and slurmd.pid files should be "noki" user.
>
> chown -R noki:root /var/spool/slurm-llnl
>
> Regards;
>
> Ahmet M.
>
>
> On 18.06.2019 15:15, mercan wrote:
> > Hi;
> >
> > The owner of the /var/run/slurm-llnl directory and the slurmctld.pid
> > and slurmd.pid files should be the "slurm" user. Your files are owned by
> > root and noki.
> >
> > chown -R slurm:slurm /var/spool/slurm-llnl
> >
> >
> > Regards;
> >
> > Ahmet M.
> >
> >
> > On 18.06.2019 15:03, Noki Lee wrote:
> >>
> >> Though SLURM works fine for job submitting, running, and queueing, I
> >> got a minor error below.
> >>
> >> |sudo systemctl status slurmd|
> >>
> >> |Jun 12 10:20:40 noki-System-Product-Name systemd[1]: slurmd.service:
> >> Can't open PID file /var/run/slurm-llnl/slurmd.pid (yet?) after
> >> start: No such file or directory|
> >>
> >> |sudo systemctl status slurmctld|
> >>
> >> |Jun 12 10:20:40 noki-System-Product-Name systemd[1]: slurmd.service:
> >> Can't open PID file /var/run/slurm-llnl/slurmd.pid (yet?) after
> >> start: No such file or directory|
> >>
> >> I followed the installation guide from
> >>
> >> ftp://www.microway.com/pub/pub/for-customer/SDSU-Training/Webinar_2_Slurm_II--Ubuntu16.04_and_18.04.pdf
> >>
> >>
> >> Could this problem come from the ownership of the slurm.conf file?
> >>
> >> Here are my slurm.conf and ownership for slur*.pid
> >>
> >> |# slurm.conf file generated by configurator easy.html.
> >> # Put this file on all nodes of your cluster.
> >> # See the slurm.conf man page for more information.
> >> #
> >> ControlMachine=noki-System-Product-Name
> >> #ControlAddr=
> >> #
> >> #MailProg=/bin/mail
> >> MpiDefault=none
> >> #MpiParams=ports=#-#
> >> ProctrackType=proctrack/pgid
> >> ReturnToService=1
> >> SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
> >> #SlurmctldPort=6817
> >> SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
> >> #SlurmdPort=6818
> >> SlurmdSpoolDir=/var/spool/slurmd
> >> SlurmUser=noki
> >> #SlurmdUser=root
> >> StateSaveLocation=/var/spool/slurm-llnl
> >> SwitchType=switch/none
> >> TaskPlugin=task/none
> >> #
> >> #
> >> # TIMERS
> >> #KillWait=30
> >> #MinJobAge=300
> >> #SlurmctldTimeout=120
> >> #SlurmdTimeout=300
> >> #
> >> #
> >> # SCHEDULING
> >> FastSchedule=1
> >> SchedulerType=sched/backfill
> >> SelectType=select/linear
> >> #SelectTypeParameters=
> >> #
> >> #
> >> # LOGGING AND ACCOUNTING
> >> AccountingStorageType=accounting_storage/none
> >> ClusterName=linux
> >> #JobAcctGatherFrequency=30
> >> JobAcctGatherType=jobacct_gather/none
> >> #SlurmctldDebug=3
> >> SlurmctldLogFile=/var/log/slurm-llnl/SlurmctldLogFile
> >> #SlurmdDebug=3
> >> SlurmdLogFile=/var/log/slurm-llnl/SlurmdLogFile
> >> #
> >> #
> >> # COMPUTE NODES
> >> NodeName=noki-System-Product-Name CPUs=4 RealMemory=6963 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
> >> PartitionName=debug Nodes=noki-System-Product-Name Default=YES MaxTime=INFINITE State=UP|
> >>
> >> |$ ls -l /var/run/slurm-llnl/
> >> total 8
> >> -rw-r--r-- 1 noki root 6 Jun 12 10:20 slurmctld.pid
> >> -rw-r--r-- 1 root root 6 Jun 12 10:20 slurmd.pid|
> >>
> >
>
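
The systemd message quoted above names a PID file path, and the quoted
slurm.conf sets SlurmdPidFile, so one thing that could be compared is whether
the two paths agree. The following is a rough sketch only: conf_value is a
hypothetical helper (not a Slurm command) that prints the value of one key from
a slurm.conf-style file, ignoring commented-out lines.

```shell
#!/bin/sh
# conf_value is a hypothetical helper, not part of Slurm.
# Usage: conf_value KEY FILE
# Prints the value of the first uncommented "KEY=value" line in FILE.
# Key names are matched case-insensitively.
conf_value() {
    key=$1
    file=$2
    awk -F= -v key="$key" '
        /^[[:space:]]*#/ { next }           # skip comment lines
        {
            name = $1
            gsub(/[[:space:]]/, "", name)   # strip whitespace around the key
            if (tolower(name) == tolower(key)) {
                sub(/^[^=]*=/, "")          # drop everything up to the first "="
                sub(/[[:space:]]+$/, "")    # drop trailing whitespace
                print
                exit
            }
        }
    ' "$file"
}
```

Assuming the configuration lives at /etc/slurm-llnl/slurm.conf (an assumption;
the thread does not give the path), "conf_value SlurmdPidFile
/etc/slurm-llnl/slurm.conf" could be compared by eye with the output of
"systemctl show -p PIDFile slurmd", which prints the PIDFile= setting of the
unit.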