[slurm-users] How to fix “slurmd.service: Can't open PID file” error
mercan
ahmet.mercan at uhem.itu.edu.tr
Wed Jun 19 03:23:56 UTC 2019
Hi;
Sorry, as you can see, I made a mistake again. I named two different
directories:
"The owner of the /var/run/slurm-llnl directory and the
slurmctld.pid and slurmd.pid files should be "noki" user.
chown -R noki:root /var/spool/slurm-llnl"
You should run:
chown -R noki:root /var/run/slurm-llnl
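To confirm the result afterwards, something like the following could be used. This is only a rough sketch, not part of the original advice: it assumes GNU stat (as shipped on Ubuntu), and the function names owner_of and report_wrong_owner are made up for illustration.

```shell
#!/bin/sh
# Sketch: flag a directory, or entries directly inside it, that are not
# owned by the expected user. Assumes GNU stat (-c '%U').
# The function names are hypothetical helpers, not Slurm tools.

# Print the owning user name of a path.
owner_of() {
    stat -c '%U' "$1"
}

# Print one line for the directory itself and for each entry directly
# inside it whose owner differs from the expected user.
# Prints nothing when every owner matches.
report_wrong_owner() {
    dir=$1
    expected=$2
    for path in "$dir" "$dir"/*; do
        # Skip the unexpanded glob (empty directory) or a missing path.
        [ -e "$path" ] || continue
        actual=$(owner_of "$path")
        if [ "$actual" != "$expected" ]; then
            printf '%s is owned by %s, expected %s\n' "$path" "$actual" "$expected"
        fi
    done
}
```

With the paths from this thread, the call would be "report_wrong_owner /var/run/slurm-llnl noki". No output means the directory and both .pid files already belong to noki.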
Regards;
Ahmet M.
On 19.06.2019 05:55, Noki Lee wrote:
> Hi, slurm-users and mercan.
>
> I tried what you said.
> noki@noki-System-Product-Name:~$ sudo chown -R noki:root /var/spool/slurm-llnl/
> noki@noki-System-Product-Name:/var/spool/slurm-llnl$ ls -l
> total 92
> -rw------- 1 noki root 198 Jun 19 11:36 assoc_mgr_state
> -rw------- 1 noki root 198 Jun 18 20:31 assoc_mgr_state.old
> -rw------- 1 noki root 10 Jun 19 11:36 assoc_usage
> -rw------- 1 noki root 10 Jun 18 20:31 assoc_usage.old
> -rw-r--r-- 1 noki root 5 Jun 11 21:15 clustername
> -rw------- 1 noki root 15 Jun 19 11:36 fed_mgr_state
> -rw------- 1 noki root 15 Jun 18 20:31 fed_mgr_state.old
> -rw------- 1 noki root 35 Jun 19 11:36 job_state
> -rw------- 1 noki root 35 Jun 18 20:31 job_state.old
> -rw------- 1 noki root 38 Jun 19 11:36 last_config_lite
> -rw------- 1 noki root 38 Jun 19 2019 last_config_lite.old
> -rw------- 1 noki root 109 Jun 19 11:36 layouts_state_base
> -rw------- 1 noki root 109 Jun 18 20:31 layouts_state_base.old
> -rw------- 1 noki root 194 Jun 19 11:36 node_state
> -rw------- 1 noki root 194 Jun 18 20:31 node_state.old
> -rw------- 1 noki root 142 Jun 19 11:36 part_state
> -rw------- 1 noki root 142 Jun 18 20:31 part_state.old
> -rw------- 1 noki root 10 Jun 19 11:36 qos_usage
> -rw------- 1 noki root 10 Jun 18 20:31 qos_usage.old
> -rw------- 1 noki root 35 Jun 19 11:36 resv_state
> -rw------- 1 noki root 35 Jun 18 20:31 resv_state.old
> -rw------- 1 noki root 31 Jun 19 11:36 trigger_state
> -rw------- 1 noki root 31 Jun 18 20:31 trigger_state.old
> Whether or not I restart both slurmd and slurmctld, slurmctld is fine
> but slurmd still shows the same issue.
> Below are the owners and groups after restarting both slurmd and slurmctld:
> noki@noki-System-Product-Name:~$ sudo chown -R noki:root /var/spool/slurm-llnl/
> noki@noki-System-Product-Name:/var/spool/slurm-llnl$ ls -l
> total 92
> -rw------- 1 noki noki 198 Jun 19 11:40 assoc_mgr_state
> -rw------- 1 noki root 198 Jun 19 11:36 assoc_mgr_state.old
> -rw------- 1 noki noki 10 Jun 19 11:40 assoc_usage
> -rw------- 1 noki root 10 Jun 19 11:36 assoc_usage.old
> -rw-r--r-- 1 noki root 5 Jun 11 21:15 clustername
> -rw------- 1 noki noki 15 Jun 19 11:40 fed_mgr_state
> -rw------- 1 noki root 15 Jun 19 11:36 fed_mgr_state.old
> -rw------- 1 noki noki 35 Jun 19 11:40 job_state
> -rw------- 1 noki root 35 Jun 19 11:36 job_state.old
> -rw------- 1 noki noki 38 Jun 19 11:40 last_config_lite
> -rw------- 1 noki root 38 Jun 19 11:36 last_config_lite.old
> -rw------- 1 noki noki 109 Jun 19 11:40 layouts_state_base
> -rw------- 1 noki root 109 Jun 19 11:36 layouts_state_base.old
> -rw------- 1 noki noki 194 Jun 19 11:40 node_state
> -rw------- 1 noki root 194 Jun 19 11:36 node_state.old
> -rw------- 1 noki noki 142 Jun 19 11:40 part_state
> -rw------- 1 noki root 142 Jun 19 11:36 part_state.old
> -rw------- 1 noki noki 10 Jun 19 11:40 qos_usage
> -rw------- 1 noki root 10 Jun 19 11:36 qos_usage.old
> -rw------- 1 noki noki 35 Jun 19 11:40 resv_state
> -rw------- 1 noki root 35 Jun 19 11:36 resv_state.old
> -rw------- 1 noki noki 31 Jun 19 11:40 trigger_state
> -rw------- 1 noki root 31 Jun 19 11:36 trigger_state.old
> Do you think I need to change chmod?
>
> Regards,
>
> On Tue, Jun 18, 2019 at 9:27 PM mercan <ahmet.mercan at uhem.itu.edu.tr> wrote:
>
> Hi;
>
> I did not notice the
>
> SlurmUser=noki
>
> line. The owner of the /var/run/slurm-llnl directory and the
> slurmctld.pid and slurmd.pid files should be "noki" user.
>
> chown -R noki:root /var/spool/slurm-llnl
>
> Regards;
>
> Ahmet M.
>
>
> On 18.06.2019 15:15, mercan wrote:
> > Hi;
> >
> > The owner of the /var/run/slurm-llnl directory and the slurmctld.pid
> > and slurmd.pid files should be the "slurm" user. Your files' owners
> > are root and noki.
> >
> > chown -R slurm:slurm /var/spool/slurm-llnl
> >
> >
> > Regards;
> >
> > Ahmet M.
> >
> >
> > On 18.06.2019 15:03, Noki Lee wrote:
> >>
> >> Though SLURM works fine for submitting, running, and queueing jobs, I
> >> get the minor error below.
> >>
> >> sudo systemctl status slurmd
> >>
> >> Jun 12 10:20:40 noki-System-Product-Name systemd[1]: slurmd.service:
> >> Can't open PID file /var/run/slurm-llnl/slurmd.pid (yet?) after
> >> start: No such file or directory
> >>
> >> sudo systemctl status slurmctld
> >>
> >> Jun 12 10:20:40 noki-System-Product-Name systemd[1]: slurmd.service:
> >> Can't open PID file /var/run/slurm-llnl/slurmd.pid (yet?) after
> >> start: No such file or directory
> >>
> >> I followed the installation guide from
> >>
> >> ftp://www.microway.com/pub/pub/for-customer/SDSU-Training/Webinar_2_Slurm_II--Ubuntu16.04_and_18.04.pdf
> >>
> >> Could this problem come from the ownership of the slurm.conf file?
> >>
> >> Here are my slurm.conf and the ownership of the slur*.pid files:
> >>
> >> # slurm.conf file generated by configurator easy.html.
> >> # Put this file on all nodes of your cluster.
> >> # See the slurm.conf man page for more information.
> >> #
> >> ControlMachine=noki-System-Product-Name
> >> #ControlAddr=
> >> #
> >> #MailProg=/bin/mail
> >> MpiDefault=none
> >> #MpiParams=ports=#-#
> >> ProctrackType=proctrack/pgid
> >> ReturnToService=1
> >> SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
> >> #SlurmctldPort=6817
> >> SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
> >> #SlurmdPort=6818
> >> SlurmdSpoolDir=/var/spool/slurmd
> >> SlurmUser=noki
> >> #SlurmdUser=root
> >> StateSaveLocation=/var/spool/slurm-llnl
> >> SwitchType=switch/none
> >> TaskPlugin=task/none
> >> #
> >> #
> >> # TIMERS
> >> #KillWait=30
> >> #MinJobAge=300
> >> #SlurmctldTimeout=120
> >> #SlurmdTimeout=300
> >> #
> >> #
> >> # SCHEDULING
> >> FastSchedule=1
> >> SchedulerType=sched/backfill
> >> SelectType=select/linear
> >> #SelectTypeParameters=
> >> #
> >> #
> >> # LOGGING AND ACCOUNTING
> >> AccountingStorageType=accounting_storage/none
> >> ClusterName=linux
> >> #JobAcctGatherFrequency=30
> >> JobAcctGatherType=jobacct_gather/none
> >> #SlurmctldDebug=3
> >> SlurmctldLogFile=/var/log/slurm-llnl/SlurmctldLogFile
> >> #SlurmdDebug=3
> >> SlurmdLogFile=/var/log/slurm-llnl/SlurmdLogFile
> >> #
> >> #
> >> # COMPUTE NODES
> >> NodeName=noki-System-Product-Name CPUs=4 RealMemory=6963 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
> >> PartitionName=debug Nodes=noki-System-Product-Name Default=YES MaxTime=INFINITE State=UP
> >> $ ls -l /var/run/slurm-llnl/
> >> total 8
> >> -rw-r--r-- 1 noki root 6 Jun 12 10:20 slurmctld.pid
> >> -rw-r--r-- 1 root root 6 Jun 12 10:20 slurmd.pid
> >>
> >
>