[slurm-users] How to fix “slurmd.service: Can't open PID file” error

mercan ahmet.mercan at uhem.itu.edu.tr
Wed Jun 19 06:37:30 UTC 2019


Hi;

As the noki user, could you try to read the files 
/var/run/slurm-llnl/slurmd.pid and /var/run/slurm-llnl/slurmctld.pid? 
Are these files present, readable and writable? The parent directories 
may also lack the read/execute permission needed to reach them.
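
A minimal sketch of that check, assuming a POSIX shell. The helper name 
check_path_access is hypothetical (it is not part of Slurm); it walks from 
the given file up to / and prints what the invoking user can do at each 
level, so it should be run from a shell logged in as noki.

```shell
#!/bin/sh
# Hypothetical helper: report, for a path and each of its parent
# directories, what the current user is allowed to do.
check_path_access() {
    path=$1
    while :; do
        if [ ! -e "$path" ]; then
            status="missing"
        elif [ -d "$path" ]; then
            # A directory must be searchable (x) to reach anything below it.
            if [ -x "$path" ]; then status="dir ok"; else status="dir no-x"; fi
        else
            # For a regular file, show read and write access separately.
            if [ -r "$path" ]; then r=r; else r=-; fi
            if [ -w "$path" ]; then w=w; else w=-; fi
            status="file $r$w"
        fi
        printf '%-12s %s\n' "$status" "$path"
        # Stop at the filesystem root (or "." for a relative path).
        case "$path" in
            /|.) break ;;
        esac
        path=$(dirname "$path")
    done
}
```

For example, check_path_access /var/run/slurm-llnl/slurmd.pid prints one 
line per level; a "missing" or "dir no-x" line points at the level that 
blocks access.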

Regards;

Ahmet M.


On 19.06.2019 07:26, Noki Lee wrote:
> Hi, slurm-users and Ahmet
>
> I already tried
>
> chown -R noki:root /var/run/slurm-llnl
>
> before I posted it.
>
> When I first saw these messages, I applied the above command 
> and restarted the daemons.
> The same problem remained, so I posted it.
>
> Regards,
>
> Noki.
>
> On Wed, Jun 19, 2019 at 12:24 PM mercan <ahmet.mercan at uhem.itu.edu.tr> wrote:
>
>     Hi;
>
>     Sorry, as you can see, I made a mistake again. I wrote two different
>     directories:
>
>     "The owner of the /var/run/slurm-llnl directory and the
>     slurmctld.pid and slurmd.pid files should be "noki" user.
>
>     chown -R noki:root /var/spool/slurm-llnl"
>
>     You should run:
>
>     chown -R noki:root /var/run/slurm-llnl
>
>     Regards;
>
>     Ahmet M.
>
>
>     On 19.06.2019 05:55, Noki Lee wrote:
>     > Hi, slurm-users and mercan.
>     >
>     > I tried what you said.
>     > noki@noki-System-Product-Name:~$ sudo chown -R noki:root /var/spool/slurm-llnl/
>     > noki@noki-System-Product-Name:/var/spool/slurm-llnl$ ls -l
>     > total 92
>     > -rw------- 1 noki root 198 Jun 19 11:36 assoc_mgr_state
>     > -rw------- 1 noki root 198 Jun 18 20:31 assoc_mgr_state.old
>     > -rw------- 1 noki root  10 Jun 19 11:36 assoc_usage
>     > -rw------- 1 noki root  10 Jun 18 20:31 assoc_usage.old
>     > -rw-r--r-- 1 noki root   5 Jun 11 21:15 clustername
>     > -rw------- 1 noki root  15 Jun 19 11:36 fed_mgr_state
>     > -rw------- 1 noki root  15 Jun 18 20:31 fed_mgr_state.old
>     > -rw------- 1 noki root  35 Jun 19 11:36 job_state
>     > -rw------- 1 noki root  35 Jun 18 20:31 job_state.old
>     > -rw------- 1 noki root  38 Jun 19 11:36 last_config_lite
>     > -rw------- 1 noki root  38 Jun 19  2019 last_config_lite.old
>     > -rw------- 1 noki root 109 Jun 19 11:36 layouts_state_base
>     > -rw------- 1 noki root 109 Jun 18 20:31 layouts_state_base.old
>     > -rw------- 1 noki root 194 Jun 19 11:36 node_state
>     > -rw------- 1 noki root 194 Jun 18 20:31 node_state.old
>     > -rw------- 1 noki root 142 Jun 19 11:36 part_state
>     > -rw------- 1 noki root 142 Jun 18 20:31 part_state.old
>     > -rw------- 1 noki root  10 Jun 19 11:36 qos_usage
>     > -rw------- 1 noki root  10 Jun 18 20:31 qos_usage.old
>     > -rw------- 1 noki root  35 Jun 19 11:36 resv_state
>     > -rw------- 1 noki root  35 Jun 18 20:31 resv_state.old
>     > -rw------- 1 noki root  31 Jun 19 11:36 trigger_state
>     > -rw------- 1 noki root  31 Jun 18 20:31 trigger_state.old
>     > Whether or not I restart slurmd and slurmctld, slurmctld is fine
>     > but slurmd still shows the same issue.
>     > Below are the owners and groups after restarting both slurmd and
>     > slurmctld:
>     > noki@noki-System-Product-Name:~$ sudo chown -R noki:root /var/spool/slurm-llnl/
>     > noki@noki-System-Product-Name:/var/spool/slurm-llnl$ ls -l
>     > total 92
>     > -rw------- 1 noki noki 198 Jun 19 11:40 assoc_mgr_state
>     > -rw------- 1 noki root 198 Jun 19 11:36 assoc_mgr_state.old
>     > -rw------- 1 noki noki  10 Jun 19 11:40 assoc_usage
>     > -rw------- 1 noki root  10 Jun 19 11:36 assoc_usage.old
>     > -rw-r--r-- 1 noki root   5 Jun 11 21:15 clustername
>     > -rw------- 1 noki noki  15 Jun 19 11:40 fed_mgr_state
>     > -rw------- 1 noki root  15 Jun 19 11:36 fed_mgr_state.old
>     > -rw------- 1 noki noki  35 Jun 19 11:40 job_state
>     > -rw------- 1 noki root  35 Jun 19 11:36 job_state.old
>     > -rw------- 1 noki noki  38 Jun 19 11:40 last_config_lite
>     > -rw------- 1 noki root  38 Jun 19 11:36 last_config_lite.old
>     > -rw------- 1 noki noki 109 Jun 19 11:40 layouts_state_base
>     > -rw------- 1 noki root 109 Jun 19 11:36 layouts_state_base.old
>     > -rw------- 1 noki noki 194 Jun 19 11:40 node_state
>     > -rw------- 1 noki root 194 Jun 19 11:36 node_state.old
>     > -rw------- 1 noki noki 142 Jun 19 11:40 part_state
>     > -rw------- 1 noki root 142 Jun 19 11:36 part_state.old
>     > -rw------- 1 noki noki  10 Jun 19 11:40 qos_usage
>     > -rw------- 1 noki root  10 Jun 19 11:36 qos_usage.old
>     > -rw------- 1 noki noki  35 Jun 19 11:40 resv_state
>     > -rw------- 1 noki root  35 Jun 19 11:36 resv_state.old
>     > -rw------- 1 noki noki  31 Jun 19 11:40 trigger_state
>     > -rw------- 1 noki root  31 Jun 19 11:36 trigger_state.old
>     > Do you think I need to change the permissions with chmod?
>     >
>     > Regards,
>     >
>     > On Tue, Jun 18, 2019 at 9:27 PM mercan <ahmet.mercan at uhem.itu.edu.tr> wrote:
>     >
>     >     Hi;
>     >
>     >     I did not notice the
>     >
>     >     SlurmUser=noki
>     >
>     >     line. The owner of the /var/run/slurm-llnl directory and the
>     >     slurmctld.pid and slurmd.pid files should be "noki" user.
>     >
>     >     chown -R noki:root /var/spool/slurm-llnl
>     >
>     >     Regards;
>     >
>     >     Ahmet M.
>     >
>     >
>     >     On 18.06.2019 15:15, mercan wrote:
>     >     > Hi;
>     >     >
>     >     > The owner of the /var/run/slurm-llnl directory and the
>     >     > slurmctld.pid and slurmd.pid files should be the "slurm" user.
>     >     > Your files are owned by root and noki.
>     >     >
>     >     > chown -R slurm:slurm /var/spool/slurm-llnl
>     >     >
>     >     >
>     >     > Regards;
>     >     >
>     >     > Ahmet M.
>     >     >
>     >     >
>     >     > On 18.06.2019 15:03, Noki Lee wrote:
>     >     >>
>     >     >> Although SLURM works fine for submitting, running, and queueing
>     >     >> jobs, I get the minor error below.
>     >     >>
>     >     >> sudo systemctl status slurmd
>     >     >>
>     >     >> Jun 12 10:20:40 noki-System-Product-Name systemd[1]: slurmd.service: Can't open PID file /var/run/slurm-llnl/slurmd.pid (yet?) after start: No such file or directory
>     >     >>
>     >     >> sudo systemctl status slurmctld
>     >     >>
>     >     >> Jun 12 10:20:40 noki-System-Product-Name systemd[1]: slurmd.service: Can't open PID file /var/run/slurm-llnl/slurmd.pid (yet?) after start: No such file or directory
>     >     >>
>     >     >> I followed the installation guide from
>     >     >>
>     >     >> ftp://www.microway.com/pub/pub/for-customer/SDSU-Training/Webinar_2_Slurm_II--Ubuntu16.04_and_18.04.pdf
>     >     >>
>     >     >> Could this problem come from the ownership of the slurm.conf file?
>     >     >>
>     >     >> Here are my slurm.conf and the ownership of the slur*.pid files:
>     >     >>
>     >     >> # slurm.conf file generated by configurator easy.html.
>     >     >> # Put this file on all nodes of your cluster.
>     >     >> # See the slurm.conf man page for more information.
>     >     >> #
>     >     >> ControlMachine=noki-System-Product-Name
>     >     >> #ControlAddr=
>     >     >> #
>     >     >> #MailProg=/bin/mail
>     >     >> MpiDefault=none
>     >     >> #MpiParams=ports=#-#
>     >     >> ProctrackType=proctrack/pgid
>     >     >> ReturnToService=1
>     >     >> SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
>     >     >> #SlurmctldPort=6817
>     >     >> SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
>     >     >> #SlurmdPort=6818
>     >     >> SlurmdSpoolDir=/var/spool/slurmd
>     >     >> SlurmUser=noki
>     >     >> #SlurmdUser=root
>     >     >> StateSaveLocation=/var/spool/slurm-llnl
>     >     >> SwitchType=switch/none
>     >     >> TaskPlugin=task/none
>     >     >> #
>     >     >> #
>     >     >> # TIMERS
>     >     >> #KillWait=30
>     >     >> #MinJobAge=300
>     >     >> #SlurmctldTimeout=120
>     >     >> #SlurmdTimeout=300
>     >     >> #
>     >     >> #
>     >     >> # SCHEDULING
>     >     >> FastSchedule=1
>     >     >> SchedulerType=sched/backfill
>     >     >> SelectType=select/linear
>     >     >> #SelectTypeParameters=
>     >     >> #
>     >     >> #
>     >     >> # LOGGING AND ACCOUNTING
>     >     >> AccountingStorageType=accounting_storage/none
>     >     >> ClusterName=linux
>     >     >> #JobAcctGatherFrequency=30
>     >     >> JobAcctGatherType=jobacct_gather/none
>     >     >> #SlurmctldDebug=3
>     >     >> SlurmctldLogFile=/var/log/slurm-llnl/SlurmctldLogFile
>     >     >> #SlurmdDebug=3
>     >     >> SlurmdLogFile=/var/log/slurm-llnl/SlurmdLogFile
>     >     >> #
>     >     >> #
>     >     >> # COMPUTE NODES
>     >     >> NodeName=noki-System-Product-Name CPUs=4 RealMemory=6963 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
>     >     >> PartitionName=debug Nodes=noki-System-Product-Name Default=YES MaxTime=INFINITE State=UP
>     >     >>
>     >     >> $ ls -l /var/run/slurm-llnl/
>     >     >> total 8
>     >     >> -rw-r--r-- 1 noki root 6 Jun 12 10:20 slurmctld.pid
>     >     >> -rw-r--r-- 1 root root 6 Jun 12 10:20 slurmd.pid
>     >     >>
>     >     >
>     >
>
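
One more thing that may be worth checking for the error quoted above: 
systemd prints the "Can't open PID file" message for the path given by 
PIDFile= in the service unit, so that path should agree with SlurmdPidFile 
in slurm.conf. Below is a rough sketch, assuming a POSIX shell; the helper 
names conf_value and compare_pidfiles are hypothetical, and the comparison 
is a plain string match, so /var/run/... and /run/... are reported as 
different even where /var/run is a symlink to /run.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from a KEY=VALUE style file.
# Commented lines such as "#KEY=..." do not match; the last match wins.
conf_value() {
    key=$1
    file=$2
    sed -n "s/^[[:space:]]*${key}=//p" "$file" | tail -n 1
}

# Hypothetical helper: compare SlurmdPidFile in slurm.conf with PIDFile=
# in a systemd unit file. Returns non-zero on a mismatch.
compare_pidfiles() {
    slurm_conf=$1
    unit_file=$2
    conf_pid=$(conf_value SlurmdPidFile "$slurm_conf")
    unit_pid=$(conf_value PIDFile "$unit_file")
    if [ "$conf_pid" = "$unit_pid" ]; then
        echo "match: $conf_pid"
    else
        echo "mismatch: slurm.conf=$conf_pid unit=$unit_pid"
        return 1
    fi
}
```

It could be called as 
compare_pidfiles /etc/slurm-llnl/slurm.conf /lib/systemd/system/slurmd.service 
where both paths are assumptions about the Ubuntu packaging and may differ 
on other installations.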


