[slurm-users] Slurmd not starting

Nathalie Gocht nathalie.gocht at outlook.com
Wed Feb 13 13:58:51 UTC 2019


Hey,

I am setting up a one-node cluster. The master and the compute node are on the same machine. My slurm.conf:

ControlMachine=bayes
#
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
#
#
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/builtin
SelectType=select/linear
#
#
# LOGGING AND ACCOUNTING
AccountingStorageLoc=/var/log/slurm-llnl/job_accounting
AccountingStorageType=accounting_storage/filetxt
AccountingStoreJobComment=YES
ClusterName=bayes
JobCompLoc=/var/log/slurm-llnl/job_completion
JobCompType=jobcomp/filetxt
JobAcctGatherFrequency=60
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log

# COMPUTE NODES
GresTypes=gpu

NodeName=bayes Gres=gpu:tesla:1 CPUs=48 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN
PartitionName=long Nodes=bayes Default=YES MaxTime=INFINITE State=UP


I started the control daemon, but I get this status:
$ systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2019-02-13 14:43:02 CET; 7min ago
     Docs: man:slurmctld(8)
  Process: 40552 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCE
Main PID: 40560 (code=exited, status=1/FAILURE)

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
long*        up   infinite      1   idle bayes
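
The status output above does not show why slurmctld exited, so the file named by SlurmctldLogFile is probably the next place to look. A minimal sketch for pulling that path out of the config follows. `conf_value` is a hypothetical helper, not part of Slurm, and it assumes plain `Key=Value` lines spelled exactly as in the config above (it does not handle Slurm's case-insensitive keys).

```shell
#!/bin/sh
# conf_value KEY FILE
# Hypothetical helper (not a Slurm tool): print the value of the last
# uncommented "KEY=VALUE" line in FILE. The match is case-sensitive.
conf_value() {
    key="$1"
    file="$2"
    # Drop comment lines, then strip "KEY=" from matching lines and print the rest.
    sed -n -e '/^[[:space:]]*#/d' -e "s/^${key}=//p" "$file" | tail -n 1
}
```

It could be used as `tail -n 50 "$(conf_value SlurmctldLogFile /etc/slurm-llnl/slurm.conf)"`, where the location of slurm.conf is an assumption about this install.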

I tried to start the Slurm daemon (slurmd), but it times out. slurmd -Dvvv gives:

slurmd: error: chmod(/var/spool/slurmd, 0755): Operation not permitted
slurmd: error: Unable to initialize slurmd spooldir
slurmd: error: slurmd initialization failed
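
chmod fails with "Operation not permitted" when the calling user is neither root nor the owner of the directory, so one possible cause is that slurmd was started as an unprivileged user who does not own /var/spool/slurmd. A rough check, assuming GNU `stat` is available, is sketched below. `check_spool_dir` is a hypothetical helper, not a Slurm command.

```shell
#!/bin/sh
# check_spool_dir DIR
# Hypothetical helper (not a Slurm tool): report whether the current user
# would be allowed to chmod DIR, i.e. is root or owns the directory.
check_spool_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    owner_uid=$(stat -c '%u' "$dir")   # GNU stat: numeric owner uid
    my_uid=$(id -u)
    if [ "$my_uid" -eq 0 ] || [ "$my_uid" -eq "$owner_uid" ]; then
        echo "ok: uid $my_uid may chmod $dir"
    else
        echo "denied: $dir is owned by uid $owner_uid, running as uid $my_uid"
        return 1
    fi
}
```

Running `check_spool_dir /var/spool/slurmd` as the same user that ran `slurmd -Dvvv` would show whether ownership explains the error.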

Does anyone know what's going on?

Thanks,
Nathalie