[slurm-users] Slurmd not starting

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Wed Feb 13 14:09:48 UTC 2019


Hi Nathalie,

Which Slurm version and which OS version are you using?
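
In case it helps, here is a minimal sketch for collecting both answers in one go. The helper name report_versions is hypothetical, and the sketch assumes slurmd is in PATH and that the distribution ships /etc/os-release; otherwise it falls back to a message and to uname.

```shell
#!/bin/sh
# Hypothetical helper: print the Slurm version and the OS version.
report_versions() {
    if command -v slurmd >/dev/null 2>&1; then
        slurmd -V    # slurmd's own version flag
    else
        echo "slurmd not found in PATH"
    fi
    if [ -r /etc/os-release ]; then
        # Source in a subshell so the os-release variables do not leak out.
        ( . /etc/os-release; echo "OS: ${PRETTY_NAME:-unknown}" )
    else
        echo "OS: $(uname -sr)"
    fi
}

report_versions
```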

FYI: My Slurm Wiki contains all the details of setting up Slurm on 
CentOS 7: https://wiki.fysik.dtu.dk/niflheim/SLURM

Best regards,
Ole

On 2/13/19 2:58 PM, Nathalie Gocht wrote:
> Hey,
> 
> I am setting up a one-node cluster. The master and the compute node are on
> the same machine. My slurm.conf:
> 
> ControlMachine=bayes
> #
> MpiDefault=none
> ProctrackType=proctrack/pgid
> ReturnToService=1
> SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/var/spool/slurmd
> SlurmUser=slurm
> StateSaveLocation=/var/spool/slurmctld
> SwitchType=switch/none
> TaskPlugin=task/none
> #
> #
> # TIMERS
> InactiveLimit=0
> KillWait=30
> MinJobAge=300
> SlurmctldTimeout=120
> SlurmdTimeout=300
> Waittime=0
> #
> #
> # SCHEDULING
> FastSchedule=1
> SchedulerType=sched/builtin
> SelectType=select/linear
> #
> #
> # LOGGING AND ACCOUNTING
> AccountingStorageLoc=/var/log/slurm-llnl/job_accounting
> AccountingStorageType=accounting_storage/filetxt
> AccountingStoreJobComment=YES
> ClusterName=bayes
> JobCompLoc=/var/log/slurm-llnl/job_completion
> JobCompType=jobcomp/filetxt
> JobAcctGatherFrequency=60
> JobAcctGatherType=jobacct_gather/linux
> SlurmctldDebug=info
> SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
> SlurmdDebug=info
> SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
> # COMPUTE NODES
> GresTypes=gpu
> NodeName=bayes Gres=gpu:tesla:1 CPUs=48 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN
> PartitionName=long Nodes=bayes Default=YES MaxTime=INFINITE State=UP
> 
> I started the control daemon, but I get this information:
> 
> $ systemctl status slurmctld.service
> ● slurmctld.service - Slurm controller daemon
>     Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled; vendor preset: enabled)
>     Active: failed (Result: exit-code) since Wed 2019-02-13 14:43:02 CET; 7min ago
>       Docs: man:slurmctld(8)
>    Process: 40552 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCE
>  Main PID: 40560 (code=exited, status=1/FAILURE)
> 
> $ sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> long*        up   infinite      1   idle bayes
> 
> I tried to start the Slurm daemon (slurmd), but it times out. slurmd -Dvvv
> gives:
> 
> slurmd: error: chmod(/var/spool/slurmd, 0755): Operation not permitted
> slurmd: error: Unable to initialize slurmd spooldir
> slurmd: error: slurmd initialization failed
> 
> Does someone know what's going on?
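
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: chmod generally fails with "Operation not permitted" when the calling user is neither root nor the owner of the directory. The sketch below compares the directory owner with the current user. The function name check_spool_owner is hypothetical, and stat -c assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: report who owns a directory and whether the
# current user would be allowed to chmod it (owner or root only).
check_spool_owner() {
    dir="$1"
    owner=$(stat -c '%U' "$dir") || return 2   # GNU stat: owner name
    mode=$(stat -c '%a' "$dir")                # octal permission bits
    me=$(id -un)
    echo "dir=$dir owner=$owner mode=$mode current_user=$me"
    if [ "$me" = "root" ] || [ "$me" = "$owner" ]; then
        echo "chmod on $dir should be permitted for $me"
    else
        echo "chmod on $dir would fail for $me (not owner, not root)"
        return 1
    fi
}

# Example call, using the SlurmdSpoolDir from the slurm.conf above:
#   check_spool_owner /var/spool/slurmd
```

If the second message is printed, running slurmd -Dvvv as root, or as the user that owns the spool directory, would be the next thing to try.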


