[slurm-users] Slurmd not starting
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Wed Feb 13 14:09:48 UTC 2019
Hi Nathalie,
Which Slurm version and which OS version are you using?
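Both can usually be reported with a few commands. This is only a sketch: it assumes the Slurm binaries are on your PATH and that the distribution ships /etc/os-release, which may not hold on every system.

```shell
# Report the Slurm version, if slurmd is installed and on PATH (assumption).
if command -v slurmd >/dev/null 2>&1; then
    slurmd -V
else
    echo "slurmd not found on PATH"
fi

# Report the OS release; the subshell keeps the sourced variables local.
if [ -r /etc/os-release ]; then
    ( . /etc/os-release && echo "$PRETTY_NAME" )
fi

# Kernel release, for completeness.
uname -r
```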
FYI: My Slurm Wiki contains all the details of setting up Slurm on
CentOS 7: https://wiki.fysik.dtu.dk/niflheim/SLURM
Best regards,
Ole
On 2/13/19 2:58 PM, Nathalie Gocht wrote:
> Hey,
>
> I am building a one-node cluster. Master and node are on the same
> machine. My slurm.conf:
>
> ControlMachine=bayes
> #
> MpiDefault=none
> ProctrackType=proctrack/pgid
> ReturnToService=1
> SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/var/spool/slurmd
> SlurmUser=slurm
> StateSaveLocation=/var/spool/slurmctld
> SwitchType=switch/none
> TaskPlugin=task/none
> #
> #
> # TIMERS
> InactiveLimit=0
> KillWait=30
> MinJobAge=300
> SlurmctldTimeout=120
> SlurmdTimeout=300
> Waittime=0
> #
> #
> # SCHEDULING
> FastSchedule=1
> SchedulerType=sched/builtin
> SelectType=select/linear
> #
> #
> # LOGGING AND ACCOUNTING
> AccountingStorageLoc=/var/log/slurm-llnl/job_accounting
> AccountingStorageType=accounting_storage/filetxt
> AccountingStoreJobComment=YES
> ClusterName=bayes
> JobCompLoc=/var/log/slurm-llnl/job_completion
> JobCompType=jobcomp/filetxt
> JobAcctGatherFrequency=60
> JobAcctGatherType=jobacct_gather/linux
> SlurmctldDebug=info
> SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
> SlurmdDebug=info
> SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
> # COMPUTE NODES
> GresTypes=gpu
> NodeName=bayes Gres=gpu:tesla:1 CPUs=48 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN
> PartitionName=long Nodes=bayes Default=YES MaxTime=INFINITE State=UP
>
> I started the control daemon, but I get this status:
>
> $ systemctl status slurmctld.service
> ● slurmctld.service - Slurm controller daemon
>    Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled; vendor preset: enabled)
>    Active: failed (Result: exit-code) since Wed 2019-02-13 14:43:02 CET; 7min ago
>      Docs: man:slurmctld(8)
>   Process: 40552 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCE
>  Main PID: 40560 (code=exited, status=1/FAILURE)
>
> $ sinfo
> PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
> long*     up    infinite  1     idle  bayes
>
> I tried to start the Slurm daemon, but it times out. slurmd -Dvvv
> gives:
>
> slurmd: error: chmod(/var/spool/slurmd, 0755): Operation not permitted
> slurmd: error: Unable to initialize slurmd spooldir
> slurmd: error: slurmd initialization failed
>
> Does anyone know what's going on?
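
One possible reading of the chmod error quoted above: chmod() only succeeds for the owner of a file or for root, so "Operation not permitted" would be expected if slurmd -Dvvv was run as an ordinary user who does not own /var/spool/slurmd. The sketch below checks for that situation. The helper name check_spooldir is hypothetical, and it assumes GNU stat (stat -c), as found on most Linux systems.

```shell
# Hypothetical helper: report whether the current user could chmod a
# directory. Only the directory's owner or root (uid 0) is allowed to.
check_spooldir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "$dir: missing"
        return 1
    fi
    owner_uid=$(stat -c '%u' "$dir")   # GNU stat; numeric owner uid
    my_uid=$(id -u)
    if [ "$my_uid" -eq 0 ] || [ "$my_uid" -eq "$owner_uid" ]; then
        echo "$dir: chmod allowed for uid $my_uid"
    else
        echo "$dir: chmod NOT permitted for uid $my_uid (owner uid $owner_uid)"
        return 1
    fi
}
```

Running check_spooldir /var/spool/slurmd as the same user that launched slurmd shows whether this explanation applies. If it does, starting slurmd as root (it is normally run as root) may get past this particular error.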