<div dir="ltr">On the headnode. (It may be worth mentioning, since the cause could be the same, that LDAP is also not working as expected: it gives the message "invalid credential (49)", which usually appears with problems of this type. The upgrade to jessie must have touched something that is affecting the sanity of all my software :D )<div><br><div>Here is my slurm.conf.<div><div><br></div><div><div># slurm.conf file generated by configurator.html.</div><div># Put this file on all nodes of your cluster.</div><div># See the slurm.conf man page for more information.</div><div>#</div><div>ControlMachine=anyone</div><div>ControlAddr=master</div><div>#BackupController=</div><div>#BackupAddr=</div><div>#</div><div>AuthType=auth/munge</div><div>CacheGroups=0</div><div>#CheckpointType=checkpoint/none</div><div>CryptoType=crypto/munge</div><div>#DisableRootJobs=NO</div><div>#EnforcePartLimits=NO</div><div>#Epilog=</div><div>#EpilogSlurmctld=</div><div>#FirstJobId=1</div><div>#MaxJobId=999999</div><div>#GresTypes=</div><div>#GroupUpdateForce=0</div><div>#GroupUpdateTime=600</div><div>#JobCheckpointDir=/var/slurm/checkpoint</div><div>#JobCredentialPrivateKey=</div><div>#JobCredentialPublicCertificate=</div><div>#JobFileAppend=0</div><div>#JobRequeue=1</div><div>#JobSubmitPlugins=1</div><div>#KillOnBadExit=0</div><div>#Licenses=foo*4,bar</div><div>#MailProg=/bin/mail</div><div>#MaxJobCount=5000</div><div>#MaxStepCount=40000</div><div>#MaxTasksPerNode=128</div><div>MpiDefault=openmpi</div><div>MpiParams=ports=12000-12999</div><div>#PluginDir=</div><div>#PlugStackConfig=</div><div>#PrivateData=jobs</div><div>ProctrackType=proctrack/cgroup</div><div>#Prolog=</div><div>#PrologSlurmctld=</div><div>#PropagatePrioProcess=0</div><div>#PropagateResourceLimits=</div><div>#PropagateResourceLimitsExcept=</div><div>ReturnToService=2</div><div>#SallocDefaultCommand=</div><div>SlurmctldPidFile=/var/run/slurmctld.pid</div><div>SlurmctldPort=6817</div><div>SlurmdPidFile=/var/run/s
lurmd.pid</div><div>SlurmdPort=6818</div><div>SlurmdSpoolDir=/tmp/slurmd</div><div>SlurmUser=slurm</div><div>#SlurmdUser=root</div><div>#SrunEpilog=</div><div>#SrunProlog=</div><div>StateSaveLocation=/tmp</div><div>SwitchType=switch/none</div><div>#TaskEpilog=</div><div>TaskPlugin=task/cgroup</div><div>#TaskPluginParam=</div><div>#TaskProlog=</div><div>#TopologyPlugin=topology/tree</div><div>#TmpFs=/tmp</div><div>#TrackWCKey=no</div><div>#TreeWidth=</div><div>#UnkillableStepProgram=</div><div>#UsePAM=0</div><div>#</div><div>#</div><div># TIMERS</div><div>#BatchStartTimeout=10</div><div>#CompleteWait=0</div><div>#EpilogMsgTime=2000</div><div>#GetEnvTimeout=2</div><div>#HealthCheckInterval=0</div><div>#HealthCheckProgram=</div><div>InactiveLimit=0</div><div>KillWait=60</div><div>#MessageTimeout=10</div><div>#ResvOverRun=0</div><div>MinJobAge=43200</div><div>#OverTimeLimit=0</div><div>SlurmctldTimeout=600</div><div>SlurmdTimeout=600</div><div>#UnkillableStepTimeout=60</div><div>#VSizeFactor=0</div><div>Waittime=0</div><div>#</div><div>#</div><div># SCHEDULING</div><div>DefMemPerCPU=1000</div><div>FastSchedule=1</div><div>#MaxMemPerCPU=0</div><div>#SchedulerRootFilter=1</div><div>#SchedulerTimeSlice=30</div><div>SchedulerType=sched/backfill</div><div>#SchedulerPort=</div><div>SelectType=select/cons_res</div><div>SelectTypeParameters=CR_CPU_Memory</div><div>#</div><div>#</div><div># JOB PRIORITY</div><div>#PriorityType=priority/basic</div><div>#PriorityDecayHalfLife=</div><div>#PriorityCalcPeriod=</div><div>#PriorityFavorSmall=</div><div>#PriorityMaxAge=</div><div>#PriorityUsageResetPeriod=</div><div>#PriorityWeightAge=</div><div>#PriorityWeightFairshare=</div><div>#PriorityWeightJobSize=</div><div>#PriorityWeightPartition=</div><div>#PriorityWeightQOS=</div><div>#</div><div>#</div><div># LOGGING AND 
ACCOUNTING</div><div>#AccountingStorageEnforce=0</div><div>#AccountingStorageHost=</div><div>AccountingStorageLoc=/var/log/slurm-llnl/AccountingStorage.log</div><div>#AccountingStoragePass=</div><div>#AccountingStoragePort=</div><div>AccountingStorageType=accounting_storage/filetxt</div><div>#AccountingStorageUser=</div><div>AccountingStoreJobComment=YES</div><div>ClusterName=cluster</div><div>#DebugFlags=</div><div>#JobCompHost=</div><div>JobCompLoc=/var/log/slurm-llnl/JobComp.log</div><div>#JobCompPass=</div><div>#JobCompPort=</div><div>JobCompType=jobcomp/filetxt</div><div>#JobCompUser=</div><div>JobAcctGatherFrequency=60</div><div>JobAcctGatherType=jobacct_gather/linux</div><div>SlurmctldDebug=3</div><div>#SlurmctldLogFile=</div><div>SlurmdDebug=3</div><div>#SlurmdLogFile=</div><div>#SlurmSchedLogFile=</div><div>#SlurmSchedLogLevel=</div><div>#</div><div>#</div><div># POWER SAVE SUPPORT FOR IDLE NODES (optional)</div><div>#SuspendProgram=</div><div>#ResumeProgram=</div><div>#SuspendTimeout=</div><div>#ResumeTimeout=</div><div>#ResumeRate=</div><div>#SuspendExcNodes=</div><div>#SuspendExcParts=</div><div>#SuspendRate=</div><div>#SuspendTime=</div><div>#</div><div>#</div><div># COMPUTE NODES</div><div>NodeName=node[01-08] CPUs=16 RealMemory=16000 State=UNKNOWN</div><div>PartitionName=batch Nodes=node[01-08] Default=YES MaxTime=INFINITE State=UP</div></div><div><br></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">2018-01-15 16:43 GMT+01:00 Carlos Fenoy <span dir="ltr"><<a href="mailto:minibit@gmail.com" target="_blank">minibit@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Are you trying to start the slurmd in the headnode or a compute node?<div><br></div><div>Can you provide the slurm.conf file?</div><div><br></div><div>Regards,</div><div>Carlos</div></div><div class="gmail_extra"><div><div class="h5"><br><div class="gmail_quote">On 
Mon, Jan 15, 2018 at 4:30 PM, Elisabetta Falivene <span dir="ltr"><<a href="mailto:e.falivene@ilabroma.com" target="_blank">e.falivene@ilabroma.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>slurmd -Dvvv says</div><div><br></div><div>slurmd: fatal: Unable to determine this slurmd's NodeName</div><div><br></div><div>b</div><div><div class="m_-9093742716976473048h5"><div class="gmail_extra"><br><div class="gmail_quote">2018-01-15 15:58 GMT+01:00 Douglas Jacobsen <span dir="ltr"><<a href="mailto:dmjacobsen@lbl.gov" target="_blank">dmjacobsen@lbl.gov</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto">The fact that sinfo is responding shows that at least slurmctld is running. slurmd, on the other hand, is not. Please also get the output of the slurmd log or of running "slurmd -Dvvv"</div></blockquote><div><br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="m_-9093742716976473048m_5348135526455917116gmail-HOEnZb"><div class="m_-9093742716976473048m_5348135526455917116gmail-h5"><div class="gmail_extra"><br><div class="gmail_quote">On Jan 15, 2018 06:42, "Elisabetta Falivene" <<a href="mailto:e.falivene@ilabroma.com" target="_blank">e.falivene@ilabroma.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><span style="font-size:12.8px">> Anyway, I suggest updating the operating system to stretch and fixing your</span><br style="font-size:12.8px"><span style="font-size:12.8px">> configuration under a more recent version of slurm.</span><br><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">I think I'll soon get to 
that :)</span></div><div><span style="font-size:12.8px">b</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">2018-01-15 14:08 GMT+01:00 Gennaro Oliva <span dir="ltr"><<a href="mailto:oliva.g@na.icar.cnr.it" target="_blank">oliva.g@na.icar.cnr.it</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Ciao Elisabetta,<br>
<span><br>
On Mon, Jan 15, 2018 at 01:13:27PM +0100, Elisabetta Falivene wrote:<br>
> The error messages are not helping me much in working out what is going on. What<br>
> should I check to find out what is failing?<br>
<br>
</span>Check slurmctld.log and slurmd.log; you can find them under<br>
/var/log/slurm-llnl<br>
<br>
> *PARTITION AVAIL TIMELIMIT NODES STATE NODELIST*<br>
> *batch* up infinite 8 unk* node[01-08]*<br>
><br>
><br>
> Running<br>
> *systemctl status slurmctld.service*<br>
><br>
> returns<br>
><br>
> *slurmctld.service - Slurm controller daemon*<br>
> * Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled)*<br>
> * Active: failed (Result: timeout) since Mon 2018-01-15 13:03:39 CET; 41s<br>
> ago*<br>
> * Process: 2098 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS<br>
> (code=exited, status=0/SUCCESS)*<br>
><br>
> * slurmctld[2100]: cons_res: select_p_reconfigure*<br>
> * slurmctld[2100]: cons_res: select_p_node_init*<br>
> * slurmctld[2100]: cons_res: preparing for 1 partitions*<br>
> * slurmctld[2100]: Running as primary controller*<br>
> * slurmctld[2100]:<br>
> SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=4,partition_job_depth=0*<br>
> * slurmctld.service start operation timed out. Terminating.*<br>
> *Terminate signal (SIGINT or SIGTERM) received*<br>
> * slurmctld[2100]: Saving all slurm state*<br>
> * Failed to start Slurm controller daemon.*<br>
> * Unit slurmctld.service entered failed state.*<br>
<br>
Do you have a backup controller?<br>
Check your slurm.conf under:<br>
/etc/slurm-llnl<br>
<br>
Anyway, I suggest updating the operating system to stretch and fixing your<br>
configuration under a more recent version of slurm.<br>
Best regards<br>
<span class="m_-9093742716976473048m_5348135526455917116gmail-m_-4769155202419504202m_1324718862540659117HOEnZb"><font color="#888888">--<br>
Gennaro Oliva<br>
<br>
</font></span></blockquote></div><br></div>
</blockquote></div></div>
</div></div></blockquote></div><br></div></div></div></div>
</blockquote></div><br><br clear="all"><div><br></div></div></div><span class="HOEnZb"><font color="#888888">-- <br><div class="m_-9093742716976473048gmail_signature" data-smartmail="gmail_signature">--<br>Carles Fenoy<br></div>
</font></span></div>
</blockquote></div><br></div>
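<div><br></div>
<div>One possible reading of the "Unable to determine this slurmd's NodeName" error quoted above: slurmd looks for its own hostname among the NodeName entries in slurm.conf, and the file here lists only node[01-08], so a headnode with a different hostname would have no entry to match. The Python sketch below illustrates that check. The functions expand_hostlist and node_is_configured are hypothetical helpers written for this illustration and are not part of Slurm; the expansion is simplified to a single bracket group, and on a real system "scontrol show hostnames" does the same job.</div>
<pre>
```python
import re
import socket


def expand_hostlist(expr):
    """Expand a simple Slurm-style hostlist such as 'node[01-08]'.

    Simplified on purpose: only one bracket group with comma-separated
    numbers or ranges is handled. Real Slurm hostlists allow more forms.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        return [expr]  # plain hostname, nothing to expand
    prefix, body, suffix = match.groups()
    names = []
    for part in body.split(","):
        low, _, high = part.partition("-")
        high = high or low  # a single number is a range of one
        width = len(low)    # keep zero padding, e.g. 01..08
        for number in range(int(low), int(high) + 1):
            names.append(f"{prefix}{number:0{width}d}{suffix}")
    return names


def node_is_configured(hostname, nodename_expr):
    """Return True if the short hostname appears in the expanded list."""
    short_name = hostname.split(".")[0]
    return short_name in expand_hostlist(nodename_expr)


if __name__ == "__main__":
    configured = "node[01-08]"  # NodeName value from the slurm.conf above
    this_host = socket.gethostname()
    if node_is_configured(this_host, configured):
        print(f"{this_host} matches a NodeName entry")
    else:
        print(f"{this_host} is not listed in {configured}")
```
</pre>
<div>If the headnode is meant to run only slurmctld, slurmd does not need to run there, and the error on that machine would then be expected rather than a sign of a broken setup.</div>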