<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="FR" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Hello David,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">The slurmd daemon is not running (while slurmctld and slurmdbd are).<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">slurmd.log (a separate file from the slurmctld log) should contain more information.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
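<p class="MsoNormal">A minimal sketch of that check, assuming a POSIX shell. The helper name missing_daemons is made up for this illustration; it reads a ps -ef listing on standard input and prints each of the three Slurm daemons that does not appear in it. The sample input is the process listing quoted further down in this thread.</p>

```shell
# Print every Slurm daemon that is absent from a process listing on stdin.
# missing_daemons is a hypothetical helper name, not a Slurm command.
missing_daemons() {
    listing=$(cat)
    for d in slurmdbd slurmctld slurmd; do
        # Match the daemon name as a whole word at the end of a path or field,
        # so that "slurmdbd" is not mistaken for "slurmd".
        if ! printf '%s\n' "$listing" | grep -Eq "(^|[/ ])$d( |\$)"; then
            echo "$d"
        fi
    done
}

# Sample input: the two processes shown in the ps output in this thread.
printf '%s\n' \
    'slurm 11388 1 0 09:24 ? 00:00:00 /usr/local/sbin/slurmdbd' \
    'slurm 11430 1 0 09:24 ? 00:00:00 /usr/local/sbin/slurmctld' |
    missing_daemons
```

<p class="MsoNormal">On the live machine this would be ps -ef | missing_daemons. If slurmd is reported, starting it by hand in the foreground as root with slurmd -D -vvv usually shows why it exits.</p>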
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Regards,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Pierre-Marie Le Biot<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif"> slurm-users [mailto:slurm-users-bounces@lists.schedmd.com]
<b>On Behalf Of </b>david vilanova<br>
<b>Sent:</b> Thursday, November 30, 2017 9:32 AM<br>
<b>To:</b> Slurm User Community List &lt;slurm-users@lists.schedmd.com&gt;<br>
<b>Subject:</b> Re: [slurm-users] slurm conf with single machine with multi cores.<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">Sorry for the delay; I was trying to fix it, but it is still not working.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">The node is always down. The master machine is also the compute machine: it is a single server that I use for both, with 1 node and 12 CPUs.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">In the log below I see this line:<o:p></o:p></p>
</div>
<div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:41.764] agent/is_node_resp: node:linuxcluster RPC:REQUEST_NODE_REGISTRATION_STATUS : Communication connection failure<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Here below my slurm.conf file:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<div>
<p class="MsoNormal">ControlMachine=linuxcluster<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">AuthType=auth/munge<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">CryptoType=crypto/munge<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">MailProg=/usr/bin/mail<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">MpiDefault=none<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">PluginDir=/usr/local/lib/slurm<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">ProctrackType=proctrack/cgroup<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">ReturnToService=1<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">SlurmctldPidFile=/var/run/slurmctld.pid<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">SlurmctldPort=6817<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">SlurmdPidFile=/var/run/slurmd.pid<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">SlurmdPort=6818<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">SlurmdSpoolDir=/var/spool/slurm/d<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">SlurmUser=slurm<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">StateSaveLocation=/var/spool/slurm/ctld<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">SwitchType=switch/none<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">TaskPlugin=task/none<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">InactiveLimit=0<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">KillWait=30<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">MinJobAge=300<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">SlurmctldTimeout=120<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">SlurmdTimeout=300<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Waittime=0<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">FastSchedule=1<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">SchedulerType=sched/backfill<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">AccountingStorageHost=linuxcluster<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">AccountingStorageType=accounting_storage/slurmdbd<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">AccountingStorageUser=slurm<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">AccountingStoreJobComment=YES<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">ClusterName=linuxcluster<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">JobCompType=jobcomp/none<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">JobCompUser=slurm<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">JobAcctGatherFrequency=30<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">JobAcctGatherType=jobacct_gather/cgroup<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">SlurmctldDebug=5<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">SlurmctldLogFile=/var/log/slurm/slurmctrl.log<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">SlurmdDebug=5<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">SelectType=select/cons_res<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">SelectTypeParameters=CR_CPU<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">NodeName=linuxcluster CPUs=12<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">PartitionName=testq Nodes=linuxcluster Default=YES MaxTime=INFINITE State=UP<o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">slurmctrl.log:<o:p></o:p></p>
</div>
<div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.025] debug: Log file re-opened<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.025] debug: sched: slurmctld starting<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.025] slurmctld version 17.11.0 started on cluster linuxcluster<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.026] Munge cryptographic signature plugin loaded<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.026] Consumable Resources (CR) Node Selection plugin loaded with argument 1<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.026] preempt/none loaded<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.026] debug: Checkpoint plugin loaded: checkpoint/none<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.026] debug: AcctGatherEnergy NONE plugin loaded<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.026] debug: AcctGatherProfile NONE plugin loaded<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.026] debug: AcctGatherInterconnect NONE plugin loaded<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.026] debug: AcctGatherFilesystem NONE plugin loaded<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.026] debug: Job accounting gather cgroup plugin loaded<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.026] ExtSensors NONE plugin loaded<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.026] debug: switch NONE plugin loaded<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.026] debug: power_save module disabled, SuspendTime < 0<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.026] debug: No backup controller to shutdown<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.026] Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.027] debug: Munge authentication plugin loaded<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.030] debug: slurmdbd: Sent PersistInit msg<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.030] slurmdbd: recovered 0 pending RPCs<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.429] debug: Reading slurm.conf file: /usr/local/etc/slurm.conf<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.430] layouts: no layout to initialize<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.430] topology NONE plugin loaded<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.430] debug: No DownNodes<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.435] debug: Log file re-opened<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.435] sched: Backfill scheduler plugin loaded<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.435] route default plugin loaded<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.435] layouts: loading entities/relations information<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.435] debug: layouts: 1/1 nodes in hash table, rc=0<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.435] debug: layouts: loading stage 1<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.435] debug: layouts: loading stage 1.1 (restore state)<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.435] debug: layouts: loading stage 2<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.435] debug: layouts: loading stage 3<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.435] Recovered state of 1 nodes<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.435] Down nodes: linuxcluster<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.435] Recovered JobID=15 State=0x4 NodeCnt=0 Assoc=6<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.435] Recovered information about 1 jobs<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.435] cons_res: select_p_node_init<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.436] cons_res: preparing for 1 partitions<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.436] debug: Updating partition uid access list<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.436] Recovered state of 0 reservations<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.436] State of 0 triggers recovered<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.436] _preserve_plugins: backup_controller not specified<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.436] cons_res: select_p_reconfigure<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.436] cons_res: select_p_node_init<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.436] cons_res: preparing for 1 partitions<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.436] Running as primary controller<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.436] debug: No BackupController, not launching heartbeat.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.436] Registering slurmctld at port 6817 with slurmdbd.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.677] debug: No feds to retrieve from state<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.757] debug: Priority BASIC plugin loaded<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.758] No parameter for mcs plugin, default values set<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.758] mcs: MCSParameters = (null). ondemand set.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.758] debug: mcs none plugin loaded<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:28.758] debug: power_save mode not enabled<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:31.761] debug: Spawning registration agent for linuxcluster 1 hosts<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:41.764] agent/is_node_resp: node:linuxcluster RPC:REQUEST_NODE_REGISTRATION_STATUS : Communication connection failure<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:58.435] debug: backfill: beginning<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:24:58.435] debug: backfill: no jobs to backfill<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:25:28.435] debug: backfill: beginning<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:25:28.436] debug: backfill: no jobs to backfill<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:25:28.830] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:25:28.830] debug: sched: Running job scheduler<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:25:58.436] debug: backfill: beginning<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">[2017-11-30T09:25:58.436] debug: backfill: no jobs to backfill<o:p></o:p></p>
</div>
</div>
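<p class="MsoNormal">The failure line can also be pulled out of the controller log mechanically. A minimal sketch, assuming a POSIX shell with grep, sed and sort; the function name unreachable_nodes is hypothetical, and the sample input is the line quoted above.</p>

```shell
# Print the names of nodes whose registration RPC failed, one per line.
# unreachable_nodes is a hypothetical helper; it reads a slurmctld log on stdin.
unreachable_nodes() {
    grep 'REQUEST_NODE_REGISTRATION_STATUS : Communication connection failure' |
        sed -n 's/.*node:\([^ ]*\) RPC:.*/\1/p' |
        sort -u
}

# Sample input: the failure line from the log in this message.
printf '%s\n' \
    '[2017-11-30T09:24:41.764] agent/is_node_resp: node:linuxcluster RPC:REQUEST_NODE_REGISTRATION_STATUS : Communication connection failure' |
    unreachable_nodes
```

<p class="MsoNormal">Against the real file this would be unreachable_nodes &lt; /var/log/slurm/slurmctrl.log. A node listed here is one whose slurmd the controller could not contact.</p>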
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal"><br clear="all">
<o:p></o:p></p>
<div>
<p class="MsoNormal">ps -ef | grep slurm<o:p></o:p></p>
</div>
<div>
<div>
<p class="MsoNormal">ubuntu@linuxcluster:/home/dvi/$ ps -ef | grep slurm<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">slurm 11388 1 0 09:24 ? 00:00:00 /usr/local/sbin/slurmdbd<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">slurm 11430 1 0 09:24 ? 00:00:00 /usr/local/sbin/slurmctld<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
<div>
<p class="MsoNormal">Any idea ?<o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On Wed, 29 Nov 2017 at 18:21, Le Biot, Pierre-Marie <<a href="mailto:pierre-marie.lebiot@hpe.com">pierre-marie.lebiot@hpe.com</a>> wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Hello David,</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">So linuxcluster is the Head node and also a Compute node ?</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Is slurmd running ?</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">What does /var/log/slurm/slurmd.log say ?</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Regards,</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Pierre-Marie Le Biot</span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> </span><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif"> slurm-users
[mailto:<a href="mailto:slurm-users-bounces@lists.schedmd.com" target="_blank">slurm-users-bounces@lists.schedmd.com</a>]
<b>On Behalf Of </b>david vilanova<br>
<b>Sent:</b> Wednesday, November 29, 2017 4:33 PM<br>
<b>To:</b> Slurm User Community List <<a href="mailto:slurm-users@lists.schedmd.com" target="_blank">slurm-users@lists.schedmd.com</a>><br>
<b>Subject:</b> Re: [slurm-users] slurm conf with single machine with multi cores.</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">Hi,</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">I have updated the slurm.conf as follows:</span><o:p></o:p></p>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131"><br clear="all" style="word-spacing:1px">
</span><o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">SelectType=select/cons_res</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">SelectTypeParameters=CR_CPU</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">NodeName=linuxcluster CPUs=2</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">PartitionName=testq Nodes=linuxcluster Default=YES MaxTime=INFINITE State=UP</span><o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">The node in testq is still in the down state. Any idea?</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">Below is the log from the db and the controller:</span><o:p></o:p></p>
</div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">==> /var/log/slurm/slurmctrl.log <==</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.446] slurmctld version 17.11.0 started on cluster linuxcluster</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.850] error: SelectType specified more than once, latest value used</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.851] layouts: no layout to initialize</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.855] layouts: loading entities/relations information</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.855] Recovered state of 1 nodes</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.855] Down nodes: linuxcluster</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.855] Recovered information about 0 jobs</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.855] cons_res: select_p_node_init</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.855] cons_res: preparing for 1 partitions</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.856] Recovered state of 0 reservations</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.856] _preserve_plugins: backup_controller not specified</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.856] cons_res: select_p_reconfigure</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.856] cons_res: select_p_node_init</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.856] cons_res: preparing for 1 partitions</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.856] Running as primary controller</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:30.856] Registering slurmctld at port 6817 with slurmdbd.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:31.098] No parameter for mcs plugin, default values set</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:28:31.098] mcs: MCSParameters = (null). ondemand set.</span><o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">[2017-11-29T16:29:31.169] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2</span><o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span style="color:#313131">David</span><o:p></o:p></p>
</div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">On Wed, 29 Nov 2017 at 15:59, Steffen Grunewald <<a href="mailto:steffen.grunewald@aei.mpg.de" target="_blank">steffen.grunewald@aei.mpg.de</a>> wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0cm;margin-bottom:5.0pt">
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt">Hi David,<br>
<br>
On Wed, 2017-11-29 at 14:45:06 +0000, david vilanova wrote:<br>
> Hello,<br>
> I have installed the latest 17.11 release and my node is shown as down.<br>
> I have a single physical server with 12 cores, so I am not sure the conf below is<br>
> correct. Can you help?<br>
><br>
> In slurm.conf the node is configured as follows:<br>
><br>
> NodeName=linuxcluster CPUs=1 RealMemory=991 Sockets=12 CoresPerSocket=1<br>
> ThreadsPerCore=1 Feature=local<br>
<br>
12 Sockets? Certainly not... 12 Cores per socket, yes.<br>
(IIRC, CPUs shouldn't be specified if the detailed topology is given.<br>
You may try CPUs=12 and drop the details.)<br>
<br>
> PartitionName=testq Nodes=inuxcluster Default=YES MaxTime=INFINITE State=UP<br>
^^ typo?<br>
<br>
Cheers,<br>
Steffen<o:p></o:p></p>
</blockquote>
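<p class="MsoNormal">A node-name mismatch like the one flagged above can be checked for mechanically. This is a rough sketch, assuming a POSIX shell with awk and a slurm.conf that uses plain comma-separated node names (no hostlist ranges such as node[1-4]); the function name undefined_partition_nodes is hypothetical. It prints every name used in a partition's Nodes= list that no NodeName= line defines.</p>

```shell
# List partition node names that no NodeName= line defines.
# Assumption: plain comma-separated names, no hostlist ranges.
# undefined_partition_nodes is a hypothetical helper; it reads slurm.conf on stdin.
undefined_partition_nodes() {
    awk '
        /^NodeName=/ {
            split($1, kv, "=")
            defined[kv[2]] = 1
        }
        /^PartitionName=/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^Nodes=/) {
                    n = split(substr($i, 7), names, ",")
                    for (j = 1; j <= n; j++) wanted[names[j]] = 1
                }
            }
        }
        END {
            for (name in wanted) if (!(name in defined)) print name
        }
    '
}

# Sample input: the two lines from the original question.
printf '%s\n' \
    'NodeName=linuxcluster CPUs=1 RealMemory=991 Sockets=12 CoresPerSocket=1 ThreadsPerCore=1 Feature=local' \
    'PartitionName=testq Nodes=inuxcluster Default=YES MaxTime=INFINITE State=UP' |
    undefined_partition_nodes
```

<p class="MsoNormal">Against the real file this would be undefined_partition_nodes &lt; /usr/local/etc/slurm.conf. For the topology values themselves, slurmd -C prints the hardware that slurmd detects on the node in slurm.conf format, which can be compared with the NodeName line.</p>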
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</body>
</html>