<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-2022-jp">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal">Slurm Users, <o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I am hoping that you all can help me with the problem below.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We just spun up a new cluster using Bright and have been trying to change Slurm's default select plugin from linear to cons_res. This should be simple enough, but I am plagued by the following error:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">error: we don't have select plugin type 102<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Both the select_linear.so and select_cons_res.so are located in /cm/shared_tmp/apps/slurm/17.11.8/lib64/slurm/<o:p></o:p></p>
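<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">For anyone who wants to repeat the check, here is a minimal sketch of how the presence of the two plugin files can be confirmed. The function name check_select_plugins is made up for illustration, and it only looks at the file system; it does not ask Slurm which plugin directory the daemon is actually configured to use.<o:p></o:p></p>
<pre>
```shell
# Hypothetical helper: report whether the two select plugins named in this
# post exist in the directory given as the first argument.
# It only tests for the files; it does not query slurmctld or slurm.conf.
check_select_plugins() {
    dir="$1"
    missing=0
    for plugin in select_linear.so select_cons_res.so; do
        if [ -f "$dir/$plugin" ]; then
            echo "found: $dir/$plugin"
        else
            echo "missing: $dir/$plugin"
            missing=1
        fi
    done
    # Exit status 0 when both files are present, 1 otherwise.
    return "$missing"
}
```
</pre>
<p class="MsoNormal">It would be called as check_select_plugins /cm/shared_tmp/apps/slurm/17.11.8/lib64/slurm, using the directory quoted above.<o:p></o:p></p>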
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">So far I have been testing with just the compute nodes, not the GPU nodes etc. I added the following to my slurm.conf file:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"># Scheduler<o:p></o:p></p>
<p class="MsoNormal">SchedulerType=sched/backfill<o:p></o:p></p>
<p class="MsoNormal"><span style="background:yellow;mso-highlight:yellow">SelectType=select/cons_res<o:p></o:p></span></p>
<p class="MsoNormal"><span style="background:yellow;mso-highlight:yellow">SelectTypeParameters=CR_Core</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"># Nodes<o:p></o:p></p>
<p class="MsoNormal"># NodeName=big-mem[001-005],node[001-056] # Entry from default install<o:p></o:p></p>
<p class="MsoNormal"># NodeName=gpu[001-004] Gres=gpu:2 # Entry from default install<o:p></o:p></p>
<p class="MsoNormal"><span style="background:yellow;mso-highlight:yellow">NodeName=node[001-056] CPUs=2 RealMemory=196000 Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 State=UNKNOWN</span><o:p></o:p></p>
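<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">As a sanity check on the node line above, here is a small sketch that compares the CPUs value with the product of Sockets, CoresPerSocket and ThreadsPerCore, which is what the slurm.conf documentation describes as the default for CPUs when it is omitted. The function name node_cpu_summary is made up for illustration, and whether a mismatch between the two numbers has anything to do with the error in this post is not established here.<o:p></o:p></p>
<pre>
```shell
# Hypothetical helper: for one NodeName line from slurm.conf, print the
# configured CPUs value next to Sockets * CoresPerSocket * ThreadsPerCore.
# Keys missing from the line count as 1 in the product; CPUs shows "unset".
node_cpu_summary() {
    printf '%s\n' "$1" | awk '{
        cpus = "unset"; sockets = 1; cores = 1; threads = 1
        # Walk every whitespace-separated KEY=VALUE field on the line.
        for (i = NF; i != 0; i--) {
            split($i, kv, "=")
            if (kv[1] == "CPUs") cpus = kv[2]
            if (kv[1] == "Sockets") sockets = kv[2]
            if (kv[1] == "CoresPerSocket") cores = kv[2]
            if (kv[1] == "ThreadsPerCore") threads = kv[2]
        }
        printf "CPUs=%s product=%d\n", cpus, sockets * cores * threads
    }'
}
```
</pre>
<p class="MsoNormal">Passing the highlighted NodeName line as a single quoted argument prints CPUs=2 product=40.<o:p></o:p></p>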
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"># Partitions<o:p></o:p></p>
<p class="MsoNormal">PartitionName=defq Default=YES MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=<span style="background:yellow;mso-highlight:yellow">YES</span> GraceTime=0 PreemptMode=OFF ReqResv=NO
AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpu[001-004],big-mem[001-005],node[001-056]<o:p></o:p></p>
<p class="MsoNormal">PartitionName=test Default=NO MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=<span style="background:yellow;mso-highlight:yellow">YES</span> GraceTime=0 PreemptMode=OFF ReqResv=NO
AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-056]<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">When I issue scontrol reconfigure, I get the following:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">[root@thunder ~]# scontrol reconfigure<o:p></o:p></p>
<p class="MsoNormal">slurm_reconfigure error: Unable to contact slurm controller (connect failure)<o:p></o:p></p>
<p class="MsoNormal">[root@thunder ~]# systemctl status slurmctld.service<o:p></o:p></p>
<p class="MsoNormal">● slurmctld.service - Slurm controller daemon<o:p></o:p></p>
<p class="MsoNormal"> Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; disabled; vendor preset: disabled)<o:p></o:p></p>
<p class="MsoNormal"> Active: failed (Result: exit-code) since Thu 2018-12-13 08:46:18 CST; 5s ago<o:p></o:p></p>
<p class="MsoNormal"> Process: 31416 ExecStart=/cm/shared/apps/slurm/17.11.8/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)<o:p></o:p></p>
<p class="MsoNormal">Main PID: 31418 (code=exited, status=1/FAILURE)<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">When I revert the changes, it goes back to an active working state.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The /var/log/slurmctld log shows this error message:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">error: we don't have select plugin type 102<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Has anyone else run into this problem? If so, can you recommend a fix?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks, <o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Chad<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>