<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Menlo-Regular;
panose-1:0 0 0 0 0 0 0 0 0 0;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle18
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal">I have a cluster to which I submitted a batch of 600 jobs, but the cluster only runs about 20 at a time. Using pestat, I can see that many nodes have plenty of available CPU and memory.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal" style="margin-left:28.0pt;text-autospace:none"><span style="font-family:&quot;Courier New&quot;;color:black">Hostname Partition NodeState CPUsUsed CPUsTotal CPUload Memsize(MB) Freemem(MB)<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:28.0pt"><span style="font-family:&quot;Courier New&quot;;color:black">pcomp13 batch* idle 0 72 8.19* 258207 202456<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:28.0pt"><span style="font-family:&quot;Courier New&quot;;color:black">pcomp14 batch* idle 0 72 0.00 258207 206558<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:28.0pt"><span style="font-family:&quot;Courier New&quot;;color:black">pcomp16 batch* idle 0 72 0.05 258207 230609<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:28.0pt"><span style="font-family:&quot;Courier New&quot;;color:black">pcomp17 batch* idle 0 72 8.51* 258207 184492<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:28.0pt"><span style="font-family:&quot;Courier New&quot;;color:black">pcomp18 batch mix 14 72 0.29* 258207 230575<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:28.0pt"><span style="font-family:&quot;Courier New&quot;;color:black">pcomp19 batch* idle 0 72 10.11* 258207 179604<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:28.0pt"><span style="font-family:&quot;Courier New&quot;;color:black">pcomp20 batch* idle 0 72 9.56* 258207 180961<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:28.0pt"><span style="font-family:&quot;Courier New&quot;;color:black">pcomp21 batch* idle 0 72 0.10 258207 227255<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:28.0pt"><span style="font-family:&quot;Courier New&quot;;color:black">pcomp25 batch* idle 0 72 0.07 258207 218035<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:28.0pt"><span style="font-family:&quot;Courier New&quot;;color:black">pcomp26 batch* idle 0 72 0.03 258207 226489<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:28.0pt"><span style="font-family:&quot;Courier New&quot;;color:black">pcomp27 batch* idle 0 72 0.25 258207 228580<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:28.0pt"><span style="font-family:&quot;Courier New&quot;;color:black">pcomp28 batch* idle 0 72 8.15* 258207 184306<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:28.0pt"><span style="font-family:&quot;Courier New&quot;;color:black">pcomp29 batch mix 2 72 0.01* 258207 226256</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">How can I tell why the jobs aren't running? "scontrol show job 123456" shows "<span style="font-family:Menlo-Regular;color:black">JobState=PENDING Reason=Priority</span>", which doesn't shed any light on the situation for me. Each pending job requests 1 CPU and 2 GB of memory.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:Menlo-Regular;color:black"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:Menlo-Regular;color:black">Should I just restart the Slurm daemons, or is there some way for me to see why these nodes aren't running jobs?</span><o:p></o:p></p>
</div>
</body>
</html>