<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Hi Patrick
<div class="">Hi Ray</div>
<div class=""><br class="">
</div>
<div class="">Happy Friday!</div>
<div class="">Thank you both for your quick replies. This is what I found out.</div>
<div class=""><br class="">
</div>
<div class="">With Patrick’s one-liner, it works fine:</div>
<div class="">NodeName=radonc[01-04] CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2</div>
<div class=""><br class="">
</div>
<div class="">With Ray’s suggestion, I get an error message for each node. Here is the message from one of the nodes:</div>
<div class="">
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; background-color: rgb(255, 255, 255);" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">sacct: error: NodeNames=radonc01 CPUs=32 doesn't match Sockets*CoresPerSocket*ThreadsPerCore (16), resetting CPUs</span></div>
</div>
<div class=""><span style="font-variant-ligatures: no-common-ligatures" class="">The interesting thing is that if you follow the </span><span style="font-family: Menlo; font-size: 11px; background-color: rgb(255, 255, 255);" class="">Sockets*CoresPerSocket*ThreadsPerCore
</span><span style="background-color: rgb(255, 255, 255);" class="">formula, you get 2 x 8 x 2 = 32; however, the message above says (16). Strange, no?</span></div>
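<div class="">As a sanity check on the arithmetic, independent of Slurm, here is a minimal Python sketch that recomputes the product from the one-liner node definition. The helper names parse_node_line and expected_cpus are made up for illustration and are not part of Slurm; the script only reads the text of the line and says nothing about how the (16) in the error message was computed.</div>
<pre>
```python
import re


def parse_node_line(line):
    """Return the KEY=VALUE pairs of one slurm.conf node line as a dict."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


def expected_cpus(fields):
    """Compute Sockets * CoresPerSocket * ThreadsPerCore for a parsed line."""
    return (int(fields["Sockets"])
            * int(fields["CoresPerSocket"])
            * int(fields["ThreadsPerCore"]))


# The one-liner node definition quoted in this thread.
line = ("NodeName=radonc[01-04] CPUs=32 RealMemory=64402 "
        "Sockets=2 CoresPerSocket=8 ThreadsPerCore=2")

fields = parse_node_line(line)
product = expected_cpus(fields)

# Print the product and whether it agrees with the declared CPUs value.
print(product, product == int(fields["CPUs"]))
```
</pre>
<div class="">For this line the script prints 32 True, so the definition as written is self-consistent.</div>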
<div class=""><span style="background-color: rgb(255, 255, 255);" class="">Also, as Ray suggested, </span><span style="font-family: Menlo; font-size: 11px; background-color: rgb(255, 255, 255);" class="">NodeAddr=10.112.0.5,10.112.0.6,10.112.0.14,10.112.0.16
</span><span style="background-color: rgb(255, 255, 255);" class="">with commas between the IP addresses </span>works fine.</div>
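<div class="">To illustrate what the comma-separated form expresses, the following Python sketch pairs each name in radonc[01-04] with one address, in order. The function expand_range is a hypothetical helper that handles only the simple base[NN-MM] form; it is not Slurm's own parser, and the in-order pairing is an assumption about how the list is read.</div>
<pre>
```python
import re


def expand_range(expr):
    """Expand a simple 'base[NN-MM]' expression into a list of host names.

    Anything that does not match that simple form is returned unchanged
    as a single-element list.
    """
    m = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\]", expr)
    if m is None:
        return [expr]
    base, lo, hi = m.group(1), m.group(2), m.group(3)
    width = len(lo)  # keep the zero padding, e.g. 01, 02, ...
    return [f"{base}{i:0{width}d}" for i in range(int(lo), int(hi) + 1)]


names = expand_range("radonc[01-04]")
addrs = "10.112.0.5,10.112.0.6,10.112.0.14,10.112.0.16".split(",")

# Pair names and addresses positionally (assumed ordering).
pairs = dict(zip(names, addrs))
for name, addr in pairs.items():
    print(name, addr)
```
</pre>
<div class="">The resulting pairs are the same name-to-address mapping as the /etc/hosts entries for the four nodes.</div>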
<div class=""><br class="">
</div>
<div class="">So for now I will stay with Patrick’s one-liner. Although this solution did not give any error messages, I am still worried that SLURM still thinks that <span style="font-family: Menlo; font-size: 11px; background-color: rgb(255, 255, 255);" class="">Sockets*CoresPerSocket*ThreadsPerCore
</span>is 16.</div>
<div class=""><br class="">
</div>
<div class="">FYI: the /etc/hosts file on each machine (master and execute nodes) looks like this:</div>
<div class="">
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; background-color: rgb(255, 255, 255);" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">0.112.0.25
<a href="http://radoncmaster.stanford.EDU" class="">radoncmaster.stanford.EDU</a> radoncmaster</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; background-color: rgb(255, 255, 255);" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">10.112.0.5
<a href="http://radonc01.stanford.EDU" class="">radonc01.stanford.EDU</a> radonc01</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; background-color: rgb(255, 255, 255);" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">10.112.0.6
<a href="http://radonc02.stanford.EDU" class="">radonc02.stanford.EDU</a> radonc02</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; background-color: rgb(255, 255, 255);" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">10.112.0.14
<a href="http://radonc03.stanford.EDU" class="">radonc03.stanford.EDU</a> radonc03</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; background-color: rgb(255, 255, 255);" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">10.112.0.16
<a href="http://radonc04.stanford.EDU" class="">radonc04.stanford.EDU</a> radonc04</span></div>
</div>
<div class=""><br class="">
</div>
<div class="">Now, when I run sacct, it says:</div>
<div class="">
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; background-color: rgb(255, 255, 255);" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">SLURM accounting storage is disabled</span></div>
</div>
<div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class="">
which I am OK with, since I have only two postdocs at the moment.</div>
<div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class="">
<br class="">
</div>
<div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class="">
How can I test my cluster with a sample job and make sure it uses all the CPUs and RAM?</div>
<div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class="">
<br class="">
</div>
<div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class="">
Thank you for your help and patience with me.</div>
<div class=""><br class="">
</div>
<div class="">Best,</div>
<div class="">Eric</div>
<div class="">
<div class="">
<div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div style="text-align: -webkit-auto; orphans: 2; widows: 2; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div style="orphans: auto; widows: auto;" class=""><span style="text-align: -webkit-auto; background-color: rgba(255, 255, 255, 0);" class="">_____________________________________________________________________________________________________</span></div>
<div style="orphans: auto; widows: auto;" class=""><span style="background-color: rgba(255, 255, 255, 0);" class=""><br class="">
</span></div>
<span style="background-color: rgba(255, 255, 255, 0);" class=""><b class="">
<div style="orphans: auto; widows: auto;" class=""><b style="text-align: -webkit-auto;" class="">Eric F. Alemany</b></div>
</b>
<div style="orphans: auto; widows: auto;" class=""><i style="text-align: -webkit-auto;" class="">System Administrator for Research</i></div>
</span>
<div style="orphans: auto; widows: auto;" class=""><span style="background-color: rgba(255, 255, 255, 0);" class=""><br class="">
</span></div>
<div style="orphans: auto; widows: auto;" class=""><span style="text-align: -webkit-auto; background-color: rgba(255, 255, 255, 0);" class="">Division of Radiation &amp; Cancer Biology</span></div>
<div style="orphans: auto; widows: auto;" class=""><span style="text-align: -webkit-auto; background-color: rgba(255, 255, 255, 0);" class="">Department of Radiation Oncology</span></div>
<div style="orphans: auto; widows: auto;" class=""><span style="background-color: rgba(255, 255, 255, 0);" class=""><br class="">
</span></div>
<div style="orphans: auto; widows: auto;" class=""><span style="text-align: -webkit-auto; background-color: rgba(255, 255, 255, 0);" class="">Stanford University School of Medicine</span></div>
<div style="orphans: auto; widows: auto;" class=""><span style="text-align: -webkit-auto; background-color: rgba(255, 255, 255, 0);" class="">Stanford, California 94305</span></div>
<div style="orphans: auto; widows: auto;" class=""><span style="background-color: rgba(255, 255, 255, 0);" class=""><br class="">
</span></div>
<div style="orphans: auto; widows: auto;" class=""><span style="background-color: rgba(255, 255, 255, 0);" class=""><font style="text-align: -webkit-auto;" class="">Tel:</font><a href="tel:1-650-498-7969" x-apple-data-detectors="true" x-apple-data-detectors-type="telephone" x-apple-data-detectors-result="1" style="text-align: -webkit-auto;" class="">1-650-498-7969</a><font style="text-align: -webkit-auto;" class="">
No Texting</font></span></div>
<div style="orphans: auto; widows: auto;" class=""><span style="background-color: rgba(255, 255, 255, 0);" class=""><font style="text-align: -webkit-auto;" class="">Fax:</font><a href="tel:1-650-723-7382" x-apple-data-detectors="true" x-apple-data-detectors-type="telephone" x-apple-data-detectors-result="2" style="text-align: -webkit-auto;" class="">1-650-723-7382</a></span></div>
<div style="orphans: auto; widows: auto;" class=""><br class="">
</div>
</div>
</div>
</div>
<br class="Apple-interchange-newline">
</div>
<br class="">
<div>
<blockquote type="cite" class="">
<div class="">On May 4, 2018, at 6:14 AM, Patrick Goetz &lt;<a href="mailto:pgoetz@math.utexas.edu" class="">pgoetz@math.utexas.edu</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="">I concur with this. Make sure your nodes are in the /etc/hosts file on the SMS. Also, if you name them by base + numerical sequence, you can configure them with a single line in Slurm (using the example below):<br class="">
<br class="">
NodeName=radonc[01-04] CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2<br class="">
<br class="">
On 05/04/2018 12:05 AM, Raymond Wan wrote:<br class="">
<blockquote type="cite" class="">Hi Eric,<br class="">
On Fri, May 4, 2018 at 6:04 AM, Eric F. Alemany &lt;<a href="mailto:ealemany@stanford.edu" class="">ealemany@stanford.edu</a>&gt; wrote:<br class="">
<blockquote type="cite" class=""># COMPUTE NODES<br class="">
NodeName=radonc[01-04] NodeAddr=10.112.0.5 10.112.0.6 10.112.0.14<br class="">
10.112.0.16 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8<br class="">
ThreadsPerCore=2 State=UNKNOWN<br class="">
PartitionName=debug Nodes=radonc[01-04] Default=YES MaxTime=INFINITE<br class="">
State=UP<br class="">
</blockquote>
I don't know what is the problem, but my *guess* based on my own<br class="">
configuration file is that we have one node per line under "NodeName".<br class="">
We also don't have NodeAddr but maybe that's ok. This means the IP<br class="">
addresses of the nodes in our cluster are hard-coded in /etc/hosts.<br class="">
Also, State is not given.<br class="">
So, if I formatted your's to look line our's would look something like:<br class="">
NodeName=radonc01 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8<br class="">
ThreadsPerCore=2<br class="">
NodeName=radonc02 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8<br class="">
ThreadsPerCore=2<br class="">
NodeName=radonc03 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8<br class="">
ThreadsPerCore=2<br class="">
NodeName=radonc04 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8<br class="">
ThreadsPerCore=2<br class="">
PartitionName=debug Nodes=radonc[01-04] Default=YES MaxTime=INFINITE State=UP<br class="">
Maybe the problem is with the NodeAddr because you might have to<br class="">
separate the values with a comma instead of a space? With spaces, it<br class="">
might have problems parsing?<br class="">
That's my guess...<br class="">
Ray<br class="">
</blockquote>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</body>
</html>