<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)">
Hi, Xand:</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)">
How does adding "ReqMem" to the sacct format string change the output?</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
For example, on my cluster running Slurm 20.02.7 (on RHEL 8), our GPU partition has TRESBillingWeights=CPU=0,Mem=0,GRES/gpu=43:</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<br>
</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<span style="color: rgb(0, 0, 0); font-family: "Courier New", monospace; font-size: 12pt;">$ sacct --format=JobID%25,State,AllocTRES%50,ReqTRES,ReqMem,ReqCPUS|grep RUNNING</span></div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<span style="color: rgb(0, 0, 0); font-family: "Courier New", monospace; font-size: 12pt;"><br>
</span></div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<span style="color: rgb(0, 0, 0); font-family: "Courier New", monospace; font-size: 12pt;">                    JobID      State                                          AllocTRES    ReqTRES     ReqMem  ReqCPUS<br>
------------------------- ---------- -------------------------------------------------- ---------- ---------- --------<br>
</span></div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<span style="color: rgb(0, 0, 0); font-family: "Courier New", monospace; font-size: 12pt;">            2512977.batch    RUNNING                                cpu=48,mem=0,node=1                    0n       48</span><br>
</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<div>           2512977.extern    RUNNING             billing=516,cpu=144,gres/gpu=12,node=3                    0n      144</div>
<div>                2512977.0    RUNNING     cpu=24,gres/gpu:v100=8,gres/gpu=8,mem=0,node=2                    0n       24</div>
                  2513020    RUNNING             billing=516,cpu=144,gres/gpu=12,node=3 billing=5+         0n      144<br>
</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<br>
</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
Note the "mem=0" on some of those lines, and the absence of the mem field altogether on the others. The squeue output for job 2512977 also shows 0 in the memory column:</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<br>
</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
       JOBID PART             NAME     USER    STATE       TIME  TIME_LIMIT  NODES MIN_MEMO NODELIST(REASON)<br>
     2512977  gpu 1AB_96DMPCLoose_    ba553  RUNNING   22:29:20  1-00:00:00      3        0 gpu[001,003-004]<br>
</div>
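<div style="font-family:'Courier New',monospace; font-size:12pt; color:rgb(0,0,0)">
<br>
As a rough sanity check on the billing=516 value in the sacct output above: assuming billing is the default weighted sum of allocated TRES (i.e. PriorityFlags does not include MAX_TRES, which is an assumption and not something shown here), the TRESBillingWeights quoted earlier give 43 x 12 GPUs = 516, with CPU and memory contributing nothing. A minimal Python sketch of that arithmetic follows; parse_tres and billing are illustrative helper names of my own, not part of Slurm:</div>
<pre>
```python
# Weights taken from the TRESBillingWeights quoted earlier, with keys
# lowercased to match the TRES names that sacct prints.
weights = {"cpu": 0, "mem": 0, "gres/gpu": 43}


def parse_tres(tres):
    """Split a TRES string such as 'cpu=144,gres/gpu=12,node=3' into a dict."""
    return dict(item.split("=", 1) for item in tres.split(",") if item)


def billing(tres, weights):
    """Weighted sum of allocated TRES (assumes the default, non-MAX_TRES rule)."""
    alloc = parse_tres(tres)
    # Zero-weight or absent TRES are skipped, so values like '36G' are never
    # converted; this sketch only handles plain integer counts.
    return sum(
        w * int(alloc[name])
        for name, w in weights.items()
        if w and name in alloc
    )


# AllocTRES of the .extern step / parent job line from the sacct output above
alloc_tres = "billing=516,cpu=144,gres/gpu=12,node=3"
print(billing(alloc_tres, weights))
```
</pre>
<div style="font-family:'Courier New',monospace; font-size:12pt; color:rgb(0,0,0)">
If the printed value matches the billing= entry that sacct reports, the weights are being applied as configured, and the missing mem entry is a separate question from billing.</div>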
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<br>
</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<br>
</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
In comparison, here is a job on our def partition that requests a specific amount of memory:</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<br>
</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
(sacct)</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
                    JobID      State                                          AllocTRES    ReqTRES     ReqMem  ReqCPUS<br>
------------------------- ---------- -------------------------------------------------- ---------- ---------- --------<br>
</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
                  2514854    RUNNING                     billing=1,cpu=1,mem=36G,node=1 billing=1+       36Gn        1
<div>            2514854.batch    RUNNING                               cpu=1,mem=36G,node=1                  36Gn        1</div>
           2514854.extern    RUNNING                     billing=1,cpu=1,mem=36G,node=1                  36Gn        1<br>
</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<br>
</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
and the squeue line:</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<br>
</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
       JOBID PART             NAME     USER    STATE       TIME  TIME_LIMIT  NODES MIN_MEMO NODELIST(REASON)<br>
     2514854  def ClusterJobStart_ sbradley  RUNNING    5:05:27     8:00:00      1      36G node003<br>
</div>
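<div style="font-family:'Courier New',monospace; font-size:12pt; color:rgb(0,0,0)">
<br>
To make the comparison between the two jobs explicit, here is a small Python sketch that classifies the mem entry of each AllocTRES string copied from the sacct output above as absent, zero, or a real value. The helper names (parse_tres, mem_status) are hypothetical and only for illustration:</div>
<pre>
```python
def parse_tres(tres):
    """Split a TRES string such as 'cpu=1,mem=36G,node=1' into a dict."""
    return dict(item.split("=", 1) for item in tres.split(",") if item)


def mem_status(tres):
    """Return 'absent', 'zero', or the recorded mem value as a string."""
    mem = parse_tres(tres).get("mem")
    if mem is None:
        return "absent"
    return "zero" if mem == "0" else mem


# AllocTRES values copied verbatim from the two sacct listings above
rows = {
    "2512977.batch": "cpu=48,mem=0,node=1",
    "2512977.extern": "billing=516,cpu=144,gres/gpu=12,node=3",
    "2512977.0": "cpu=24,gres/gpu:v100=8,gres/gpu=8,mem=0,node=2",
    "2513020": "billing=516,cpu=144,gres/gpu=12,node=3",
    "2514854": "billing=1,cpu=1,mem=36G,node=1",
    "2514854.batch": "cpu=1,mem=36G,node=1",
    "2514854.extern": "billing=1,cpu=1,mem=36G,node=1",
}

for job_id, tres in rows.items():
    print(job_id, mem_status(tres))
```
</pre>
<div style="font-family:'Courier New',monospace; font-size:12pt; color:rgb(0,0,0)">
The same check could be run over your 20.11 output to see whether mem is missing on every line or only on jobs that did not request memory.</div>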
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<br>
</div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<br>
</div>
<div>
<div style="font-family:"Courier New",monospace; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div id="Signature">
<div>
<div id="divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:'Courier New',monospace">
<div class="BodyFragment"><font size="2"><span style="font-size:10pt">
<div class="PlainText"></div>
<div class="PlainText" style="font-family:"Courier New",monospace; font-size:13.3333px">
</div>
<span id="ms-rterangepaste-start"></span>
<div>--</div>
<div>
<div>David Chin, PhD (he/him)   Sr. SysAdmin, URCF, Drexel</div>
<div>dwc62@drexel.edu                     215.571.4335 (o)</div>
<div>For URCF support: urcf-support@drexel.edu</div>
<div>https://proteusmaster.urcf.drexel.edu/urcfwiki</div>
<div>github:prehensilecode</div>
</div>
<div class="PlainText"><span></span></div>
</span></font></div>
</div>
</div>
</div>
</div>
<div id="appendonsend"></div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt; on behalf of Xand Meaden &lt;xand.meaden@kcl.ac.uk&gt;<br>
<b>Sent:</b> Wednesday, January 12, 2022 12:23<br>
<b>To:</b> slurm-users@lists.schedmd.com &lt;slurm-users@lists.schedmd.com&gt;<br>
<b>Subject:</b> [slurm-users] Memory usage not tracked</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt">
<div class="PlainText">
Hi,<br>
<br>
We wish to record memory usage of HPC jobs, but with Slurm 20.11 cannot<br>
get this to work - the information is simply missing. Our two older<br>
clusters with Slurm 19.05 will record memory usage as a TRES, e.g. as<br>
shown below:<br>
<br>
# sacct --format=JobID,State,AllocTRES%32|grep RUNNING|head -4<br>
14029267        RUNNING billing=32,cpu=32,mem=185600M,n+<br>
14037739        RUNNING billing=64,cpu=64,mem=250G,node+<br>
14037739.ba+    RUNNING           cpu=32,mem=125G,node=1<br>
14037739.0      RUNNING           cpu=1,mem=4000M,node=1<br>
<br>
However with 20.11 we see no memory usage:<br>
<br>
# sacct --format=JobID,State,AllocTRES%32|grep RUNNING|head -4<br>
771             RUNNING         billing=36,cpu=36,node=1<br>
771.batch       RUNNING              cpu=36,mem=0,node=1<br>
816             RUNNING       billing=128,cpu=128,node=1<br>
823             RUNNING         billing=36,cpu=36,node=1<br>
<br>
I've also checked within the slurm database's cluster_job_table, and<br>
tres_alloc has no "2=" (memory) value for any job.<br>
<br>
From my reading of <a href="https://slurm.schedmd.com/tres.html">https://slurm.schedmd.com/tres.html</a> it's not possible<br>
to disable memory as a TRES, so I can't figure out what I'm missing<br>
here. The 20.11 cluster is running on Ubuntu 20.04 (vs CentOS 7 for the<br>
others), in case that makes any difference!<br>
<br>
Thanks in advance,<br>
Xand<br>
<br>
</div>
</span></font></div>
<br>
</body>
</html>