<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>

</head>

<body dir="ltr">

<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">

<p>Dear Daryl,</p>

<p><br>

</p>

<p>I once asked the same question and got a helpful answer on this list a while ago, so I am passing it on here from memory.</p>

<p><br>

</p>

<p>RSS appears to double-count memory that is shared between processes, such as shared libraries, because every process that maps a shared page is charged for it in full. It was suggested that I switch to PSS:</p>

<p><a href="https://slurm.schedmd.com/slurm.conf.html#OPT_JobAcctGatherParams" class="OWAAutoLink">https://slurm.schedmd.com/slurm.conf.html#OPT_JobAcctGatherParams</a></p>
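<p>A minimal sketch of what that switch might look like in slurm.conf. <code>UsePss</code> is one of the values documented for <code>JobAcctGatherParams</code> on the page linked above; whether it is appropriate for a given site, and how it interacts with the accounting plugin in use there, is an assumption to verify before rolling it out:</p>
<pre>
```
# slurm.conf fragment (sketch, not a tested site configuration):
# use PSS instead of RSS when calculating a job's memory usage
JobAcctGatherParams=UsePss
```
</pre>
<p>According to the documentation, the PSS value is then stored in the existing RSS fields, so tools such as seff should keep reporting under the same names. The setting only applies once the Slurm daemons have picked up the new configuration.</p>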

<p><a href="https://www.sobyte.net/post/2022-04/pss-uss-rss/" class="OWAAutoLink">https://www.sobyte.net/post/2022-04/pss-uss-rss/</a></p>
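<p>To make the double counting concrete, here is a small sketch with made-up numbers. The process count, the per-process private memory, the size of the shared region, and the helper <code>percent_of_request</code> are all hypothetical and chosen only for illustration; they are not measurements from job 97036. RSS charges a shared page in full to every process that maps it, while PSS divides each shared page by the number of processes sharing it, so adding up RSS over many processes can exceed the memory that is actually in use:</p>
<pre>
```shell
#!/bin/bash
# Hypothetical workload: all numbers are illustrative, not measured.
procs=40        # worker processes in the job
private_gb=1    # memory private to each worker
shared_gb=20    # one region mapped by every worker
req_gb=350      # memory requested for the job

# RSS charges the full shared region to every process that maps it.
rss_sum=$(( procs * (private_gb + shared_gb) ))

# PSS splits the shared region evenly among the processes sharing it,
# so summed over all of them the region is counted exactly once.
pss_sum=$(( procs * private_gb + shared_gb ))

# Percentage of the requested memory (integer division, rounds down).
percent_of_request() {
    echo $(( $1 * 100 / req_gb ))
}

echo "summed RSS: ${rss_sum} GB ($(percent_of_request "$rss_sum")% of request)"
echo "summed PSS: ${pss_sum} GB ($(percent_of_request "$pss_sum")% of request)"
```
</pre>
<p>With these numbers the summed RSS comes to 840 GB (240% of a 350 GB request), even though only 60 GB (17%) is really occupied, which is the figure the summed PSS gives.</p>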

<p><br>

</p>

<p>I hope this helps (and that I understood the answer to my own question correctly ;) ).<br>

</p>

<p><br>

</p>

<p>Kind regards.</p>

<p>Martin<br>

</p>

<p><br>

</p>

<br>

<br>

<div style="color: rgb(0, 0, 0);">

<div>

<hr tabindex="-1" style="display:inline-block; width:98%">

<div id="x_divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt; on behalf of Daryl Roche &lt;droche@bcchr.ca&gt;<br>

<b>Sent:</b> Friday, 16 December 2022 01:43<br>

<b>To:</b> 'slurm-users@lists.schedmd.com'<br>

<b>Subject:</b> [slurm-users] seff MaxRSS Above 100 Percent?</font>

<div> </div>

</div>

</div>

<font size="2"><span style="font-size:10pt;">

<div class="PlainText">Hey All, <br>

<br>

I was hoping someone could explain how a job running on a single node could have a MaxRSS of 240% reported by seff. Below are some specifics about the job that was run. We're using Slurm 19.05.7 on CentOS 8.2.<br>

[root@hpc-node01 ~]# scontrol show jobid -dd 97036<br>

JobId=97036 JobName=jobbie.sh<br>

   UserId=username(012344321) GroupId=domain users(214400513) MCS_label=N/A<br>

   Priority=4294842062 Nice=0 Account=(null) QOS=normal<br>

   JobState=COMPLETED Reason=None Dependency=(null)<br>

   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0<br>

   DerivedExitCode=0:0<br>

   RunTime=00:22:35 TimeLimit=8-08:00:00 TimeMin=N/A<br>

   SubmitTime=2022-12-02T14:27:47 EligibleTime=2022-12-02T14:27:48<br>

   AccrueTime=2022-12-02T14:27:48<br>

   StartTime=2022-12-02T14:27:48 EndTime=2022-12-02T14:50:23 Deadline=N/A<br>

   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-12-02T14:27:48<br>

   Partition=defq AllocNode:Sid=hpc-node01:3921213<br>

   ReqNodeList=(null) ExcNodeList=(null)<br>

   NodeList=hpc-node11<br>

   BatchHost=hpc-node11<br>

   NumNodes=1 NumCPUs=40 NumTasks=0 CPUs/Task=40 ReqB:S:C:T=0:0:*:*<br>

   TRES=cpu=40,mem=350G,node=1,billing=40<br>

   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*<br>

     Nodes=hpc-node11 CPU_IDs=0-79 Mem=358400 GRES=<br>

   MinCPUsNode=40 MinMemoryNode=350G MinTmpDiskNode=0<br>

   Features=(null) DelayBoot=00:00:00<br>

   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)<br>

   Command=/path/to/scratch/jobbie.sh<br>

   WorkDir=/path/to/scratch<br>

   StdErr=/path/to/scratch/jobbie.sh-97036.error<br>

   StdIn=/dev/null<br>

   StdOut=/path/to/scratch/jobbie.sh-97036.out<br>

   Power=<br>

<br>

[root@hpc-node01 ~]# seff 97036<br>

Job ID: 97036<br>

Cluster: slurm<br>

Use of uninitialized value $user in concatenation (.) or string at /cm/shared/apps/slurm/current/bin/seff line 154, &lt;DATA&gt; line 604.<br>

User/Group: /domain users<br>

State: COMPLETED (exit code 0)<br>

Nodes: 1<br>

Cores per node: 40<br>

CPU Utilized: 04:43:36<br>

CPU Efficiency: 31.39% of 15:03:20 core-walltime<br>

Job Wall-clock time: 00:22:35<br>

Memory Utilized: 840.21 GB<br>

Memory Efficiency: 240.06% of 350.00 GB<br>

<br>

[root@hpc-node11 ~]# free -m<br>

              total        used        free      shared  buff/cache   available<br>

Mem:         385587        2761      268544        7891      114281      371865<br>

Swap:             0           0           0<br>

<br>

<br>

[root@hpc-node11 ~]# cat /path/to/scratch/jobbie.sh<br>

#!/bin/bash<br>

  <br>

#SBATCH --mail-user=username@bcchr.ca<br>

#SBATCH --mail-type=ALL<br>

<br>

## CPU Usage<br>

#SBATCH --mem=350G<br>

#SBATCH --cpus-per-task=40<br>

#SBATCH --time=200:00:00<br>

#SBATCH --nodes=1<br>

<br>

## Output and Stderr<br>

#SBATCH --output=%x-%j.out<br>

#SBATCH --error=%x-%j.error<br>

<br>

<br>

source /path/to/tools/Miniconda3/opt/miniconda3/etc/profile.d/conda.sh<br>

conda activate nanomethphase<br>

<br>

<br>

# Working dir<br>

Working_Dir=/path/to/scratch/<br>

<br>

# Case methyl sample<br>

Case=$Working_Dir/Case/methylation_frequency.tsv<br>

<br>

# Control, here using a dir with both parents<br>

Control=$Working_Dir/Control/<br>

<br>

# DMA call<br>

/path/to/tools/NanoMethPhase/NanoMethPhase/nanomethphase.py  dma \<br>

    --case $Case \<br>

    --control $Control \<br>

    --columns 1,2,5,7 \<br>

    --out_prefix DH0808_Proband_vs_Controls_DMA \<br>

    --out_dir $Working_Dir<br>

<br>

<br>

<br>

<br>

Daryl Roche<br>

<br>

</div>

</span></font></div>

</div>

</body>

</html>