[slurm-users] seff: incorrect memory usage (18.08.5-2)

Christopher Benjamin Coffey Chris.Coffey at nau.edu
Tue Feb 26 14:32:57 UTC 2019

Hi Loris,

Odd, we never saw that issue with the memory efficiency being out of whack, just the CPU efficiency. We are running 18.08.5-2, and here is a 512 core job run last night:

Job ID: 18096693
Array Job ID: 18096693_5
Cluster: monsoon
User/Group: abc123/cluster
State: COMPLETED (exit code 0)
Nodes: 60
Cores per node: 8
CPU Utilized: 01:34:06
CPU Efficiency: 58.04% of 02:42:08 core-walltime
Job Wall-clock time: 00:00:19
Memory Utilized: 36.04 GB (estimated maximum)
Memory Efficiency: 30.76% of 117.19 GB (1.95 GB/node)
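
If it helps to compare, the raw accounting data seff works from can be pulled straight out of sacct. The format fields below are all standard sacct fields; MaxRSS is per task, so it won't line up exactly with the aggregated "Memory Utilized" line above, but it's a quick sanity check that the underlying numbers are sane:

  $ sacct -j 18096693_5 --format=JobID,MaxRSS,AveRSS,TotalCPU,Elapsed,NNodes,NCPUS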

Out of curiosity, which job accounting (jobacct_gather), task, and proctrack plugins are you using? We are using:


Also cgroup.conf:


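For anyone following along, the plugins in question are set in slurm.conf, and the memory enforcement lives in cgroup.conf. Purely as an illustration (example values, not necessarily what we or any other site runs), the relevant lines look like:

  # slurm.conf -- example values only
  JobAcctGatherType=jobacct_gather/linux
  ProctrackType=proctrack/cgroup
  TaskPlugin=task/affinity,task/cgroup

  # cgroup.conf -- example values only
  CgroupAutomount=yes
  ConstrainCores=yes
  ConstrainRAMSpace=yes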

Christopher Coffey
High-Performance Computing
Northern Arizona University

On 2/26/19, 2:15 AM, "slurm-users on behalf of Loris Bennett" <slurm-users-bounces at lists.schedmd.com on behalf of loris.bennett at fu-berlin.de> wrote:

    With seff 18.08.5-2 we have been getting spurious results regarding
    memory usage:
      $ seff 1230_27
      Job ID: 1234
      Array Job ID: 1230_27
      Cluster: curta
      User/Group: xxxxxxxxx/xxxxxxxxx
      State: COMPLETED (exit code 0)
      Nodes: 4
      Cores per node: 25
      CPU Utilized: 9-16:49:18
      CPU Efficiency: 30.90% of 31-09:35:00 core-walltime
      Job Wall-clock time: 07:32:09
      Memory Utilized: 48.00 EB (estimated maximum)
      Memory Efficiency: 26388279066.62% of 195.31 GB (1.95 GB/core)
    It seems that the more cores are involved, the worse the overcalculation
    is, but not linearly.
    Has anyone else seen this?
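    If anyone wants to dig further: my understanding (an assumption on my
    part) is that seff derives its memory figure from the TRES usage stored
    in the accounting database, so the raw value can be checked with
    something like
      $ sacct -j 1230_27 -o JobID,TRESUsageInTot%80
    If an absurd mem= value already shows up there, the problem is in what
    gets recorded rather than in seff's arithmetic.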
    Dr. Loris Bennett (Mr.)
    ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de
