[slurm-users] seff: incorrect memory usage (18.08.5-2)
Loris Bennett
loris.bennett at fu-berlin.de
Tue Feb 26 15:12:29 UTC 2019
Hi Chris,
I had
JobAcctGatherType=jobacct_gather/linux
TaskPlugin=task/affinity
ProctrackType=proctrack/cgroup
ProctrackType was actually unset, but cgroup is the default, so that one was already in effect.
I have now changed the settings to
JobAcctGatherType=jobacct_gather/cgroup
TaskPlugin=task/affinity,task/cgroup
ProctrackType=proctrack/cgroup
and added the following to cgroup.conf:
TaskAffinity=no
ConstrainCores=yes
ConstrainRAMSpace=yes
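As an aside, plugin changes like these are generally not picked up by a plain 'scontrol reconfigure', so slurmctld and the slurmds have to be restarted. To confirm what the daemons are actually running, something like the following should do (standard scontrol; the grep pattern is just illustrative):

$ scontrol show config | grep -E 'JobAcctGatherType|ProctrackType|TaskPlugin'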
For at least one job, this gives me the following output while the job is running:
$ seff -d 4896
Slurm data: JobID ArrayJobID User Group State Clustername Ncpus Nnodes Ntasks Reqmem PerNode Cput Walltime Mem ExitStatus
Slurm data: 4896 loris sc RUNNING curta 8 2 2 2097152 0 0 33 3.6028797018964e+16 0
Job ID: 4896
Cluster: curta
User/Group: loris/sc
State: RUNNING
Nodes: 2
Cores per node: 4
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 00:04:24 core-walltime
Job Wall-clock time: 00:00:33
Memory Utilized: 32.00 EB (estimated maximum)
Memory Efficiency: 1717986918400.00% of 2.00 GB (256.00 MB/core)
WARNING: Efficiency statistics may be misleading for RUNNING jobs.
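Incidentally, the bogus memory figure is suspiciously round in binary: 3.6028797018964e+16 is 2^55, and 2^55 KB is 2^65 bytes, which (in the binary multiples seff uses) is exactly the 32.00 EB reported above. So presumably the Mem field for a running job comes back as an unset sentinel value rather than a measurement, and seff converts it anyway. A quick check of the arithmetic:

$ python3 -c 'print(2**55)'                     # 36028797018963968 = 3.6028797018964e+16
$ python3 -c 'print((2**55 * 2**10) // 2**60)'  # KB -> bytes -> EB (binary): 32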
and this at completion:
$ seff -d 4896
Slurm data: JobID ArrayJobID User Group State Clustername Ncpus Nnodes Ntasks Reqmem PerNode Cput Walltime Mem ExitStatus
Slurm data: 4896 loris sc COMPLETED curta 8 2 2 2097152 0 0 61 59400 0
Job ID: 4896
Cluster: curta
User/Group: loris/sc
State: COMPLETED (exit code 0)
Nodes: 2
Cores per node: 4
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 00:08:08 core-walltime
Job Wall-clock time: 00:01:01
Memory Utilized: 58.01 MB (estimated maximum)
Memory Efficiency: 2.83% of 2.00 GB (256.00 MB/core)
which looks good. I'll see how it goes with longer-running jobs.
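For anyone who wants to eyeball the raw accounting numbers that seff works from, sacct on the same job is a useful cross-check (all standard sacct fields):

$ sacct -j 4896 --format=JobID,State,Elapsed,TotalCPU,ReqMem,MaxRSS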
Thanks for the input,
Loris
Christopher Benjamin Coffey <Chris.Coffey at nau.edu> writes:
> Hi Loris,
>
> Odd, we never saw that issue with the memory efficiency being out of whack, just the CPU efficiency. We are running 18.08.5-2, and here is a 512-core job run last night:
>
> Job ID: 18096693
> Array Job ID: 18096693_5
> Cluster: monsoon
> User/Group: abc123/cluster
> State: COMPLETED (exit code 0)
> Nodes: 60
> Cores per node: 8
> CPU Utilized: 01:34:06
> CPU Efficiency: 58.04% of 02:42:08 core-walltime
> Job Wall-clock time: 00:00:19
> Memory Utilized: 36.04 GB (estimated maximum)
> Memory Efficiency: 30.76% of 117.19 GB (1.95 GB/node)
>
> I'm curious: which job accounting gather, task, and proctrack plugins are you using? We are using:
>
> JobAcctGatherType=jobacct_gather/cgroup
> TaskPlugin=task/cgroup,task/affinity
> ProctrackType=proctrack/cgroup
>
> Also cgroup.conf:
>
> ConstrainCores=yes
> ConstrainRAMSpace=yes
>
> Best,
> Chris
>
> --
> Christopher Coffey
> High-Performance Computing
> Northern Arizona University
> 928-523-1167
>
>
> On 2/26/19, 2:15 AM, "slurm-users on behalf of Loris Bennett" <slurm-users-bounces at lists.schedmd.com on behalf of loris.bennett at fu-berlin.de> wrote:
>
> Hi,
>
> With seff 18.08.5-2 we have been getting spurious results regarding
> memory usage:
>
> $ seff 1230_27
> Job ID: 1234
> Array Job ID: 1230_27
> Cluster: curta
> User/Group: xxxxxxxxx/xxxxxxxxx
> State: COMPLETED (exit code 0)
> Nodes: 4
> Cores per node: 25
> CPU Utilized: 9-16:49:18
> CPU Efficiency: 30.90% of 31-09:35:00 core-walltime
> Job Wall-clock time: 07:32:09
> Memory Utilized: 48.00 EB (estimated maximum)
> Memory Efficiency: 26388279066.62% of 195.31 GB (1.95 GB/core)
>
> It seems that the more cores are involved, the worse the overcalculation is, but not linearly.
>
> Has anyone else seen this?
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin Email loris.bennett at fu-berlin.de
>
>
>
--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email loris.bennett at fu-berlin.de