[slurm-users] Getting current memory size of a job

Jeffrey Frey frey at udel.edu
Mon Apr 1 14:10:01 UTC 2019


If you're on Linux and using Slurm cgroups, your job processes should be contained in a memory cgroup.  The /proc/<pid>/cgroup file indicates to which cgroups a process is assigned, so:


$ srun [...] /bin/bash -c "grep memory: /proc/\$\$/cgroup | sed 's%^[0-9]*:memory:%/sys/fs/cgroup/memory%'"
/sys/fs/cgroup/memory/slurm/uid_1001/job_459890/step_0/task_0
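
For reuse, the grep/sed pair can be folded into a single function. This is only a sketch: the function name and the optional file argument are made up for illustration, and it carries the same assumptions as the one-liner (cgroup v1, mounted under /sys/fs/cgroup).

```shell
#!/bin/bash

# Hypothetical helper (name is illustrative): print the memory cgroup
# directory recorded in a cgroup file, by default /proc/self/cgroup.
# Assumes cgroup v1 with the hierarchy mounted under /sys/fs/cgroup.
memory_cgroup_dir() {
    local cgroup_file="${1:-/proc/self/cgroup}"
    # Lines have the form "<id>:<controller>:<path>". Print only the line
    # whose controller field is exactly "memory", rewritten as a directory.
    sed -n 's%^[0-9]*:memory:%/sys/fs/cgroup/memory%p' "$cgroup_file"
}
```

With no argument it reads /proc/self/cgroup, which describes the sed child process; that process runs in the same cgroup as the calling shell. If no memory line is found, nothing is printed.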


The 'sed' above assumes the cgroup hierarchy is mounted under /sys/fs/cgroup.  The current usage, in bytes, is in the "memory.usage_in_bytes" file in that directory.  Here's a script that includes a Bash function to backtrack to the "job_#" directory in order to get the overall usage for all steps/tasks of the job on the node:


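#!/bin/bash

# Walk up from a cgroup directory to the enclosing "job_<id>" directory.
# Prints that directory and returns 0 if found; returns 1 otherwise.
function cgroup_job_dir() {
    local CGROUP_DIR="$1"
    # Also stop at "/" and "." so dirname cannot loop forever when the
    # path is empty or does not pass through /sys/fs/cgroup.
    while [[ $CGROUP_DIR != '/sys/fs/cgroup' && $CGROUP_DIR != '/' && $CGROUP_DIR != '.' && ! $CGROUP_DIR =~ /job_[0-9]+$ ]]; do
        CGROUP_DIR="$(dirname "$CGROUP_DIR")"
    done
    if [[ $CGROUP_DIR =~ /job_[0-9]+$ ]]; then
        echo "$CGROUP_DIR"
        return 0
    fi
    return 1
}

# Memory cgroup of this process (the innermost level, e.g. the task).
CGROUP_MEMORY="$(grep memory: /proc/$$/cgroup | sed 's%^[0-9]*:memory:%/sys/fs/cgroup/memory%')"
printf "%s: %s\n" "${CGROUP_MEMORY}/memory.usage_in_bytes" "$(cat "${CGROUP_MEMORY}/memory.usage_in_bytes")"

# Usage at the job level, covering all steps/tasks of the job on this node.
CGROUP_MEMORY="$(cgroup_job_dir "$CGROUP_MEMORY")"
if [ $? -eq 0 ]; then
    printf "%s: %s\n" "${CGROUP_MEMORY}/memory.usage_in_bytes" "$(cat "${CGROUP_MEMORY}/memory.usage_in_bytes")"
fi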


E.g. when using this with srun:


$ srun [...] check_mem.sh
/sys/fs/cgroup/memory/slurm/uid_1001/job_459900/step_0/task_0/memory.usage_in_bytes: 462848
/sys/fs/cgroup/memory/slurm/uid_1001/job_459900/memory.usage_in_bytes: 610304


or with sbatch:


$ sbatch [...] check_mem.sh
Submitted batch job 459903
   :
$ cat slurm-459903.out
/sys/fs/cgroup/memory/slurm/uid_1001/job_459903/step_batch/task_0/memory.usage_in_bytes: 466944
/sys/fs/cgroup/memory/slurm/uid_1001/job_459903/memory.usage_in_bytes: 614400
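
The script above reports a single reading.  To follow usage over the life of a job, the same file can be read repeatedly.  A minimal sketch, assuming the job-level directory has already been found as shown above; the function name, its arguments, and the output format are hypothetical:

```shell
#!/bin/bash

# Hypothetical helper (name and arguments are illustrative): print
# "<epoch seconds> <bytes>" for a cgroup directory COUNT times,
# INTERVAL seconds apart. Assumes the cgroup v1 memory.usage_in_bytes file.
sample_cgroup_usage() {
    local dir="$1" count="${2:-1}" interval="${3:-1}" i bytes
    for (( i = 0; i < count; i++ )); do
        # Give up if the file cannot be read (e.g. the job has ended).
        bytes="$(cat "$dir/memory.usage_in_bytes")" || return 1
        printf '%s %s\n' "$(date +%s)" "$bytes"
        # Sleep between samples, but not after the last one.
        if (( i + 1 < count )); then
            sleep "$interval"
        fi
    done
    return 0
}
```

In a batch script this could run in the background ahead of the real workload, e.g. sample_cgroup_usage "$CGROUP_MEMORY" 60 10 > usage.log &, which would take 60 samples ten seconds apart.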





> On Apr 1, 2019, at 2:28 AM, Bjørn-Helge Mevik <b.h.mevik at usit.uio.no> wrote:
> 
> If the job is alone on its node(s), you can use "scontrol show node <nodes>" and
> look at "RealMemory minus FreeMem".
> 
> --
> B/H
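
The "RealMemory minus FreeMem" arithmetic suggested above can be scripted.  This is a sketch only: the function name is made up, it expects the output of "scontrol show node" for a single node on stdin, and it relies on the RealMemory= and FreeMem= fields, which Slurm reports in megabytes.

```shell
#!/bin/bash

# Hypothetical helper (name is illustrative): read "scontrol show node"
# output for ONE node on stdin and print RealMemory minus FreeMem in MB.
node_used_mb() {
    awk '
        {
            # Fields look like Key=Value; pick out the two we need.
            for (i = 1; i <= NF; i++) {
                n = split($i, kv, "=")
                if (n == 2 && kv[1] == "RealMemory") real = kv[2]
                if (n == 2 && kv[1] == "FreeMem")    free = kv[2]
            }
        }
        END {
            # FreeMem may be non-numeric (e.g. "N/A") if it was not reported.
            if (real ~ /^[0-9]+$/ && free ~ /^[0-9]+$/) print real - free
            else exit 1
        }
    '
}
```

Usage would be along the lines of: scontrol show node <node> | node_used_mb.  As noted in the quoted message, the result only reflects the job when it is alone on the node.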


::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE  19716
Office: (302) 831-6034  Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::



