[slurm-users] Getting current memory size of a job
Jeffrey Frey
frey at udel.edu
Mon Apr 1 14:10:01 UTC 2019
If you're on Linux and using Slurm cgroups, your job processes should be contained in a memory cgroup. The /proc/<pid>/cgroup file indicates to which cgroups a process is assigned, so:
$ srun [...] /bin/bash -c "grep memory: /proc/\$\$/cgroup | sed 's%^[0-9]*:memory:%/sys/fs/cgroup/memory%'"
/sys/fs/cgroup/memory/slurm/uid_1001/job_459890/step_0/task_0
The 'sed' above assumes cgroups are mounted under /sys/fs/cgroup. Check the usage by examining the "memory.usage_in_bytes" file; here's a script that includes a Bash function to backtrack to the "job_#" directory in order to get the overall usage for all steps/tasks on the node for the job:
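#!/bin/bash

# Walk up from a cgroup directory to the enclosing "job_<id>" directory.
# Prints that directory and returns 0, or returns 1 if none is found.
function cgroup_job_dir() {
    local CGROUP_DIR="$1"

    # Stop at the cgroup mount point, or at "/" or "." so that an
    # unexpected path cannot loop forever.
    while [[ $CGROUP_DIR != '/sys/fs/cgroup' && $CGROUP_DIR != '/' && $CGROUP_DIR != '.' && ! $CGROUP_DIR =~ /job_[0-9]+$ ]]; do
        CGROUP_DIR="$(dirname "$CGROUP_DIR")"
    done
    if [[ $CGROUP_DIR =~ /job_[0-9]+$ ]]; then
        echo "$CGROUP_DIR"
        return 0
    fi
    return 1
}

# Memory cgroup of this shell (the innermost step/task level).
CGROUP_MEMORY="$(grep memory: /proc/$$/cgroup | sed 's%^[0-9]*:memory:%/sys/fs/cgroup/memory%')"
printf "%s: %s\n" "${CGROUP_MEMORY}/memory.usage_in_bytes" "$(cat "${CGROUP_MEMORY}/memory.usage_in_bytes")"

# Same counter at the job level, covering all steps/tasks on this node.
CGROUP_MEMORY="$(cgroup_job_dir "$CGROUP_MEMORY")"
if [ $? -eq 0 ]; then
    printf "%s: %s\n" "${CGROUP_MEMORY}/memory.usage_in_bytes" "$(cat "${CGROUP_MEMORY}/memory.usage_in_bytes")"
fi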
E.g. when using this with srun:
$ srun [...] check_mem.sh
/sys/fs/cgroup/memory/slurm/uid_1001/job_459900/step_0/task_0/memory.usage_in_bytes: 462848
/sys/fs/cgroup/memory/slurm/uid_1001/job_459900/memory.usage_in_bytes: 610304
or with sbatch:
$ sbatch [...] check_mem.sh
Submitted batch job 459903
:
$ cat slurm-459903.out
/sys/fs/cgroup/memory/slurm/uid_1001/job_459903/step_batch/task_0/memory.usage_in_bytes: 466944
/sys/fs/cgroup/memory/slurm/uid_1001/job_459903/memory.usage_in_bytes: 614400
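The raw byte counts above are a bit hard to read at a glance. As a rough sketch (not part of the script above), a helper like the following could print the current usage next to the peak usage and the limit, assuming the cgroup v1 memory controller files memory.usage_in_bytes, memory.max_usage_in_bytes and memory.limit_in_bytes are present in the directory. The function name cgroup_mem_summary is made up for illustration:

```shell
#!/bin/bash
# Sketch: summarize memory counters for one cgroup v1 memory directory.
# cgroup_mem_summary is a hypothetical helper name, not part of Slurm.
function cgroup_mem_summary() {
    local DIR="$1" FILE VALUE

    for FILE in memory.usage_in_bytes memory.max_usage_in_bytes memory.limit_in_bytes; do
        # Skip counters that are missing or unreadable on this system.
        [[ -r "$DIR/$FILE" ]] || continue
        VALUE="$(cat "$DIR/$FILE")"
        # Integer division: show whole MiB alongside the raw byte count.
        printf "%s: %s bytes (%s MiB)\n" "$FILE" "$VALUE" "$(( VALUE / 1048576 ))"
    done
}
```

It would be called with a directory such as the job-level one found above, e.g. cgroup_mem_summary /sys/fs/cgroup/memory/slurm/uid_1001/job_459900.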
> On Apr 1, 2019, at 2:28 AM, Bjørn-Helge Mevik <b.h.mevik at usit.uio.no> wrote:
>
> If the job is alone on its node(s), you can use "scontrol show node <nodes>" and
> look at "RealMemory minus FreeMem".
>
> --
> B/H
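For the whole-node approach quoted above, the subtraction could be scripted along these lines. This is only a sketch: it assumes the "scontrol show node" output contains RealMemory=<n> and FreeMem=<n> fields (both in MB) for a single node, and the function name node_mem_in_use is made up for illustration:

```shell
#!/bin/bash
# Sketch: compute RealMemory - FreeMem (both in MB) from the text of
# "scontrol show node" read on stdin, for a single node.
# node_mem_in_use is a hypothetical helper name, not a Slurm command.
function node_mem_in_use() {
    local TEXT REAL FREE

    TEXT="$(cat)"
    REAL="$(grep -oE 'RealMemory=[0-9]+' <<< "$TEXT" | head -n 1 | cut -d= -f2)"
    FREE="$(grep -oE 'FreeMem=[0-9]+' <<< "$TEXT" | head -n 1 | cut -d= -f2)"
    # If either field is missing or not numeric there is nothing to compute.
    if [[ -z $REAL || -z $FREE ]]; then
        return 1
    fi
    echo "$(( REAL - FREE ))"
}
```

Usage would be something like: scontrol show node <nodes> | node_mem_in_use. As noted in the quoted message, the result only reflects the job if it is alone on the node.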
::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE 19716
Office: (302) 831-6034 Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::