[slurm-users] [slurm-dev] SLURM 16.05.10-2 jobacct_gather/linux inconsistencies?
John DeSantis
desantis at usf.edu
Tue Dec 12 11:11:11 MST 2017
Hello all,
> A closer look revealed that jag_common_poll_data isn't consistent in
> logging the total memory usage, depending on the application in
> question.
Ugh. I'm embarrassed to say that the issue was traced back to our use
of proctrack/pgid instead of proctrack/linuxproc; sadly, this is an
artifact from our testing days that was never changed after our move
to production!
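
In case it saves anyone else the head-scratching: the culprit is the
ProctrackType setting. One quick (illustrative) way to see what a
running cluster is using, and what ours still showed:

$ scontrol show config | grep -i ProctrackType
ProctrackType           = proctrack/pgid    <- leftover from testing

The fix was simply setting, in slurm.conf,

ProctrackType=proctrack/linuxproc

and restarting slurmd so the new plugin takes effect.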
RTFM, dude!
John DeSantis
On Tue, 5 Sep 2017 11:40:15 -0600
John DeSantis <desantis at usf.edu> wrote:
>
> Hello all,
>
> We were recently alerted by a user whose long-running jobs (>= 6
> days) were being killed by the OOM killer.
>
> A closer look revealed that jag_common_poll_data isn't consistent in
> logging the total memory usage, depending on the application in
> question. For example, consider the following job submission:
>
> #!/bin/bash
> #
> #SBATCH --job-name=sif14_1co2
> #SBATCH --output=runlog.log
> #SBATCH --time=48:00:00
> #SBATCH --nodes=8 --ntasks-per-node=6
> #SBATCH --partition=p2016 --qos=p16
> #SBATCH --mail-type=END
> #SBATCH --mail-user=user
>
> module purge
> module add apps/vasp/5.4.1
>
> # For standard vasp
> mpirun vasp_std
>
>
> Since no memory request was specified, the DefMemPerCPU value of
> 512 MB applied.
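
To be clear, "no memory request" means the script contained no
directive along these lines (the value shown just spells out our
DefMemPerCPU default):

#SBATCH --mem-per-cpu=512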
>
> Looking at the batch host stats, here is what is actually being
> reported:
>
> PPID   LWP    PSR  TIME      S  RSS      ELAPSED   COMMAND        VSZ
> 28388  28394    1  00:00:00  S     1332  17:47:24  slurm_script    106116
> 28394  28403    1  00:00:00  S     1404  17:47:24  mpirun          106248
> 28403  28416   10  00:00:00  S     1664  17:47:24  mpiexec.hydra    17180
> 28416  28417    5  00:00:00  S     4996  17:47:24  srun            708424
> 28417  28421    3  00:00:00  S      668  17:47:24  srun             37712
> 28453  28458    0  00:00:00  S     1928  17:47:24  pmi_proxy        16384
> 28458  28471    4  15:55:13  R  1059004  17:47:24  vasp_std        2953636
> 28458  28472   10  17:01:43  R  1057672  17:47:24  vasp_std        2950168
> 28458  28473   20  17:35:54  R  1055412  17:47:24  vasp_std        2947896
> 28458  28474    3  17:53:39  R  1060060  17:47:24  vasp_std        2952712
> 28458  28475   11  17:51:52  R  1055168  17:47:24  vasp_std        2947772
> 28458  28476   23  17:53:42  R  1063888  17:47:24  vasp_std        2956572
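
For reference, the per-process stats here (and the similar listing
further down) were pulled on the batch host with something along the
lines of the following ps invocation; the exact flags aren't
critical, only the columns are:

ps -u $USER -L -o ppid,lwp,psr,time,s,rss,etime,comm,vsz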
>
> [2017-09-05T13:11:09.124] [10212120] debug: jag_common_poll_data: Task average frequency = 2197 pid 28394 mem size 4400 229544 time 0.020000(0+0)
> [2017-09-05T13:11:09.280] [10212120.0] debug: jag_common_poll_data: Task average frequency = 2197 pid 28458 mem size 1928 205988 time 0.020000(0+0)
> [2017-09-05T13:11:31.729] [10212120] debug: jag_common_poll_data: Task average frequency = 2197 pid 28394 mem size 4400 229544 time 0.020000(0+0)
> [2017-09-05T13:11:31.757] [10212120.0] debug: jag_common_poll_data: Task average frequency = 2197 pid 28458 mem size 1928 205988 time 0.020000(0+0)
> [2017-09-05T13:11:39.126] [10212120] debug: jag_common_poll_data: Task average frequency = 2197 pid 28394 mem size 4400 229544 time 0.020000(0+0)
> [2017-09-05T13:11:39.283] [10212120.0] debug: jag_common_poll_data: Task average frequency = 2197 pid 28458 mem size 1928 205988 time 0.020000(0+0)
>
> However, if a slightly older version of the same software is used,
> memory seems to be reported correctly. Consider the submission
> script below.
>
> #!/bin/sh
>
> #SBATCH --time=04:00:00
> #SBATCH -N 2 --ntasks-per-node=2 -J "vasp_run"
> #SBATCH --mem-per-cpu=1800
> #SBATCH --reservation=blah
> #SBATCH -p blah
> #SBATCH -w nodes-[1-2]
>
> module purge
> module load apps/vasp/5.3.3
>
> mpirun vasp
>
> And, the reported stats:
>
> PPID   LWP    PSR  TIME      S  RSS      ELAPSED  COMMAND
> 22046  22050    0  00:00:00  S     1324    03:32  slurm_script
> 22050  22053    0  00:00:00  S     1476    03:32  mpirun
> 22053  22111    1  00:00:00  S     1428    03:32  mpiexec.hydra
> 22111  22112    2  00:00:00  S     4800    03:32  srun
> 22112  22113    3  00:00:00  S      668    03:32  srun
> 22121  22126    3  00:00:00  S     1352    03:32  pmi_proxy
> 22126  22127    5  00:03:30  R  1781544    03:32  vasp
> 22126  22128    0  00:03:30  R  1778052    03:32  vasp
>
>
> [2017-09-05T13:17:58.097] [10215464] debug: jag_common_poll_data: Task average frequency = 2527 pid 22050 mem size 4228 230384 time 0.010000(0+0)
> [2017-09-05T13:17:58.265] [10215464.0] debug: jag_common_poll_data: Task average frequency = 2527 pid 22126 mem size 3538576 7561020 time 59.630000(56+2)
> [2017-09-05T13:18:28.101] [10215464] debug: jag_common_poll_data: Task average frequency = 2527 pid 22050 mem size 4228 230384 time 0.010000(0+0)
> [2017-09-05T13:18:28.270] [10215464.0] debug: jag_common_poll_data: Task average frequency = 2505 pid 22126 mem size 3560928 7655316 time 119.590000(116+3)
> [2017-09-05T13:18:58.105] [10215464] debug: jag_common_poll_data: Task average frequency = 2527 pid 22050 mem size 4228 230384 time 0.010000(0+0)
> [2017-09-05T13:18:58.275] [10215464.0] debug: jag_common_poll_data: Task average frequency = 2498 pid 22126 mem size 3560932 7655316 time 179.570000(175+3)
> [2017-09-05T13:19:28.109] [10215464] debug: jag_common_poll_data: Task average frequency = 2527 pid 22050 mem size 4228 230384 time 0.010000(0+0)
> [2017-09-05T13:19:28.281] [10215464.0] debug: jag_common_poll_data: Task average frequency = 2495 pid 22126 mem size 3560932 7655316 time 239.550000(234+4)
> [2017-09-05T13:19:58.113] [10215464] debug: jag_common_poll_data: Task average frequency = 2527 pid 22050 mem size 4228 230384 time 0.010000(0+0)
> [2017-09-05T13:19:58.286] [10215464.0] debug: jag_common_poll_data: Task average frequency = 2493 pid 22126 mem size 3560936 7655316 time 299.510000(294+5)
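
Putting the two cases side by side makes the inconsistency plain.
Rough shell arithmetic over the RSS column from ps (values in KB; the
ps and poll samples weren't taken at the same instant, so only the
ballpark matters):

# vasp 5.3.3 step: two vasp ranks plus pmi_proxy
echo $(( 1781544 + 1778052 + 1352 ))
# => 3560948, in line with the 3560928-3560936 reported for pid 22126

# vasp 5.4.1 step: six vasp_std ranks plus pmi_proxy
echo $(( 1059004 + 1057672 + 1055412 + 1060060 + 1055168 + 1063888 + 1928 ))
# => 6353132, yet only pmi_proxy's 1928 is ever reported for pid 28458

So in the 5.4.1 case the poll never sees the vasp_std ranks at all.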
>
>
>
> Has anyone else noticed anything similar on their cluster(s)? I
> cannot confirm whether this was happening before we upgraded from
> 15.08.4 to 16.05.10-2.
>
> Thanks,
> John DeSantis