[slurm-users] [Slurm 18.08.4] sacct/seff Inaccurate usercpu on Job Arrays
Chance Bryce Carl Nelson
chance-nelson at nau.edu
Fri Dec 21 12:11:28 MST 2018
Hi folks,
Calling sacct with the UserCPU field enabled reports CPU times far above
the expected values for job array tasks. seff reports the same inflated values.
For example, executing the following job script:
________________________________________________________
#!/bin/bash
#SBATCH --job-name=array_test
#SBATCH --workdir=/scratch/cbn35/bigdata
#SBATCH --output=/scratch/cbn35/bigdata/logs/job_%A_%a.log
#SBATCH --time=20:00
#SBATCH --array=1-5
#SBATCH -c2
srun stress -c 2 -m 1 --vm-bytes 500M --timeout 65s
________________________________________________________
...results in the following stats:
________________________________________________________
JobID         ReqCPUS    UserCPU  Timelimit    Elapsed
------------ -------- ---------- ---------- ----------
15730924_5          2   02:30:14   00:20:00   00:01:08
15730924_5.+        2  00:00.004              00:01:08
15730924_5.+        2   00:00:00              00:01:09
15730924_5.0        2   02:30:14              00:01:05
15730924_1          2   02:30:48   00:20:00   00:01:08
15730924_1.+        2  00:00.013              00:01:08
15730924_1.+        2   00:00:00              00:01:09
15730924_1.0        2   02:30:48              00:01:05
15730924_2          2   02:15:52   00:20:00   00:01:07
15730924_2.+        2  00:00.007              00:01:07
15730924_2.+        2   00:00:00              00:01:07
15730924_2.0        2   02:15:52              00:01:06
15730924_3          2   02:30:20   00:20:00   00:01:08
15730924_3.+        2  00:00.010              00:01:08
15730924_3.+        2   00:00:00              00:01:09
15730924_3.0        2   02:30:20              00:01:05
15730924_4          2   02:30:26   00:20:00   00:01:08
15730924_4.+        2  00:00.006              00:01:08
15730924_4.+        2   00:00:00              00:01:09
15730924_4.0        2   02:30:25              00:01:05
________________________________________________________
seff shows the same inflated CPU time for one of the array tasks, along with several warnings:
________________________________________________________
Use of uninitialized value $lmem in numeric lt (<) at /usr/bin/seff line 130, <DATA> line 624.
Use of uninitialized value $lmem in numeric lt (<) at /usr/bin/seff line 130, <DATA> line 624.
Use of uninitialized value $lmem in numeric lt (<) at /usr/bin/seff line 130, <DATA> line 624.
Job ID: 15730924
Array Job ID: 15730924_5
Cluster: monsoon
User/Group: cbn35/clusterstu
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 03:19:15
CPU Efficiency: 8790.44% of 00:02:16 core-walltime
Job Wall-clock time: 00:01:08
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 1.95 GB (1000.00 MB/core)
________________________________________________________
As far as I can tell, a two-core job with an elapsed time of around one
minute should not be able to accumulate more than about two minutes of CPU
time, let alone two and a half hours. Could this be a configuration issue,
or is it a possible bug?
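To put numbers on that, here is a rough sanity check in Python using the values from the sacct and seff output above. It assumes that a job confined to its allocated cores cannot use more CPU time than ReqCPUS x Elapsed; the helper name to_seconds and the variable names are made up for this sketch and are not part of Slurm.

```python
def to_seconds(stamp: str) -> float:
    """Convert a Slurm-style time string to seconds.

    Handles [D-]HH:MM:SS and MM:SS.mmm, the two shapes seen in the
    output above. Hypothetical helper, not a Slurm API.
    """
    days = 0
    if "-" in stamp:
        day_part, stamp = stamp.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in stamp.split(":")]
    # Pad missing leading fields (e.g. "00:00.004" has no hours field).
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


# Values copied from the output for array task 15730924_5.
req_cpus = 2
elapsed = to_seconds("00:01:08")
user_cpu = to_seconds("02:30:14")        # UserCPU reported by sacct
cpu_utilized = to_seconds("03:19:15")    # "CPU Utilized" reported by seff

# Assumed upper bound: every allocated core busy for the whole run.
max_cpu_seconds = req_cpus * elapsed     # 136 s, i.e. 00:02:16

# Same ratio seff prints as "CPU Efficiency".
efficiency = 100 * cpu_utilized / max_cpu_seconds

print(f"upper bound on CPU time: {max_cpu_seconds:.0f} s")
print(f"UserCPU from sacct:      {user_cpu:.0f} s")
print(f"efficiency:              {efficiency:.2f}%")   # 8790.44%
```

The 136 s bound matches the 00:02:16 core-walltime that seff prints, and the ratio reproduces its 8790.44% figure, so seff appears to be doing the arithmetic correctly on an inflated CPU time rather than miscalculating on its own.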
More info is available on request, and any help is appreciated!