[slurm-users] Information about finished jobs
Paul Raines
raines at nmr.mgh.harvard.edu
Mon Jun 14 18:12:17 UTC 2021
I have been writing my own 'jobinfo' tool that shows users useful,
readable information about a job in any state. I am still new to
Slurm and trying to wrap my head around the database info
and the effects of arrays and such.
The output for a completed job looks like this:
# jobinfo 357300
--------------------------------------------------
JobID : 356847_361 | 356847_361.batch
JobName : batch_compile_loraks.sh
User : gr879
Account : syhdiff
Partition : basic
ReqTRES : billing=1,cpu=1,mem=40G,node=1
AllocTRES : billing=1,cpu=1,mem=40G,node=1
NodeList : r440-19
Submit : 2021-06-13T22:07:07
Start : 2021-06-14T01:47:55 | 2021-06-14T01:47:55
End : 2021-06-14T05:22:00 | 2021-06-14T05:22:00
Timelimit : 2-00:00:00
Elapsed : 03:34:05 | 03:34:05
CPUTime : 03:34:05 | 03:34:05
SystemCPU : 05:57.056 | 05:57.056
UserCPU : 03:27:07 | 03:27:07
TotalCPU : 03:33:04 | 03:33:04
MaxDiskRead : | 109.25M
MaxDiskWrite : | 1.08M
MaxRSS : | 32529204K
MaxVMSize : | 61834112K
State : COMPLETED | COMPLETED
ExitCode : 0:0 | 0:0
WorkDir : /autofs/homes/002/gr879/matlab/ex_vivo/batch_code
and a typical RUNNING job looks like this:
# jobinfo 357304
--------------------------------------------------
JobID : 357199_21 | 357199_21.batch
JobName : batch_compile_multi_shell.sh
User : gr879
Account : syhdiff
Partition : basic
ReqTRES : billing=1,cpu=1,mem=12G,node=1
AllocTRES : billing=1,cpu=1,mem=12G,node=1
NodeList : r440-17
Submit : 2021-06-14T00:31:11
Start : 2021-06-14T01:47:55 | 2021-06-14T01:47:55
End : Unknown | Unknown
Timelimit : 1-00:00:00
Elapsed : 12:04:35 | 12:04:35
CPUTime : 12:04:35 | 12:04:35
SystemCPU : 00:00:00 | 00:00:00
UserCPU : 00:00:00 | 00:00:00
TotalCPU : 00:00:00 | 12:01:46
MaxDiskRead : | 101176763
MaxDiskWrite : | 1259187
MaxRSS : | 5455M
MaxVMSize : | 10823600K
State : RUNNING | RUNNING
ExitCode : 0:0 | 0:0
WorkDir : /autofs/homes/002/gr879/matlab/ex_vivo/batch_code
Unfortunately, I have to show zeros for certain fields that
I cannot get yet. My current issue is with the TotalCPU row
on running jobs. I actually get that value from AveCPU in sstat, and
in the case above it looks right. But in other cases it is way off:
# jobinfo 357305
--------------------------------------------------
JobID : 357305 | 357305.batch
JobName : sjob_185
User : mjk2
Account : circgp
Partition : basic
ReqTRES : billing=27,cpu=20,mem=370G,node=1
AllocTRES : billing=27,cpu=20,mem=370G,node=1
NodeList : r440-05
Submit : 2021-06-14T01:44:56
Start : 2021-06-14T05:02:10 | 2021-06-14T05:02:10
End : Unknown | Unknown
Timelimit : 7-00:00:00
Elapsed : 08:50:17 | 08:50:17
CPUTime : 7-08:45:40 | 7-08:45:40
SystemCPU : 00:00:00 | 00:00:00
UserCPU : 00:00:00 | 00:00:00
TotalCPU : 00:00:00 | 11:33.000
MaxDiskRead : | 79699046
MaxDiskWrite : | 17983
MaxRSS : | 81357340K
MaxVMSize : | 104992372K
State : RUNNING | RUNNING
ExitCode : 0:0 | 0:0
WorkDir : /autofs/homes/002/mjk2
In this job the user asked for 20 cores, but I can see his
job is only on one core on the actual node, so this is a big waste.
But that core is constantly going at 100%, so I would expect AveCPU
to be close to the Elapsed time, but it is way less (11 minutes
instead of nearly 9 hours):
# /usr/bin/sstat -p -a --job=357305 --format=JobID,AveCPU
JobID|AveCPU|
357305.extern|213503982334-14:25:51|
357305.batch|11:33.000|
Any idea why this is? Also, what is that crazy number for
AveCPU on 357305.extern?
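
To make the comparison above concrete, here is a minimal Python sketch that parses the pipe-delimited `sstat -p` output and converts the duration strings to seconds. The helper names `duration_to_seconds` and `parse_sstat` are hypothetical (they are not part of jobinfo or Slurm), and the sketch assumes durations appear only in the two shapes seen in this post: `[DD-]HH:MM:SS` and `MM:SS.mmm`. It does not explain the discrepancy; it only quantifies it.

```python
import re

# Duration shapes seen in the output above (an assumption, not a full spec):
#   [DD-]HH:MM:SS   e.g. "08:50:17", "7-08:45:40"
#   MM:SS.mmm       e.g. "11:33.000"
_DURATION = re.compile(
    r"^(?:(?P<days>\d+)-)?"
    r"(?:(?P<hours>\d+):)?"
    r"(?P<minutes>\d+):(?P<seconds>\d+(?:\.\d+)?)$"
)


def duration_to_seconds(text):
    """Convert a Slurm-style duration string to seconds (hypothetical helper)."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    days = int(match.group("days") or 0)
    hours = int(match.group("hours") or 0)
    minutes = int(match.group("minutes"))
    seconds = float(match.group("seconds"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_sstat(output):
    """Turn `sstat -p` output into a list of dicts keyed by the header row."""
    rows = []
    header = None
    for line in output.splitlines():
        if not line.strip():
            continue
        # With -p every line ends in one trailing "|"; drop exactly that one.
        if line.endswith("|"):
            line = line[:-1]
        fields = line.split("|")
        if header is None:
            header = fields
        else:
            rows.append(dict(zip(header, fields)))
    return rows


# Values copied from the job 357305 output in this post.
SAMPLE = """JobID|AveCPU|
357305.extern|213503982334-14:25:51|
357305.batch|11:33.000|
"""

elapsed = duration_to_seconds("08:50:17")
cpu_time = duration_to_seconds("7-08:45:40")
alloc_cpus = 20

batch = next(r for r in parse_sstat(SAMPLE) if r["JobID"].endswith(".batch"))
ave_cpu = duration_to_seconds(batch["AveCPU"])

# CPUTime is Elapsed multiplied by the allocated CPUs for this job.
print(cpu_time == elapsed * alloc_cpus)
# Fraction of one core's elapsed time that AveCPU accounts for.
print(f"{ave_cpu / elapsed:.1%}")
```

With these numbers the CPUTime row is exactly Elapsed times the 20 allocated CPUs, while the AveCPU reported for the batch step covers only about 2% of the elapsed time of a single core.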
-- Paul Raines (http://help.nmr.mgh.harvard.edu)
On Mon, 14 Jun 2021 2:45am, Ole Holm Nielsen wrote:
> On 6/14/21 8:26 AM, Gestió Servidors wrote:
>> How can I get all information about a finished job in the same way as
>> “scontrol show jobid=” when job is pending or running?
>
> Some minutes after job completion, you can only get the information which is
> stored in the Slurm database.
>
> My script "showjob" in
> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs shows all
> available information for jobs in the queue as well as in the database.
>
> /Ole