Hi all,

I have observed a significant discrepancy between the CPU time reported by sreport and by sacct, and I would like to understand the underlying cause. Below is the specific case I encountered when calculating CPU time for user zt23132881r from November 1, 2024, to November 30, 2024.
1. sreport results (995,171 CPU minutes):
*[root@master ~]# sreport Cluster UserUtilizationByAccount user=zt23132881r start=2024-11-01 end=2024-11-30
--------------------------------------------------------------------------------
Cluster/User/Account Utilization 2024-11-01T00:00:00 - 2024-11-29T23:59:59 (2505600 secs)
Usage reported in CPU Minutes
--------------------------------------------------------------------------------
  Cluster     Login     Proper Name         Account     Used    Energy
--------- --------- --------------- --------------- -------- ---------
djhpc-po+ zt231328+ zt23132881r zt+ zt23132881r_ba+   995171   6294875*
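As a quick check on the sreport header, the window length can be recomputed from the two timestamps it prints. This minimal Python sketch uses only those timestamps. It confirms that 2,505,600 seconds is exactly 29 days, so the report runs through 2024-11-29T23:59:59 and does not include November 30 itself.

```python
from datetime import datetime, timedelta

# Start and (inclusive) end of the window, exactly as printed in the sreport header.
start = datetime(2024, 11, 1, 0, 0, 0)
end_inclusive = datetime(2024, 11, 29, 23, 59, 59)

# The header shows the last second of the window, so add 1 s to get its length.
window_secs = int(((end_inclusive - start) + timedelta(seconds=1)).total_seconds())

print(window_secs)          # 2505600
print(window_secs / 86400)  # 29.0
```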
2. sacct results:

# Without --truncate (1,019,927 minutes / 61,195,668 seconds)
*[root@master ~]# sacct -u zt23132881r -S 2024-11-01 -E 2024-11-30 -o "jobid,partition,account,user,alloccpus,cputimeraw,state" -X |awk 'BEGIN{total=0}{total+=$6}END{print total}'
61195668*
# With --truncate (967,165 minutes / 58,029,908 seconds)
*[root@master ~]# sacct -u zt23132881r -S 2024-11-01 -E 2024-11-30 -o "jobid,partition,account,user,alloccpus,cputimeraw,state" -X --truncate |awk 'BEGIN{total=0}{total+=$6}END{print total}'
58029908*
# Without -X (61,195,668 seconds, identical to the first query)
*[root@master ~]# sacct -u zt23132881r -S 2024-11-01 -E 2024-11-30 -o "jobid,partition,account,user,alloccpus,cputimeraw,state" |awk 'BEGIN{total=0}{total+=$6}END{print total}'
61195668*
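The awk one-liners above sum whitespace-separated field 6, which lines up with CPUTimeRAW only when every earlier column on a line is non-empty. A sketch of a more robust way to total the column, assuming Python 3 is available on the host: ask sacct for header-less, pipe-delimited output (`-n`/`--noheader`, `-P`/`--parsable2`) with just `jobid,cputimeraw`, and sum the second field. The helpers `sum_cputimeraw` and `run_sacct` are hypothetical names; the demo parses a few rows copied from the listing in this post instead of calling sacct.

```python
import subprocess


def sum_cputimeraw(parsable_text: str) -> int:
    """Sum the CPUTimeRAW field of `sacct -n -P -o jobid,cputimeraw` output.

    With -P the fields are '|'-separated and positional, so an empty
    column cannot shift the others the way whitespace splitting can.
    """
    total = 0
    for line in parsable_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _jobid, cputimeraw = line.split("|")
        total += int(cputimeraw or 0)  # treat an empty field as 0
    return total


def run_sacct(user: str, start: str, end: str, truncate: bool = False) -> str:
    """Run sacct for one user's job allocations (-X) and return its raw text."""
    cmd = ["sacct", "-u", user, "-S", start, "-E", end, "-X",
           "-n", "-P", "-o", "jobid,cputimeraw"]
    if truncate:
        cmd.append("--truncate")
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Offline demo: the first four rows of the -X listing in this post.
sample = "4418|352\n4419|1210\n4426|14366\n4427|1116544\n"
print(sum_cputimeraw(sample))
```

On the cluster itself, `sum_cputimeraw(run_sacct("zt23132881r", "2024-11-01", "2024-11-30"))` and the same call with `truncate=True` would give totals comparable to the awk results.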
The results show three different values:
- *sreport: 995,171 minutes*
- *sacct (without truncate): 1,019,927 minutes*
- *sacct (with truncate): 967,165 minutes*
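The sacct totals are in CPU-seconds, while sreport reports CPU minutes, so the sacct figures must be divided by 60 before the two can be compared. This short Python sketch reproduces the minute values above from the awk totals and prints each one's difference from the sreport figure: about +24,757 minutes without --truncate and about -28,006 with it, so sreport falls between the two sacct numbers.

```python
# Totals (CPU-seconds) printed by the three awk one-liners.
sacct_secs = {
    "sacct -X": 61_195_668,
    "sacct -X --truncate": 58_029_908,
    "sacct (no -X)": 61_195_668,
}
sreport_minutes = 995_171  # sreport already reports CPU minutes

for label, secs in sacct_secs.items():
    minutes = secs / 60                # CPU-seconds -> CPU minutes
    diff = minutes - sreport_minutes   # positive: sacct reports more than sreport
    print(f"{label:22} {minutes:>13,.1f} min  ({diff:+,.1f} vs sreport)")
```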
I would appreciate it if someone could explain:

- Which of these results is the most accurate?
- How does sreport calculate CPU time?
- Why does the --truncate option in sacct lead to different results?
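For the --truncate question, the sacct man page describes CPUTimeRAW as elapsed time multiplied by the allocated CPU count, in CPU-seconds, and describes --truncate as clipping a job's start time to --starttime and its end time to --endtime. The sketch below applies only that arithmetic to a hypothetical job (not one from the listing) that started the day before the window. It illustrates why a job straddling a window boundary counts in full without --truncate and only for its in-window part with it; it does not model how sreport builds its totals. `cpu_seconds` is a hypothetical helper.

```python
from datetime import datetime


def cpu_seconds(start, end, alloc_cpus, win_start=None, win_end=None):
    """Elapsed seconds x allocated CPUs, like CPUTimeRAW.

    If a window is given, the job's start/end are clipped to it first,
    mirroring the man page's description of --truncate.
    """
    if win_start is not None:
        start = max(start, win_start)
    if win_end is not None:
        end = min(end, win_end)
    elapsed = max(0, int((end - start).total_seconds()))
    return elapsed * alloc_cpus


# Hypothetical 56-CPU job: ran from Oct 31 00:00 to Nov 2 00:00 (2 days).
job_start = datetime(2024, 10, 31)
job_end = datetime(2024, 11, 2)
win_start = datetime(2024, 11, 1)  # -S 2024-11-01
win_end = datetime(2024, 11, 30)   # -E 2024-11-30

full = cpu_seconds(job_start, job_end, 56)                         # no --truncate
clipped = cpu_seconds(job_start, job_end, 56, win_start, win_end)  # --truncate
print(full, clipped)  # 9676800 4838400
```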
Thank you for your assistance in clarifying these discrepancies.

Best regards
[root@master ~]# sinfo --version
slurm 23.11.6
[root@master ~]# sacct -u zt23132881r -S 2024-11-01 -E 2024-11-30 -o "jobid,partition,account,user,alloccpus,cputimeraw,state" -X
JobID         Partition    Account      User  AllocCPUS CPUTimeRAW      State
------------ ---------- ---------- --------- ---------- ---------- ----------
4418                cpu zt2313288+ zt231328+         22        352     FAILED
4419                cpu zt2313288+ zt231328+         22       1210 CANCELLED+
4426                cpu zt2313288+ zt231328+         22      14366 CANCELLED+
4427                cpu zt2313288+ zt231328+         22    1116544 CANCELLED+
4442                cpu zt2313288+ zt231328+         22       6798 CANCELLED+
4443                cpu zt2313288+ zt231328+         22       2508 CANCELLED+
4444                cpu zt2313288+ zt231328+         34       9214 CANCELLED+
4445                cpu zt2313288+ zt231328+         34     250240 CANCELLED+
4463                cpu zt2313288+ zt231328+          0          0 CANCELLED+
4464                cpu zt2313288+ zt231328+         24       1272     FAILED
4465                cpu zt2313288+ zt231328+         24     562008 CANCELLED+
4506                cpu zt2313288+ zt231328+         56      54992 CANCELLED+
4508                cpu zt2313288+ zt231328+         56       1064     FAILED
4509                cpu zt2313288+ zt231328+         56      27608 CANCELLED+
4510                cpu zt2313288+ zt231328+         56      87472  COMPLETED
4513                cpu zt2313288+ zt231328+         56      25592 CANCELLED+
4514                cpu zt2313288+ zt231328+         56     183400     FAILED
4522                cpu zt2313288+ zt231328+         56    2483040 CANCELLED+
4526                cpu zt2313288+ zt231328+         56      29736     FAILED
4527                cpu zt2313288+ zt231328+         56    1043952 CANCELLED+
4529                cpu zt2313288+ zt231328+         56    3466456  COMPLETED
4643                cpu zt2313288+ zt231328+         64       4736 CANCELLED+
4644                cpu zt2313288+ zt231328+         64        768     FAILED
4645                cpu zt2313288+ zt231328+         64     656384  COMPLETED
4650                cpu zt2313288+ zt231328+         64    7783360  COMPLETED
4651                cpu zt2313288+ zt231328+         56     730408     FAILED
4652                cpu zt2313288+ zt231328+         56     751520 CANCELLED+
4653                cpu zt2313288+ zt231328+         56     902496 CANCELLED+
4654                cpu zt2313288+ zt231328+         56    1747760  COMPLETED
5843                cpu zt2313288+ zt231328+         70     421890 CANCELLED+
5917                cpu zt2313288+ zt231328+         64       2624 CANCELLED+
5918                cpu zt2313288+ zt231328+         70   14062860  COMPLETED
6045                cpu zt2313288+ zt231328+        112      38528 CANCELLED+
6046                cpu zt2313288+ zt231328+        112       4480 CANCELLED+
6047                cpu zt2313288+ zt231328+          8        280 CANCELLED+
6048                cpu zt2313288+ zt231328+          1         65 CANCELLED+
6049                cpu zt2313288+ zt231328+          2         68 CANCELLED+
6050                cpu zt2313288+ zt231328+          2         76 CANCELLED+
6051                cpu zt2313288+ zt231328+          2         26 CANCELLED+
6064                cpu zt2313288+ zt231328+        112      30240 CANCELLED+
6065                cpu zt2313288+ zt231328+         70      18060 CANCELLED+
6066                cpu zt2313288+ zt231328+         70      21560  COMPLETED
6067                cpu zt2313288+ zt231328+         56    1474536 CANCELLED+
6069                cpu zt2313288+ zt231328+          1       5070 CANCELLED+
6070                cpu zt2313288+ zt231328+         61        427  COMPLETED
6071                cpu zt2313288+ zt231328+         61    1204018  COMPLETED
6072                cpu zt2313288+ zt231328+         70       1120     FAILED
6074                cpu zt2313288+ zt231328+         70      13930 CANCELLED+
6075                cpu zt2313288+ zt231328+         70     434980 CANCELLED+
6076                cpu zt2313288+ zt231328+        112    1448160 CANCELLED+
6081                cpu zt2313288+ zt231328+        112     301952  COMPLETED
6082                cpu zt2313288+ zt231328+        112     201936 CANCELLED+
6083                cpu zt2313288+ zt231328+        112     188720 CANCELLED+
6084                cpu zt2313288+ zt231328+        112    1502144  COMPLETED
6085                cpu zt2313288+ zt231328+        112       1232 CANCELLED+
6086                cpu zt2313288+ zt231328+        112     346416 CANCELLED+
6087                cpu zt2313288+ zt231328+        112      46704 CANCELLED+
6088                cpu zt2313288+ zt231328+        112     828576 CANCELLED+
6089                cpu zt2313288+ zt231328+        112     144144 CANCELLED+
6090                cpu zt2313288+ zt231328+        112     217504 CANCELLED+
6091                cpu zt2313288+ zt231328+        112     193200 CANCELLED+
6092                cpu zt2313288+ zt231328+        112     204288 CANCELLED+
6093                cpu zt2313288+ zt231328+        112    1160768  COMPLETED
6124                cpu zt2313288+ zt231328+         61     588894  COMPLETED
6125                cpu zt2313288+ zt231328+        112    1285760  COMPLETED
6137                cpu zt2313288+ zt231328+          0          0 CANCELLED+
6138                cpu zt2313288+ zt231328+         70      95340  CANCELLED
6143                cpu zt2313288+ zt231328+          0          0  CANCELLED
6145                cpu zt2313288+ zt231328+          0          0 CANCELLED+
6146                cpu zt2313288+ zt231328+          0          0 CANCELLED+
6153                cpu zt2313288+ zt231328+         70      77630 CANCELLED+
6158                cpu zt2313288+ zt231328+         70      50400 CANCELLED+
6159                cpu zt2313288+ zt231328+        112     774256  COMPLETED
6177                cpu zt2313288+ zt231328+         31     277729  COMPLETED
6220                cpu zt2313288+ zt231328+         70       1120     FAILED
6222                cpu zt2313288+ zt231328+         70      24430 CANCELLED+
6224                cpu zt2313288+ zt231328+         70     319060 CANCELLED+
6237                cpu zt2313288+ zt231328+         70     610750 CANCELLED+
6270                cpu zt2313288+ zt231328+         70    3277610 CANCELLED+
6294                cpu zt2313288+ zt231328+         80     503280 CANCELLED+
6316                cpu zt2313288+ zt231328+         80      88560 CANCELLED+
6322                cpu zt2313288+ zt231328+         80       5200     FAILED
6332                cpu zt2313288+ zt231328+         80       5280     FAILED
6335                cpu zt2313288+ zt231328+         80       5440     FAILED
6343                cpu zt2313288+ zt231328+         80       5600     FAILED
6348                cpu zt2313288+ zt231328+         80       5200     FAILED
6351                cpu zt2313288+ zt231328+         80      20640     FAILED
6507                cpu zt2313288+ zt231328+         12     108216  COMPLETED
6508                cpu zt2313288+ zt231328+          5      55195  COMPLETED
6555                cpu zt2313288+ zt231328+         40       1360     FAILED
6556                cpu zt2313288+ zt231328+         40    1517960 CANCELLED+
6563                cpu zt2313288+ zt231328+         40     133360 CANCELLED+
6571                cpu zt2313288+ zt231328+         40      79480 CANCELLED+
6576                cpu zt2313288+ zt231328+         40     814440 CANCELLED+
6589                cpu zt2313288+ zt231328+         40    3997640  COMPLETED
[root@master ~]#
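The per-job CPUTimeRAW column in the listing above can be cross-checked against the awk total. This Python sketch transcribes the 95 `-X` rows (JobID and CPUTimeRAW only) into a dict. It confirms that they add up to the 61,195,668 CPU-seconds printed by the first sacct query and lists the five largest contributors. Long jobs like these are natural candidates to inspect with extra sacct fields such as Start and End, to see whether they began before 2024-11-01 or ran past the -E time, since those are the jobs --truncate would clip.

```python
# CPUTimeRAW (CPU-seconds) per job, transcribed from the sacct -X listing above.
cputimeraw = {
    4418: 352, 4419: 1210, 4426: 14366, 4427: 1116544, 4442: 6798, 4443: 2508,
    4444: 9214, 4445: 250240, 4463: 0, 4464: 1272, 4465: 562008, 4506: 54992,
    4508: 1064, 4509: 27608, 4510: 87472, 4513: 25592, 4514: 183400, 4522: 2483040,
    4526: 29736, 4527: 1043952, 4529: 3466456, 4643: 4736, 4644: 768, 4645: 656384,
    4650: 7783360, 4651: 730408, 4652: 751520, 4653: 902496, 4654: 1747760, 5843: 421890,
    5917: 2624, 5918: 14062860, 6045: 38528, 6046: 4480, 6047: 280, 6048: 65,
    6049: 68, 6050: 76, 6051: 26, 6064: 30240, 6065: 18060, 6066: 21560,
    6067: 1474536, 6069: 5070, 6070: 427, 6071: 1204018, 6072: 1120, 6074: 13930,
    6075: 434980, 6076: 1448160, 6081: 301952, 6082: 201936, 6083: 188720, 6084: 1502144,
    6085: 1232, 6086: 346416, 6087: 46704, 6088: 828576, 6089: 144144, 6090: 217504,
    6091: 193200, 6092: 204288, 6093: 1160768, 6124: 588894, 6125: 1285760, 6137: 0,
    6138: 95340, 6143: 0, 6145: 0, 6146: 0, 6153: 77630, 6158: 50400,
    6159: 774256, 6177: 277729, 6220: 1120, 6222: 24430, 6224: 319060, 6237: 610750,
    6270: 3277610, 6294: 503280, 6316: 88560, 6322: 5200, 6332: 5280, 6335: 5440,
    6343: 5600, 6348: 5200, 6351: 20640, 6507: 108216, 6508: 55195, 6555: 1360,
    6556: 1517960, 6563: 133360, 6571: 79480, 6576: 814440, 6589: 3997640,
}

total_secs = sum(cputimeraw.values())
print(f"jobs: {len(cputimeraw)}")
print(f"total: {total_secs:,} CPU-seconds = {total_secs / 60:,.1f} CPU minutes")

# Five largest contributors to the total, by CPU-seconds.
top = sorted(cputimeraw.items(), key=lambda kv: kv[1], reverse=True)[:5]
for jobid, secs in top:
    print(f"  job {jobid}: {secs:,} CPU-seconds ({secs / total_secs:.1%} of total)")
```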