I've found that when using sacct to track resource usage over a specific time period, it helps to include the --truncate option. Without it, jobs that started before the specified start time have their entire runtime counted, including time outside
the requested range. With --truncate, only the time that falls within the defined period is included. Maybe this explains some of the discrepancy you're seeing.
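For example, something like your sacct invocation from below, just with the flag added, should report elapsed times clipped to the window:

sacct -n -X --allusers --accounts=project1234 --truncate \
      --start=2025-04-01 --end=2025-04-05 \
      -o elapsedraw,AllocTRES%80,user,partition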
Hi all,
I was wondering if someone could help explain this discrepancy.
I get different values for a project's GPU consumption when using sreport vs. sacct (plus some calculations of my own).
This is an example that shows this:
sreport -t hours -T gres/gpu cluster AccountUtilizationByuser start=2025-04-01 end=2025-04-05 | grep project1234
gives 178
while
sacct -n -X --allusers --accounts=project1234 --start=2025-04-01 --end=2025-04-05 -o elapsedraw,AllocTRES%80,user,partition
gives
213480 billing=128,cpu=128,gres/gpu=8,mem=1000G,node=2 gpuplus
249507 billing=128,cpu=128,gres/gpu=8,mem=1000G,node=2 gpuplus
13908 billing=64,cpu=64,gres/gpu=4,mem=500G,node=1 gpuplus
9552 billing=64,cpu=64,gres/gpu=4,mem=500G,node=1 gpuplus
4 billing=16,cpu=16,gres/gpu=1,mem=200G,node=1 gpu
11 billing=16,cpu=16,gres/gpu=1,mem=200G,node=1 gpu
...
I won't bore you with the full output and the calculation, but the first job alone consumed 213480 seconds / 3600 * 8 GPUs = 474.4 GPU-hours, which by itself is far more than the 178 hours reported by sreport.
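For reference, the per-job sum I'm doing boils down to something like this (rough awk sketch of the arithmetic, not my exact script; the AllocTRES parsing is just for illustration):

sacct -n -X --allusers --accounts=project1234 \
      --start=2025-04-01 --end=2025-04-05 \
      -o elapsedraw,alloctres%80 |
  awk '{
         # pull the gres/gpu=N count out of the AllocTRES field
         if (match($2, /gres\/gpu=[0-9]+/)) {
           gpus = substr($2, RSTART + 9, RLENGTH - 9)
           total += $1 * gpus / 3600   # elapsed seconds * GPUs -> GPU-hours
         }
       }
       END { printf "%.1f GPU-hours\n", total }'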
Any clue why these are inconsistent, or how sreport arrived at the 178 value?