[slurm-users] Calculate the GPU usages

Jeherul Islam jeherul at gmail.com
Wed Sep 1 15:22:48 UTC 2021


Hi Loris,

No output is truncated. Here is the snapshot of the output.

# sacct -X --account=chemistry --user=j.mira --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus --starttime=2021-05-01 --endtime=2021-08-31 --noheader
23269           j.mira    1209627    TIMEOUT        gpu:1          1
25853           j.mira    1200060 CANCELLED+        gpu:1          1
27335           j.mira          2  COMPLETED        gpu:1          1
27336           j.mira          0  COMPLETED        gpu:1          1
27339           j.mira         90  COMPLETED        gpu:1          1
27564           j.mira          0 CANCELLED+        gpu:1          1
27565           j.mira          0 CANCELLED+        gpu:1          1
30865           j.mira          0 CANCELLED+        gpu:1          1
31575           j.mira     929809  COMPLETED        gpu:1          1
31576           j.mira     918413  COMPLETED        gpu:1          1
31573           j.mira     699059  COMPLETED        gpu:1          1
36060           j.mira    1207085 CANCELLED+        gpu:1          1
40654           j.mira     682311    RUNNING        gpu:1          1
[root@gpu-login ~]# sacct -X --account=chemistry --user=j.mira --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus --starttime=2021-05-01 --endtime=2021-08-31 --noheader | awk '{sum += $3} END {print sum}'
6846556

It is still showing a similar result.
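
Since sreport reports TRES minutes while the sum above is in seconds, a like-for-like comparison needs each job's seconds weighted by its GPU count and divided by 60. A minimal sketch, assuming AllocGRES always has the form gpu:N or gpu:type:N as in the snapshot above (the function name gpu_minutes is only illustrative, and the rows are copied from that snapshot):

```shell
#!/bin/sh
# gpu_minutes (hypothetical helper): read sacct rows of the form
#   jobid user ElapsedRaw state AllocGRES ncpus
# on stdin and print the total GPU-minutes, rounded down.
gpu_minutes() {
    awk '
    $5 ~ /^gpu:/ {
        n = split($5, parts, ":")   # "gpu:1" or "gpu:type:1"
        gpus = parts[n] + 0         # assumption: last component is the count
        sum += $3 * gpus            # GPU-seconds for this job
    }
    END { printf "%d\n", sum / 60 }'
}

# Rows copied from the sacct snapshot above.
gpu_minutes <<'EOF'
23269 j.mira 1209627 TIMEOUT gpu:1 1
25853 j.mira 1200060 CANCELLED+ gpu:1 1
27335 j.mira 2 COMPLETED gpu:1 1
27336 j.mira 0 COMPLETED gpu:1 1
27339 j.mira 90 COMPLETED gpu:1 1
27564 j.mira 0 CANCELLED+ gpu:1 1
27565 j.mira 0 CANCELLED+ gpu:1 1
30865 j.mira 0 CANCELLED+ gpu:1 1
31575 j.mira 929809 COMPLETED gpu:1 1
31576 j.mira 918413 COMPLETED gpu:1 1
31573 j.mira 699059 COMPLETED gpu:1 1
36060 j.mira 1207085 CANCELLED+ gpu:1 1
40654 j.mira 682311 RUNNING gpu:1 1
EOF
```

On a live system the same function could be fed directly from the sacct command above (sacct -X ... --noheader | gpu_minutes). Every job in the snapshot has one GPU, so here this is just the ElapsedRaw sum divided by 60, which is still well below the 149434 minutes reported by sreport.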


On Wed, Sep 1, 2021 at 6:57 PM Loris Bennett <loris.bennett at fu-berlin.de>
wrote:

> Dear Jeherul,
>
> Jeherul Islam <jeherul at gmail.com> writes:
>
> > Dear Loris,
> >
> > When we grep it by the user name "j.mira" it will strike out the
> multiple counts. Again sacct is showing fewer gpu minutes than sreport.
>
> Yes, you are right, although instead of
>
>   sacct --account=chemistry --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus --starttime=2021-05-01 --endtime=2021-08-31  | grep j.mira
>
> it would be more elegant just to write
>
>   sacct --account=chemistry --user=j.mira --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus --starttime=2021-05-01 --endtime=2021-08-31 --noheader
>
> However, your problem might be caused by the fact that the default width
> of the 'AllocGRES' field is too small for the values.  This will cause
> the values to be truncated, so your 'grep gpu' might miss some entries.
> You might need something like
>
>    --format=jobid,user,ElapsedRaw,state,AllocGRES%60,ncpus
>
> Cheers,
>
> Loris
>
>
> > On Wed, 1 Sep, 2021, 6:03 PM Loris Bennett, <loris.bennett at fu-berlin.de>
> wrote:
> >
> >  Dear Jeherul,
> >
> >  Jeherul Islam <jeherul at gmail.com> writes:
> >
> >  > Dear Loris,
> >  >
> >  > Thanks for your reply. Here is the output for the same period but the
> result is not matching.
> >  >
> >  > # sacct --account=chemistry --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus --starttime=2021-05-01 --endtime=2021-08-31 | grep j.mira | grep gpu | awk '{sum += $3} END {print sum}'
> >
> >  I think you need the option '-X' for 'sacct'.  This will give you one
> >  line per job rather than including the steps.  Without '-X' you are
> >  counting the usage multiple times for each job.
> >
> >  Cheers,
> >
> >  Loris
> >
> >  > 6835053          (6835053/60 = 113917 )
> >  >
> >  > # sreport cluster AccountUtilizationByUser cluster=**** user=j.mira start=2021-05-01 end=2021-08-31 --tres="gres/gpu"
> >  > --------------------------------------------------------------------------------
> >  > Cluster/Account/User Utilization 2021-05-01T00:00:00 - 2021-08-30T23:59:59 (10540800 secs)
> >  > Usage reported in TRES Minutes
> >  > --------------------------------------------------------------------------------
> >  >   Cluster         Account     Login     Proper Name      TRES Name     Used
> >  > --------- --------------- --------- --------------- -------------- --------
> >  > ********       chemistry    j.mira          j.mira       gres/gpu   149434
> >  >
> >  > On Wed, Sep 1, 2021 at 5:27 PM Loris Bennett <
> loris.bennett at fu-berlin.de> wrote:
> >  >
> >  >  Dear Jeherul,
> >  >
> >  >  Jeherul Islam <jeherul at gmail.com> writes:
> >  >
> >  >  > Dear All,
> >  >  >
> >  >  > Please share the correct way of calculating the GPU usages.
> >  >  > I am confused with sreport and sacct cmd. I am getting a different
> result.
> >  >  >
> >  >  > # sreport cluster AccountUtilizationByUser cluster=**** user=j.mira start=2021-05-01 end=2021-08-31 --tres="gres/gpu"
> >  >
> >  >  Here you have:
> >  >
> >  >    end=2021-08-31
> >  >
> >  >  > --------------------------------------------------------------------------------
> >  >  > Cluster/Account/User Utilization 2021-05-01T00:00:00 - 2021-08-30T23:59:59 (10540800 secs)
> >  >  > Usage reported in TRES Minutes
> >  >  > --------------------------------------------------------------------------------
> >  >  >   Cluster         Account     Login     Proper Name      TRES Name     Used
> >  >  > --------- --------------- --------- --------------- -------------- --------
> >  >  > ****       chemistry    j.mira          j.mira       gres/gpu   149434
> >  >  >
> >  >  > # sacct --account=chemistry --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus --starttime=2021-05-01 --endtime=2021-08-01 | grep j.mira | grep gpu | awk '{sum += $3} END {print sum}'
> >  >
> >  >  whereas here you have
> >  >
> >  >    --endtime=2021-08-01
> >  >
> >  >  > 4957060
> >  >  >
> >  >  > Please share the correct way.
> >  >  >
> >  >  > With Thanks and regards
> >  >
> >  >  so, without having checked your sacct/awk logic I would not expect
> the results to be the same.
> >  >
> >  >  Cheers,
> >  >
> >  >  Loris
> >  >
> >  >  --
> >  >  Dr. Loris Bennett (Hr./Mr.)
> >  >  ZEDAT, Freie Universität Berlin         Email
> loris.bennett at fu-berlin.de
> >  --
> >  Dr. Loris Bennett (Hr./Mr.)
> >  ZEDAT, Freie Universität Berlin         Email
> loris.bennett at fu-berlin.de
> >
> --
> Dr. Loris Bennett (Hr./Mr.)
> ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de
>
>

-- 
Jeherul Islam
Technical Officer Grade I
Data Centre and High Performance Computing
Computer Centre
Indian Institute of Technology Guwahati
Guwahati-39
India
Office No :+91-361-258-3353