[slurm-users] Calculate the GPU usages
Jeherul Islam
jeherul at gmail.com
Wed Sep 1 15:22:48 UTC 2021
Hi Loris,
No output is truncated. Here is a snapshot of the output.
# sacct -X --account=chemistry --user=j.mira \
    --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus \
    --starttime=2021-05-01 --endtime=2021-08-31 --noheader
23269 j.mira 1209627 TIMEOUT gpu:1 1
25853 j.mira 1200060 CANCELLED+ gpu:1 1
27335 j.mira 2 COMPLETED gpu:1 1
27336 j.mira 0 COMPLETED gpu:1 1
27339 j.mira 90 COMPLETED gpu:1 1
27564 j.mira 0 CANCELLED+ gpu:1 1
27565 j.mira 0 CANCELLED+ gpu:1 1
30865 j.mira 0 CANCELLED+ gpu:1 1
31575 j.mira 929809 COMPLETED gpu:1 1
31576 j.mira 918413 COMPLETED gpu:1 1
31573 j.mira 699059 COMPLETED gpu:1 1
36060 j.mira 1207085 CANCELLED+ gpu:1 1
40654 j.mira 682311 RUNNING gpu:1 1
[root@gpu-login ~]# sacct -X --account=chemistry --user=j.mira \
    --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus \
    --starttime=2021-05-01 --endtime=2021-08-31 --noheader \
    | awk '{sum += $3} END {print sum}'
6846556
It is still showing a similar result.
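
Since sreport reports gres/gpu usage in TRES minutes, the sacct total has to be weighted by the number of GPUs per job and divided by 60 before the two figures can be compared. Below is a minimal sketch of that arithmetic, run on the rows from the snapshot above. The helper name gpu_seconds is hypothetical (it is not part of Slurm), and the sketch assumes that AllocGRES holds a single entry of the form gpu:N or gpu:TYPE:N.

```shell
# Sample rows copied from the sacct snapshot above
# (columns: JobID User ElapsedRaw State AllocGRES NCPUS).
rows='23269 j.mira 1209627 TIMEOUT gpu:1 1
25853 j.mira 1200060 CANCELLED+ gpu:1 1
27335 j.mira 2 COMPLETED gpu:1 1
27336 j.mira 0 COMPLETED gpu:1 1
27339 j.mira 90 COMPLETED gpu:1 1
27564 j.mira 0 CANCELLED+ gpu:1 1
27565 j.mira 0 CANCELLED+ gpu:1 1
30865 j.mira 0 CANCELLED+ gpu:1 1
31575 j.mira 929809 COMPLETED gpu:1 1
31576 j.mira 918413 COMPLETED gpu:1 1
31573 j.mira 699059 COMPLETED gpu:1 1
36060 j.mira 1207085 CANCELLED+ gpu:1 1
40654 j.mira 682311 RUNNING gpu:1 1'

# gpu_seconds: hypothetical helper, not a Slurm command.
# Reads whitespace-separated sacct rows on stdin and prints the sum of
# ElapsedRaw ($3) multiplied by the GPU count parsed from AllocGRES ($5).
# Assumption: AllocGRES is a single entry such as "gpu:1" or "gpu:tesla:2".
gpu_seconds() {
  awk '{
    n = split($5, parts, ":")                        # "gpu:1" -> "gpu", "1"
    count = (parts[1] == "gpu") ? parts[n] + 0 : 0   # last part is the count
    sum += $3 * count
  } END { printf "%d\n", sum }'
}

total=$(printf '%s\n' "$rows" | gpu_seconds)
echo "GPU seconds: $total"
echo "GPU minutes: $((total / 60))"
```

For the snapshot rows this gives 6846456 GPU seconds, or 114107 whole GPU minutes. That is 100 seconds less than the 6846556 printed above, presumably because job 40654 was still RUNNING when the second command was issued. With live data the same helper could be fed directly from the sacct -X ... --noheader command shown above.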
On Wed, Sep 1, 2021 at 6:57 PM Loris Bennett <loris.bennett at fu-berlin.de>
wrote:
> Dear Jeherul,
>
> Jeherul Islam <jeherul at gmail.com> writes:
>
> > Dear Loris,
> >
> > When we grep by the user name "j.mira", the multiple counts are
> > filtered out. Again, sacct is showing fewer GPU minutes than sreport.
>
> Yes, you are right, although instead of
>
>   sacct --account=chemistry \
>     --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus \
>     --starttime=2021-05-01 --endtime=2021-08-31 | grep j.mira
>
> it would be more elegant just to write
>
>   sacct --account=chemistry --user=j.mira \
>     --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus \
>     --starttime=2021-05-01 --endtime=2021-08-31 --noheader
>
> However, your problem might be caused by the fact that the default width
> of the 'AllocGRES' field is too small for the values. This will cause
> the values to be truncated, so your 'grep gpu' might miss some entries.
> You might need something like
>
> --format=jobid,user,ElapsedRaw,state,AllocGRES%60,ncpus
>
> Cheers,
>
> Loris
>
>
> > On Wed, 1 Sep, 2021, 6:03 PM Loris Bennett, <loris.bennett at fu-berlin.de> wrote:
> >
> > Dear Jeherul,
> >
> > Jeherul Islam <jeherul at gmail.com> writes:
> >
> > > Dear Loris,
> > >
> > > Thanks for your reply. Here is the output for the same period, but the
> > > results still do not match.
> > >
> > > # sacct --account=chemistry \
> > >     --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus \
> > >     --starttime=2021-05-01 --endtime=2021-08-31 \
> > >     | grep j.mira | grep gpu | awk '{sum += $3} END {print sum}'
> >
> > I think you need the option '-X' for 'sacct'. This will give you one
> > line per job rather than including the steps. Without '-X' you are
> > counting the usage multiple times for each job.
> >
> > Cheers,
> >
> > Loris
> >
> > > 6835053 (6835053/60 = 113917 )
> > >
> > > # sreport cluster AccountUtilizationByUser cluster=**** user=j.mira \
> > >     start=2021-05-01 end=2021-08-31 --tres="gres/gpu"
> > >
> > > --------------------------------------------------------------------------------
> > > Cluster/Account/User Utilization 2021-05-01T00:00:00 - 2021-08-30T23:59:59 (10540800 secs)
> > > Usage reported in TRES Minutes
> > > --------------------------------------------------------------------------------
> > >   Cluster         Account     Login     Proper Name      TRES Name     Used
> > > --------- --------------- --------- --------------- -------------- --------
> > >  ********       chemistry    j.mira          j.mira       gres/gpu   149434
> > >
> > > On Wed, Sep 1, 2021 at 5:27 PM Loris Bennett <loris.bennett at fu-berlin.de> wrote:
> > >
> > > Dear Jeherul,
> > >
> > > Jeherul Islam <jeherul at gmail.com> writes:
> > >
> > > > Dear All,
> > > >
> > > > Please share the correct way of calculating GPU usage.
> > > > I am confused by the sreport and sacct commands. I am getting
> > > > different results from them.
> > > >
> > > > # sreport cluster AccountUtilizationByUser cluster=**** user=j.mira \
> > > >     start=2021-05-01 end=2021-08-31 --tres="gres/gpu"
> > >
> > > Here you have:
> > >
> > > end=2021-08-31
> > >
> > > >
> > > > --------------------------------------------------------------------------------
> > > > Cluster/Account/User Utilization 2021-05-01T00:00:00 - 2021-08-30T23:59:59 (10540800 secs)
> > > > Usage reported in TRES Minutes
> > > > --------------------------------------------------------------------------------
> > > >   Cluster         Account     Login     Proper Name      TRES Name     Used
> > > > --------- --------------- --------- --------------- -------------- --------
> > > >      ****       chemistry    j.mira          j.mira       gres/gpu   149434
> > > >
> > > > # sacct --account=chemistry \
> > > >     --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus \
> > > >     --starttime=2021-05-01 --endtime=2021-08-01 \
> > > >     | grep j.mira | grep gpu | awk '{sum += $3} END {print sum}'
> > >
> > > whereas here you have
> > >
> > > --endtime=2021-08-01
> > >
> > > > 4957060
> > > >
> > > > Please share the correct way.
> > > >
> > > > With Thanks and regards
> > >
> > > so, without having checked your sacct/awk logic I would not expect
> > > the results to be the same.
> > >
> > > Cheers,
> > >
> > > Loris
> > >
> > > --
> > > Dr. Loris Bennett (Hr./Mr.)
> > > ZEDAT, Freie Universität Berlin    Email loris.bennett at fu-berlin.de
> >
--
Jeherul Islam
Technical Officer Grade I
Data Centre and High Performance Computing
Computer Centre
Indian Institute of Technology Guwahati
Guwahati-39
India
Office No :+91-361-258-3353