[slurm-users] Calculate the GPU usages
Tina Friedrich
tina.friedrich at it.ox.ac.uk
Wed Sep 1 13:33:42 UTC 2021
...or maybe
sacct -p --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus
i.e. use -p to make the output parsable (non-truncated and delimiter-separated) -
handy if you want/need to do further work with the data.
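
A rough sketch of that further work, not tested against a live cluster: the function below sums GPU-seconds from pipe-delimited 'sacct -X -p' output. The name gpu_seconds, the sample rows, and the assumption that AllocGRES values look like 'gpu:2' or 'gpu:v100:1' are illustrative only. It multiplies each job's ElapsedRaw by its GPU count, on the assumption that sreport's gres/gpu TRES minutes are weighted by the number of GPUs allocated.

```shell
# Sum GPU-seconds from pipe-delimited sacct output read on stdin.
# Assumed column order (the --format list above, with -p):
#   JobID|User|ElapsedRaw|State|AllocGRES|NCPUS|
gpu_seconds() {
    awk -F'|' '
        NR == 1 && $1 == "JobID" { next }   # skip the header line if present
        {
            gpus = 0
            m = split($5, gres, ",")        # AllocGRES may list several resources
            for (i = 1; i <= m; i++) {
                if (gres[i] !~ /^gpu/) continue
                k = split(gres[i], f, ":")
                # last field is the count ("gpu:2", "gpu:v100:1"); assume 1 if absent
                gpus += (f[k] ~ /^[0-9]+$/) ? f[k] : 1
            }
            sum += $3 * gpus                # elapsed seconds weighted by GPU count
        }
        END { print sum + 0 }
    '
}

# Made-up sample rows so the sketch runs without Slurm.
sample='JobID|User|ElapsedRaw|State|AllocGRES|NCPUS|
101|j.mira|600|COMPLETED|gpu:2|4|
102|j.mira|300|COMPLETED||8|
103|j.mira|120|FAILED|gpu:v100:1|2|'

printf '%s\n' "$sample" | gpu_seconds    # prints 1320 (600*2 + 120*1)
```

On a real cluster the input would come from the sacct command itself, e.g. by adding -X, -p and --noheader to the options used further down and piping into gpu_seconds. Divide the result by 60 to compare it with the minutes that sreport prints.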
Tina
On 01/09/2021 14:24, Loris Bennett wrote:
> Dear Jeherul,
>
> Jeherul Islam <jeherul at gmail.com> writes:
>
>> Dear Loris,
>>
>> Grepping by the user name "j.mira" already filters out the multiple counts. sacct is still showing fewer GPU minutes than sreport.
>
> Yes, you are right, although instead of
>
> sacct --account=chemistry --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus --starttime=2021-05-01 --endtime=2021-08-31 | grep j.mira
>
> it would be more elegant just to write
>
> sacct --account=chemistry --user=j.mira --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus --starttime=2021-05-01 --endtime=2021-08-31 --noheader
>
> However, your problem might be caused by the fact that the default width
> of the 'AllocGRES' field is too small for the values. This will cause
> the values to be truncated, so your 'grep gpu' might miss some entries.
> You might need something like
>
> --format=jobid,user,ElapsedRaw,state,AllocGRES%60,ncpus
>
> Cheers,
>
> Loris
>
>
>> On Wed, 1 Sep, 2021, 6:03 PM Loris Bennett, <loris.bennett at fu-berlin.de> wrote:
>>
>> Dear Jeherul,
>>
>> Jeherul Islam <jeherul at gmail.com> writes:
>>
>> > Dear Loris,
>> >
>> > Thanks for your reply. Here is the output for the same period, but the results still do not match.
>> >
>> > # sacct --account=chemistry --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus --starttime=2021-05-01 --endtime=2021-08-31 | grep j.mira | grep gpu | awk '{sum += $3} END {print sum}'
>>
>> I think you need the option '-X' for 'sacct'. This will give you one
>> line per job rather than including the steps. Without '-X' you are
>> counting the usage multiple times for each job.
>>
>> Cheers,
>>
>> Loris
>>
>> > 6835053 (6835053/60 = 113917)
>> >
>> > # sreport cluster AccountUtilizationByUser cluster=**** user=j.mira start=2021-05-01 end=2021-08-31 --tres="gres/gpu"
>> > --------------------------------------------------------------------------------
>> > Cluster/Account/User Utilization 2021-05-01T00:00:00 - 2021-08-30T23:59:59 (10540800 secs)
>> > Usage reported in TRES Minutes
>> > --------------------------------------------------------------------------------
>> > Cluster Account Login Proper Name TRES Name Used
>> > --------- --------------- --------- --------------- -------------- --------
>> > ******** chemistry j.mira j.mira gres/gpu 149434
>> >
>> > On Wed, Sep 1, 2021 at 5:27 PM Loris Bennett <loris.bennett at fu-berlin.de> wrote:
>> >
>> > Dear Jeherul,
>> >
>> > Jeherul Islam <jeherul at gmail.com> writes:
>> >
>> > > Dear All,
>> > >
>> > > Please share the correct way of calculating GPU usage.
>> > > I am confused by the sreport and sacct commands; they give different results.
>> > >
>> > > # sreport cluster AccountUtilizationByUser cluster=**** user=j.mira start=2021-05-01 end=2021-08-31 --tres="gres/gpu"
>> >
>> > Here you have:
>> >
>> > end=2021-08-31
>> >
>> > > --------------------------------------------------------------------------------
>> > > Cluster/Account/User Utilization 2021-05-01T00:00:00 - 2021-08-30T23:59:59 (10540800 secs)
>> > > Usage reported in TRES Minutes
>> > > --------------------------------------------------------------------------------
>> > > Cluster Account Login Proper Name TRES Name Used
>> > > --------- --------------- --------- --------------- -------------- --------
>> > > **** chemistry j.mira j.mira gres/gpu 149434
>> > >
>> > > # sacct --account=chemistry --format=jobid,user,ElapsedRaw,state,AllocGRES,ncpus --starttime=2021-05-01 --endtime=2021-08-01 | grep j.mira| grep gpu| awk '{sum += $3} END {print sum}'
>> >
>> > whereas here you have
>> >
>> > --endtime=2021-08-01
>> >
>> > > 4957060
>> > >
>> > > Please share the correct way.
>> > >
>> > > With Thanks and regards
>> >
>> > so, without having checked your sacct/awk logic I would not expect the results to be the same.
>> >
>> > Cheers,
>> >
>> > Loris
>> >
>> > --
>> > Dr. Loris Bennett (Hr./Mr.)
>> > ZEDAT, Freie Universität Berlin Email loris.bennett at fu-berlin.de
--
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk