[slurm-users] How to view GPU indices of the completed jobs?
    Michael Di Domenico 
    mdidomenico4 at gmail.com
       
    Wed Jun 10 17:34:22 UTC 2020
    
    
  
I don't know the answer, but have you checked the SQL tables in the
database to see if the data you want is even being kept?  It's possible
Slurm is just throwing that value away.  (I agree it would be nice if
it were retrievable.)
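
If the database turns out not to keep the indices, one possible
workaround is to record them while the job is still running, for
example by saving the output of `scontrol show job -d <job id>` from a
prolog and parsing it later.  Below is a minimal Python sketch of the
parsing step only.  The sample line is my assumption about what the
detail line looks like (the field name differs between Slurm versions),
and `parse_gpu_indices` and `expand_indices` are hypothetical helper
names, not part of Slurm.

```python
import re

# Assumed shape of one detail line from `scontrol show job -d <job id>`.
# The exact field name (GRES_IDX= vs GRES=) varies between Slurm versions,
# so the pattern below only relies on "Nodes=" and "(IDX:...)".
SAMPLE = "     Nodes=gpu01 CPU_IDs=0-3 Mem=0 GRES_IDX=gpu(IDX:0-1,3)"

LINE_RE = re.compile(r"\bNodes=(\S+).*?\(IDX:([0-9,\-]+)\)")


def expand_indices(spec):
    """Expand an index spec such as '0-1,3' into [0, 1, 3]."""
    indices = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            indices.extend(range(int(lo), int(hi) + 1))
        else:
            indices.append(int(part))
    return indices


def parse_gpu_indices(text):
    """Map each Nodes= value to the GPU indices listed on the same line."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            result[match.group(1)] = expand_indices(match.group(2))
    return result


if __name__ == "__main__":
    print(parse_gpu_indices(SAMPLE))
```

Lines without an `(IDX:...)` part are skipped, so the whole saved
scontrol output can be passed in as one string.  Check the pattern
against real output from your own Slurm version before relying on it.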
On Wed, Jun 10, 2020 at 2:59 AM Kota Tsuyuzaki
<kota.tsuyuzaki.pc at hco.ntt.co.jp> wrote:
>
> > -j <job id> -l` too. However, it seems to include any GPU index information even in AllocGres and AllocTres columns.
>
> I meant that it DOES NOT seem to include any GPU index. Sorry.
>
> Best.
>
> --------------------------------------------
> 露崎 浩太 (Kota Tsuyuzaki)
> kota.tsuyuzaki.pc at hco.ntt.co.jp
> NTTソフトウェアイノベーションセンタ
> 分散処理基盤技術プロジェクト
> 0422-59-2837
> ---------------------------------------------
>
>
> > -----Original Message-----
> > From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Kota Tsuyuzaki
> > Sent: Wednesday, June 10, 2020 11:37 AM
> > To: 'Slurm User Community List' <slurm-users at lists.schedmd.com>
> > Subject: Re: [slurm-users] How to view GPU indices of the completed jobs?
> >
> > > Using sacct you can find that information; try the options below and see if that works.
> > >
> > > sacct -j <job id>  --format=jobid,ReqTRES%50,ReqGres
> >
> > Thanks, I tried that command, but it appears to show the requested number of GPUs rather than the GPU index. I tried `sacct
> > -j <job id> -l` too. However, it seems to include any GPU index information even in AllocGres and AllocTres columns.
> >
> > Do I have to enable some configuration option to track the detailed GPU information? Am I missing something?
> >
> > Best regards,
> >
> > --------------------------------------------
> > 露崎 浩太 (Kota Tsuyuzaki)
> > kota.tsuyuzaki.pc at hco.ntt.co.jp
> > NTTソフトウェアイノベーションセンタ
> > 分散処理基盤技術プロジェクト
> > 0422-59-2837
> > ---------------------------------------------
> >
> >
> > > -----Original Message-----
> > > From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of
> > > sathish
> > > Sent: Monday, June 8, 2020 11:07 PM
> > > To: Slurm User Community List <slurm-users at lists.schedmd.com>
> > > Subject: Re: [slurm-users] How to view GPU indices of the completed jobs?
> > >
> > > Using sacct you can find that information; try the options below and see if that works.
> > >
> > > sacct -j <job id>  --format=jobid,ReqTRES%50,ReqGres
> > >
> > >
> > > On Thu, Jun 4, 2020 at 1:30 PM Kota Tsuyuzaki
> > > <kota.tsuyuzaki.pc at hco.ntt.co.jp> wrote:
> > >
> > >
> > >     Hello Guys,
> > >
> > >     We are running GPU clusters with Slurm and SlurmDBD (version 19.05 series), and some of the GPUs seem to have caused
> > >     trouble for the jobs attached to them. To investigate whether the trouble occurred on the same GPUs, I'd like to get the GPU indices of the completed jobs.
> > >
> > >     As I understand it, `scontrol show job` can show the indices (as IDX in the gres info) but cannot be used for completed jobs,
> > >     while `sacct -j` works for completed jobs but does not print the indices.
> > >
> > >     Is there any way (commands, configurations, etc...) to see the allocated GPU indices for completed jobs?
> > >
> > >     Best regards,
> > >
> > >     --------------------------------------------
> > >     露崎 浩太 (Kota Tsuyuzaki)
> > >     kota.tsuyuzaki.pc at hco.ntt.co.jp
> > >     NTTソフトウェアイノベーションセンタ
> > >     分散処理基盤技術プロジェクト
> > >     0422-59-2837
> > >     ---------------------------------------------
> > >
> > >
> > > --
> > >
> > > Regards.....
> > > Sathish
> >
>
>
>
>
    
    