[slurm-users] How to view GPU indices of the completed jobs?
Kota Tsuyuzaki
kota.tsuyuzaki.pc at hco.ntt.co.jp
Wed Jun 10 06:56:28 UTC 2020
> I tried `sacct -j <job id> -l` too. However, it seems to include any GPU index information even in AllocGres and AllocTres columns.
I meant that it does NOT seem to include any GPU index information. Sorry.
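For reference, here is a minimal sketch of what sacct does record. Assuming GPUs are tracked as TRES (AccountingStorageTRES includes gres/gpu), the AllocTRES field of a completed job carries a GPU count as a `gres/gpu=N` entry, not device indices. The helper name `gpu_count_from_tres` is hypothetical, and the sample TRES string is an assumed example rather than output from our cluster.

```sh
#!/bin/sh
# Hypothetical helper: print the GPU count from an AllocTRES string.
# Assumes a comma-separated list such as "billing=4,cpu=4,gres/gpu=2,mem=16G,node=1";
# prints nothing when there is no plain "gres/gpu=N" entry.
gpu_count_from_tres() {
  printf '%s\n' "$1" | tr ',' '\n' | sed -n 's|^gres/gpu=\([0-9][0-9]*\)$|\1|p'
}

# Example usage against a completed job (needs a cluster with SlurmDBD):
#   sacct -j <job id> --format=AllocTRES%80 --parsable2 --noheader |
#     while IFS= read -r tres; do gpu_count_from_tres "$tres"; done
```

The result is only a number such as `2`; nothing in this field identifies which physical GPUs were used.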
Best.
--------------------------------------------
露崎 浩太 (Kota Tsuyuzaki)
kota.tsuyuzaki.pc at hco.ntt.co.jp
NTT Software Innovation Center
Distributed Processing Platform Technology Project
0422-59-2837
---------------------------------------------
> -----Original Message-----
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Kota Tsuyuzaki
> Sent: Wednesday, June 10, 2020 11:37 AM
> To: 'Slurm User Community List' <slurm-users at lists.schedmd.com>
> Subject: Re: [slurm-users] How to view GPU indices of the completed jobs?
>
> > You can find that information with sacct; try the options below and see if that works.
> >
> > sacct -j <job id> --format=jobid,ReqTRES%50,ReqGres
>
> Thanks. I tried that command, but it appears to show the requested number of GPUs rather than the GPU indices.
> I tried `sacct -j <job id> -l` too. However, it seems to include any GPU index information even in AllocGres and AllocTres columns.
>
> Do I need to enable some configuration to track detailed GPU information? Am I missing something?
>
> Best regards,
>
> > -----Original Message-----
> > From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of sathish
> > Sent: Monday, June 8, 2020 11:07 PM
> > To: Slurm User Community List <slurm-users at lists.schedmd.com>
> > Subject: Re: [slurm-users] How to view GPU indices of the completed jobs?
> >
> > You can find that information with sacct; try the options below and see if that works.
> >
> > sacct -j <job id> --format=jobid,ReqTRES%50,ReqGres
> >
> >
> > On Thu, Jun 4, 2020 at 1:30 PM Kota Tsuyuzaki <kota.tsuyuzaki.pc at hco.ntt.co.jp> wrote:
> >
> > Hello Guys,
> >
> > We are running GPU clusters with Slurm and SlurmDBD (19.05 series), and some
> > of the GPUs seem to have caused problems for the jobs attached to them. To
> > investigate whether the problems occurred on the same GPUs, I'd like to get the GPU indices of completed jobs.
> >
> > As I understand it, `scontrol show job` can show the indices (as IDX in the
> > GRES info) but cannot be used for completed jobs, whereas `sacct -j` works
> > for completed jobs but does not print the indices.
> >
> > Is there any way (commands, configuration, etc.) to see the GPU indices allocated to completed jobs?
> >
> > Best regards,
> >
> > --
> >
> > Regards.....
> > Sathish
>
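While a job is still known to slurmctld (running, or for a short time after it ends, as governed by `MinJobAge` in slurm.conf), the per-node detail lines of `scontrol -d show job <job id>` carry the IDX field mentioned in the original question. Below is a minimal sketch for turning that field into a list of indices. It assumes the detail line contains a token like `gpu(IDX:0-1,3)`; because the surrounding key name differs between Slurm versions, it only matches on `IDX:`. The helper name `expand_gpu_idx` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: read `scontrol -d show job` style text on stdin and
# print every GPU index found in "IDX:<list>" tokens, one per line.
# Assumes <list> is comma-separated integers and "a-b" ranges, e.g. "0-1,3".
expand_gpu_idx() {
  grep -o 'IDX:[0-9,-]*' | sed 's/^IDX://' | tr ',' '\n' |
    while IFS= read -r item; do
      case $item in
        '') ;;                     # skip empty entries
        *-*)                       # expand an inclusive range "a-b"
          i=${item%-*}
          hi=${item#*-}
          while [ "$i" -le "$hi" ]; do
            echo "$i"
            i=$((i + 1))
          done
          ;;
        *) echo "$item" ;;         # single index
      esac
    done
}

# Example usage (the job must still be visible to scontrol):
#   scontrol -d show job <job id> | expand_gpu_idx
```

This only handles the parsing. The output still has to be captured before the job record is purged from slurmctld, because sacct does not store these indices.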