[slurm-users] How to view GPU indices of the completed jobs?

Kota Tsuyuzaki kota.tsuyuzaki.pc at hco.ntt.co.jp
Wed Jun 10 06:56:28 UTC 2020


> I tried `sacct -j <job id> -l` too. However, it seems to include any GPU index information even in AllocGres and AllocTres columns.

Correction: I meant that it DOES NOT seem to include any GPU index information. Sorry.
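
To illustrate the point about AllocTres, here is a minimal Python sketch that parses a TRES string of the kind shown in that column. The sample value is hypothetical (written by hand, not taken from a real cluster), and `parse_tres` is an illustrative helper name, not part of Slurm. As far as I can tell, such a string carries only a GPU count and no index.

```python
def parse_tres(tres):
    """Split a TRES string such as 'cpu=4,gres/gpu=2,mem=16G' into a dict."""
    result = {}
    for item in tres.split(","):
        if not item:
            continue
        # Each item is expected to look like 'name=value'.
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result


# Hypothetical AllocTRES value, assumed for illustration only.
sample_alloc_tres = "billing=4,cpu=4,gres/gpu=2,mem=16G,node=1"

tres = parse_tres(sample_alloc_tres)
gpu_count = int(tres.get("gres/gpu", "0"))
print("GPUs allocated:", gpu_count)
```

The parsed result tells how many GPUs were allocated, but nothing in it identifies which devices on the node were used.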

Best.

--------------------------------------------
露崎 浩太 (Kota Tsuyuzaki)
kota.tsuyuzaki.pc at hco.ntt.co.jp
NTT Software Innovation Center
Distributed Processing Platform Technology Project
0422-59-2837
---------------------------------------------


> -----Original Message-----
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Kota Tsuyuzaki
> Sent: Wednesday, June 10, 2020 11:37 AM
> To: 'Slurm User Community List' <slurm-users at lists.schedmd.com>
> Subject: Re: [slurm-users] How to view GPU indices of the completed jobs?
> 
> > You can find that information with sacct. Try the options below and see if that works.
> >
> > sacct -j <job id> --format=jobid,ReqTRES%50,ReqGres
> 
> Thanks, I tried that command, but it seems to show the requested number of GPUs rather than the GPU indices. I tried
> `sacct -j <job id> -l` too. However, it seems to include any GPU index information even in AllocGres and AllocTres columns.
> 
> Do I need to enable some configuration option to track the detailed GPU information? Am I missing something?
> 
> Best regards,
> 
> --------------------------------------------
> 露崎 浩太 (Kota Tsuyuzaki)
> kota.tsuyuzaki.pc at hco.ntt.co.jp
> NTT Software Innovation Center
> Distributed Processing Platform Technology Project
> 0422-59-2837
> ---------------------------------------------
> 
> 
> > -----Original Message-----
> > From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of sathish
> > Sent: Monday, June 8, 2020 11:07 PM
> > To: Slurm User Community List <slurm-users at lists.schedmd.com>
> > Subject: Re: [slurm-users] How to view GPU indices of the completed jobs?
> >
> > You can find that information with sacct. Try the options below and see if that works.
> >
> > sacct -j <job id> --format=jobid,ReqTRES%50,ReqGres
> >
> >
> > On Thu, Jun 4, 2020 at 1:30 PM Kota Tsuyuzaki <kota.tsuyuzaki.pc at hco.ntt.co.jp> wrote:
> >
> >
> > 	Hello guys,
> >
> > 	We are running GPU clusters with Slurm and SlurmDBD (version 19.05 series), and some of the GPUs seem to have caused trouble for the jobs attached to them. To check whether the trouble occurred on the same GPUs, I'd like to get the GPU indices of the completed jobs.
> >
> > 	As I understand it, `scontrol show job` can show the indices (as IDX in the gres info) but cannot be used for completed jobs. `sacct -j` works for completed jobs but does not print the indices.
> >
> > 	Is there any way (commands, configuration, etc.) to see the allocated GPU indices for completed jobs?
> >
> > 	Best regards,
> >
> > 	--------------------------------------------
> > 	露崎 浩太 (Kota Tsuyuzaki)
> > 	kota.tsuyuzaki.pc at hco.ntt.co.jp
> > 	NTT Software Innovation Center
> > 	Distributed Processing Platform Technology Project
> > 	0422-59-2837
> > 	---------------------------------------------
> >
> > --
> >
> > Regards.....
> > Sathish
> 
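
For the IDX field mentioned in the original question, a rough Python sketch of extracting GPU indices from saved `scontrol show job -d` output might look like the following. The sample text is hand-written and assumed, not copied from a real cluster, and the exact field name around `(IDX:...)` may differ between Slurm versions. Capturing this output while the job is still known to the controller (for example from a script run at job end) is one possible approach and is not something confirmed in this thread. `expand_indices` and `gpu_indices_by_node` are hypothetical helper names.

```python
import re


def expand_indices(spec):
    """Expand an index list such as '0-1,3' into [0, 1, 3]."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            indices.extend(range(int(start), int(end) + 1))
        else:
            indices.append(int(part))
    return indices


def gpu_indices_by_node(scontrol_output):
    """Map the Nodes= value of each line to the GPU indices in '(IDX:...)'."""
    pattern = re.compile(r"Nodes=(\S+).*?gpu[^\s(]*\(IDX:([0-9,\-]+)\)")
    found = {}
    for line in scontrol_output.splitlines():
        match = pattern.search(line)
        if match:
            found[match.group(1)] = expand_indices(match.group(2))
    return found


# Assumed sample lines, written by hand for illustration only.
sample = """
   Nodes=gpu01 CPU_IDs=0-3 Mem=4096 GRES_IDX=gpu(IDX:0-1,3)
   Nodes=gpu02 CPU_IDs=0-1 Mem=2048 GRES_IDX=gpu(IDX:2)
"""

print(gpu_indices_by_node(sample))
```

If the Nodes= value is a range expression covering several nodes, this sketch keeps the expression as the key rather than expanding it.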





