[slurm-users] How to view GPU indices of the completed jobs?
Kota Tsuyuzaki
kota.tsuyuzaki.pc at hco.ntt.co.jp
Wed Jun 10 02:36:38 UTC 2020
> Using sacct you can find that information. Try the options below and see if that works.
>
> sacct -j <job id> --format=jobid,ReqTRES%50,ReqGres
Thanks, I tried that command, but it appears to show the requested number of GPUs rather than the GPU indices. I also tried `sacct -j <job id> -l`, but it does not seem to include any GPU index information, even in the AllocGres and AllocTres columns.
Do I need to enable some configuration option to track the detailed GPU information, or am I missing something?
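For reference, here is a minimal sketch of how the IDX values could be pulled out of `scontrol -d show job <job id>` output while the job is still known to the controller. The sample line format (`GRES_IDX=gpu(IDX:0-1)`), the node name, and the helper names `expand_indices` and `gpu_indices_by_node` are my own assumptions for illustration; the exact output layout may differ between Slurm versions.

```python
import re

# Assumed format of the GRES detail, e.g. "gpu(IDX:0-1)" or "gpu:tesla:2(IDX:0,2)".
# This pattern is an illustration, not a documented Slurm output contract.
_IDX_RE = re.compile(r"gpu[^\s(]*\(IDX:([0-9,\-]+)\)")
_NODES_RE = re.compile(r"\bNodes=(\S+)")


def expand_indices(spec):
    """Expand a range list such as "0-1,3" into [0, 1, 3]."""
    indices = []
    for part in spec.split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            indices.extend(range(int(start), int(end) + 1))
        else:
            indices.append(int(part))
    return indices


def gpu_indices_by_node(scontrol_output):
    """Map each node name to the GPU indices found on the same output line."""
    result = {}
    for line in scontrol_output.splitlines():
        idx = _IDX_RE.search(line)
        if not idx:
            continue
        nodes = _NODES_RE.search(line)
        key = nodes.group(1) if nodes else "unknown"
        result.setdefault(key, []).extend(expand_indices(idx.group(1)))
    return result


if __name__ == "__main__":
    # Hypothetical sample line; real output would come from scontrol.
    sample = "     Nodes=gpu01 CPU_IDs=0-3 Mem=4000 GRES_IDX=gpu(IDX:0-1,3)"
    print(gpu_indices_by_node(sample))
```

This only works while `scontrol` can still see the job, so the result would have to be logged somewhere before the job record is purged.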
Best regards,
--------------------------------------------
露崎 浩太 (Kota Tsuyuzaki)
kota.tsuyuzaki.pc at hco.ntt.co.jp
NTT Software Innovation Center
Distributed Processing Platform Technology Project
0422-59-2837
---------------------------------------------
> -----Original Message-----
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of sathish
> Sent: Monday, June 8, 2020 11:07 PM
> To: Slurm User Community List <slurm-users at lists.schedmd.com>
> Subject: Re: [slurm-users] How to view GPU indices of the completed jobs?
>
> Using sacct you can find that information. Try the options below and see if that works.
>
> sacct -j <job id> --format=jobid,ReqTRES%50,ReqGres
>
>
> On Thu, Jun 4, 2020 at 1:30 PM Kota Tsuyuzaki <kota.tsuyuzaki.pc at hco.ntt.co.jp> wrote:
>
>
> Hello Guys,
>
> We are running GPU clusters with Slurm and SlurmDBD (version 19.05 series), and some of the GPUs seem to
> cause trouble for the jobs attached to them. To investigate whether the trouble occurs on the same GPUs,
> I'd like to get the GPU indices of the completed jobs.
>
> In my understanding, `scontrol show job` can show the indices (as IDX in the gres info) but cannot be used
> for completed jobs. `sacct -j` works for completed jobs but won't print the indices.
>
> Is there any way (commands, configurations, etc...) to see the allocated GPU indices for completed jobs?
>
> Best regards,
>
> --------------------------------------------
> 露崎 浩太 (Kota Tsuyuzaki)
> kota.tsuyuzaki.pc at hco.ntt.co.jp
> NTT Software Innovation Center
> Distributed Processing Platform Technology Project
> 0422-59-2837
> ---------------------------------------------
>
> --
>
> Regards.....
> Sathish