[slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Tue Dec 14 13:24:19 UTC 2021


Hi Loris,

It would be great if Slurm could read the GPU load using the Nvidia 
monitoring tools and then make it available through "scontrol 
show node xxx".  But I don't know whether anyone has asked (and paid) 
SchedMD to implement this.
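
In the meantime, the load can be polled on each node outside of Slurm.  A
minimal sketch, assuming "nvidia-smi" is installed on the GPU node; the
function names are made up for illustration, and nothing here is part of
Slurm or pestat:

```python
import shutil
import subprocess


def parse_gpu_utilization(csv_text):
    """Parse 'index, utilization' CSV lines into {gpu_index: percent or None}."""
    loads = {}
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(",", 1))
        # Some GPUs report a non-numeric value such as "[N/A]"; store None then.
        loads[int(index)] = int(util) if util.isdigit() else None
    return loads


def read_gpu_utilization():
    """Query nvidia-smi on the local node; return {} if the tool is missing."""
    if shutil.which("nvidia-smi") is None:
        return {}
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,utilization.gpu",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_utilization(result.stdout)


if __name__ == "__main__":
    for index, percent in sorted(read_gpu_utilization().items()):
        print(f"GPU {index}: {percent}% utilisation")
```

This only reports an instantaneous value per GPU on the node where it runs;
collecting it across nodes and mapping it to jobs is left open.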

Best regards,
Ole

On 12/14/21 14:16, Loris Bennett wrote:
> Hi Ole,
> 
> Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> writes:
> 
>> The latest pestat version now adds a red color highlight if the GRES GPU
>> value is (null).
>>
>> We use this to highlight jobs on GPU nodes which didn't request any GPU
>> resources, thereby possibly wasting resources.
>>
>> Could you test whether this is useful and give me feedback?
> 
> In job_submit.lua we check whether a job sent to the GPU partition has
> actually requested a GPU as a TRES and, if not, reject it.  So that kind
> of wastage doesn't occur.
> 
> However, we do sometimes push non-GPU jobs onto GPU-nodes within a
> scavenger partition, so it would be handy if pestat highlighted these.
> At the moment, though, there are no such jobs, so I can't test.
> 
> It would however be good to be able to display the utilisation of the
> GPUs via the command-line.  Some people request GPUs, but the jobs don't
> manage to use them very much.  At the opposite end of the usage
> spectrum, today, via our Zabbix monitoring, I spotted some jobs with
> unusually high GPU efficiencies, which turned out to be doing
> cryptomining :-/



