[slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Tue Dec 14 13:24:19 UTC 2021
It would be great if Slurm could read the GPU load using the Nvidia
monitoring tools and then make the GPU load available through "scontrol
show node xxx". But I don't know whether anyone has asked (and paid)
SchedMD to implement this.
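
Until something like that exists, the load can be read per node outside of
Slurm. The following is only a rough sketch, assuming nvidia-smi is installed
on the GPU node and run there; the function names are my own invention and
nothing here is part of pestat or Slurm:

```python
import subprocess


def parse_gpu_utilization(csv_text):
    """Parse 'index, utilization.gpu' CSV lines into {gpu_index: percent}.

    A utilization value that is not an integer (for example "[N/A]" on
    GPUs that do not report it) is stored as None.
    """
    result = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, util = (field.strip() for field in line.split(",", 1))
        try:
            result[int(index)] = int(util)
        except ValueError:
            result[int(index)] = None
    return result


def query_gpu_utilization():
    """Run nvidia-smi on the local node and return per-GPU utilization."""
    completed = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,utilization.gpu",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_utilization(completed.stdout)
```

Calling query_gpu_utilization() on a GPU node returns a dictionary mapping
each GPU index to its current utilization in percent. Getting these numbers
into the "scontrol show node" output would still need support in Slurm itself.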
On 12/14/21 14:16, Loris Bennett wrote:
> Hi Ole,
> Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> writes:
>> The latest pestat version now adds a red color highlight when the GPU GRES
>> value is (null).
>> We use this to highlight jobs on GPU nodes which didn't request any GPU
>> resources, thereby possibly wasting resources.
>> Could you test whether this is useful and give me feedback?
> In job_submit.lua we check whether a job sent to the GPU partition has
> actually requested a GPU as a TRES and, if not, reject it. So that kind
> of wastage doesn't occur.
> However, we do sometimes push non-GPU jobs onto GPU-nodes within a
> scavenger partition, so it would be handy if pestat highlighted these.
> At the moment, though, there are no such jobs, so I can't test.
> It would however be good to be able to display the utilisation of the
> GPUs via the command-line. Some people request GPUs, but the jobs don't
> manage to use them very much. At the opposite end of the usage
> spectrum, today, via our Zabbix monitoring, I spotted some jobs with
> unusually high GPU efficiencies, which turned out to be doing
> cryptomining :-/