[slurm-users] sreport User Utilisation Over 300%
Benson Muite
benson_muite at emailplus.org
Fri Jul 29 11:58:41 UTC 2022
On 7/29/22 11:59, mshubham wrote:
> Dear All,
> I am facing an issue in Slurm (20.11.8): sreport cluster utilization
> reports 100%, and when I run sreport cluster userutilizationbyaccount,
> some users show utilisation greater than 100%. Three users, including
> root, show utilisation over 250%, making the overall utilisation 500%
> (though these users have not submitted any jobs in the past week).
> It was showing some runaway jobs, which we cleared; the same runaway
> jobs then appeared again, and we cleared them again (both manually and
> through the command).
Is oversubscription enabled?
https://slurm.schedmd.com/sreport.html#SECTION_REPORT-TYPES
Do you get similar results with sacct?
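One way to make that comparison, as a rough sketch rather than a definitive method: sum CPUTimeRAW per user from sacct output and divide by the CPU-seconds the cluster could have delivered in the same window. The function name, the sample data and the CPU count below are made up for illustration. The sacct command in the comment uses standard options, but the start and end times are placeholders you would fill in.

```python
from collections import defaultdict

# Input is assumed to come from something like:
#   sacct -a -X -n -P -S <start> -E <end> -o User,CPUTimeRAW
# which prints one "user|cpu_seconds" line per job allocation.


def utilisation_by_user(sacct_text, total_cpus, window_seconds):
    """Return {user: percent of the cluster's CPU-seconds used in the window}."""
    capacity = total_cpus * window_seconds  # CPU-seconds available in the window
    used = defaultdict(int)
    for line in sacct_text.splitlines():
        line = line.strip()
        if not line:
            continue
        user, cpu_seconds = line.split("|")
        used[user] += int(cpu_seconds)
    return {user: 100.0 * secs / capacity for user, secs in used.items()}


# Hypothetical sample: a 4-CPU cluster over a one-hour window.
sample = "alice|3600\nbob|7200\nalice|1800\n"
report = utilisation_by_user(sample, total_cpus=4, window_seconds=3600)
for user, pct in sorted(report.items()):
    print(f"{user}: {pct:.1f}%")
```

If a user's figure here stays under 100% while sreport shows well over 100% for the same period, that would point at the rolled-up usage tables (for example, leftover runaway job records) rather than at real jobs.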
> Before that, we had encountered another issue: in our cluster with
> primary and backup Slurm controllers, we kept "StateSaveLocation"
> (/var/share/slurm/ctld) on a common mount point. We then observed
> strange behaviour: if the mount point is present and the service is
> restarted on the primary controller, it replaces all the
> StateSaveLocation files.
>
> This resulted in the cancellation of all jobs (running and pending) and
> reservations, and newly submitted jobs were assigned JobIDs starting
> from 1. If the StateSaveLocation is kept on a local file system instead
> of a shared mount point, everything works fine even after restarting
> the slurmctld service.
>
> Since that issue, the reported utilisation has been higher than
> expected, though it has not impacted the utilisation of any real jobs.
>
> Also, we have removed those users' accounts in Slurm, yet it still
> shows their utilisation.
>
That part is expected: the accounting database keeps previous
utilization records even after an account is removed.
> Please help in resolving this issue.
>
> Thanks and Regards,
> Shubham Mehta
> HPC Technology
> CDAC Pune