[slurm-users] sreport User Utilisation Over 300%

Benson Muite benson_muite at emailplus.org
Fri Jul 29 11:58:41 UTC 2022


On 7/29/22 11:59, mshubham wrote:
> Dear All,
> I am facing an issue in Slurm (20.11.8): sreport cluster 
> utilization shows 100%, but when I run sreport cluster 
> UserUtilizationByAccount, some users' utilisation is greater than 100%. 
> Three users, including root, show utilisation over 250%, making overall 
> utilisation 500% (though these users have not submitted any jobs in the 
> past week). It was showing some runaway jobs; we cleared them, but the 
> same runaway jobs appeared again, and we cleared them again (both 
> manually and through the command).
Is oversubscription enabled?
https://slurm.schedmd.com/sreport.html#SECTION_REPORT-TYPES
Do you get similar results with sacct?
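
A rough way to do that cross-check is sketched below. It assumes you have
saved output from something like
`sacct -a -X -n -P -S <start> -E <end> --format=User,CPUTimeRAW`
(one `user|cpu_seconds` line per job allocation). The function name, the
sample lines, the CPU count and the window length are all hypothetical;
substitute your own cluster's values. Jobs that span the window boundary
may be counted with their full elapsed time, so treat the result as an
approximation rather than a replacement for sreport.

```python
from collections import defaultdict


def utilisation_by_user(sacct_text, total_cpus, window_seconds):
    """Return {user: percent of cluster CPU capacity} for the window.

    sacct_text is pipe-separated "user|cpu_seconds" lines, as produced by
    sacct with -n -P and --format=User,CPUTimeRAW.
    """
    capacity = total_cpus * window_seconds  # CPU-seconds available
    used = defaultdict(int)
    for line in sacct_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, cpu_seconds = line.split("|")
        used[user] += int(cpu_seconds)
    return {user: 100.0 * secs / capacity for user, secs in used.items()}


# Hypothetical sample: a 4-CPU cluster observed over 7 days.
SAMPLE = """alice|302400
bob|151200
alice|151200
"""
WEEK = 7 * 24 * 3600

result = utilisation_by_user(SAMPLE, total_cpus=4, window_seconds=WEEK)
for user, pct in sorted(result.items()):
    print(f"{user}: {pct:.2f}%")
```

If the per-user percentages computed this way stay well under 100% while
sreport shows several hundred percent, that points at the rolled-up usage
tables (for example, leftover runaway jobs) rather than at real job activity.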

> Before that, we had encountered another issue. In our cluster with 
> primary and backup Slurm controllers, we kept a common mount point for 
> the "StateSaveLocation", /var/share/slurm/ctld. We then observed a 
> strange behaviour: if the mount point is present and the service is 
> restarted on the primary controller, it replaces all the 
> StateSaveLocation files.
> 
> This resulted in the cancellation of all jobs (running and pending) and 
> reservations, and JobIDs restarting from 1 for newly submitted jobs. If 
> the StateSaveLocation is kept on a local file system instead of a shared 
> mount point, everything works fine even after restarting the 
> slurmctld service.
> 
> Since that issue, utilisation has been higher than expected, though it 
> has not impacted any real job utilisation.
> 
> Also, we have removed those users' accounts in Slurm, yet it still 
> shows their utilisation.
>
The accounting database should keep previous utilization records, so
removed accounts can still appear in reports.

> Please help in resolving this issue.
> 
> Thanks and Regards,
> Shubham Mehta
> HPC Technology
> CDAC Pune



