[slurm-users] sreport User Utilisation Over 300%
mshubham
mshubham at cdac.in
Mon Aug 1 06:21:45 UTC 2022
Dear Benson,
We have not set the "oversubscription" property in slurm.conf, so it is at the
default, which is no.
Yes, sacct and the database are showing correct results.
When we look at the job details for that user in sacct, it does not show any job
in the last 10 days.
Only sreport is showing incorrect utilisation. This includes some users whom we
have removed from the slurm account, yet they are still showing utilisation.
Below is the accountutilizationbyuser and utilisation report generated by
sreport in the cluster.
#sreport cluster accountutilizationbyuser -t per start=073022T11:00:00 end=now
--------------------------------------------------------------------------------
Cluster/Account/User Utilization 2022-07-30T11:00:00 - 2022-08-01T10:59:59 (172800 secs)
Usage reported in Percentage of Total
------------------------------------------------------------------
Cluster   Account         Login     Used       Energy
--------- --------------- --------- ---------- -------------------
cluster+  root                      842.73%    100.00%
cluster+  root            root      464.66%    0.00%
cluster+  hpc                       293.57%    0.00%
cluster+  hpc             user01    51.33%     0.00%
cluster+  hpc             user02    242.24%    0.00%
cluster+  phy                       73.79%     99.85%
cluster+  phy             user03    0.32%      0.00%
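
As a quick sanity check on the report above, here is a minimal Python sketch that flags every row whose Used value exceeds 100% and confirms that the two hpc users add up to the hpc account total. The rows are copied from the output above; the names REPORT and parse_rows are only illustrative, and the parsing assumes the whitespace-separated layout shown here rather than any official sreport output format.

```python
# Rows copied from the accountutilizationbyuser report above.
REPORT = """\
cluster+ root 842.73% 100.00%
cluster+ root root 464.66% 0.00%
cluster+ hpc 293.57% 0.00%
cluster+ hpc user01 51.33% 0.00%
cluster+ hpc user02 242.24% 0.00%
cluster+ phy 73.79% 99.85%
cluster+ phy user03 0.32% 0.00%
"""


def parse_rows(text):
    """Yield (account, login, used_percent); login is None on account rows."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a data row
        # The last two fields are Used and Energy; Login is present
        # only when the row has five fields.
        used = float(fields[-2].rstrip("%"))
        account = fields[1]
        login = fields[2] if len(fields) == 5 else None
        yield account, login, used


rows = list(parse_rows(REPORT))

# Anything above 100% of the cluster cannot be real usage.
over = [(account, login, used) for account, login, used in rows if used > 100.0]
for account, login, used in over:
    print(f"{account}/{login or '(account total)'}: {used:.2f}%")

# The per-user rows under hpc should add up to the hpc account row.
hpc_users = sum(used for account, login, used in rows
                if account == "hpc" and login is not None)
hpc_total = next(used for account, login, used in rows
                 if account == "hpc" and login is None)
print("hpc users sum to account total:", abs(hpc_users - hpc_total) < 0.01)
```

The same check does not close for root or phy, because the rows shown for them do not add up to the account totals, so part of the account tree appears to be missing from the pasted output.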
#sreport cluster utilisation -t per start=080122 end=now
--------------------------------------------------------------------------------
Cluster Utilization 2022-08-01T00:00:00 - 2022-08-01T10:59:59
Usage reported in Percentage of Total
------------------------------------------------------------------------------------
Cluster   Allocated  Down     PLND Dow Idle     Reserved Reported
--------- ---------- -------- -------- -------- -------- ----------------------------
cluster+  100.00%    0.00%    0.00%    0.00%    0.00%    100.00%
Also, the cluster has been showing 1-2 runaway jobs every day since the time
this issue first appeared. We remove them on a daily
basis.
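
To illustrate why runaway jobs could push the percentage above 100%, here is a rough Python sketch. It assumes (this is not confirmed anywhere in this thread) that a job with no recorded end time is charged as if it were still running for the whole reporting window. TOTAL_CPUS and the five stale jobs are hypothetical numbers chosen for illustration, not values from our cluster.

```python
from datetime import datetime


def window_seconds(start, end):
    """Length of an inclusive reporting window in seconds."""
    return int((end - start).total_seconds()) + 1


def charged_cpu_seconds(job_start, job_end, cpus, win_start, win_end):
    """CPU-seconds a job contributes inside the window.

    job_end=None models a runaway job: no end time is recorded, so it is
    treated as running until the end of the window (an assumption).
    """
    effective_start = max(job_start, win_start)
    effective_end = win_end if job_end is None else min(job_end, win_end)
    seconds = max(0.0, (effective_end - effective_start).total_seconds())
    return seconds * cpus


# Window taken from the accountutilizationbyuser report header.
win_start = datetime(2022, 7, 30, 11, 0, 0)
win_end = datetime(2022, 8, 1, 10, 59, 59)
secs = window_seconds(win_start, win_end)  # matches the "(172800 secs)" header

TOTAL_CPUS = 100  # hypothetical cluster size
capacity = TOTAL_CPUS * secs

# Hypothetical: five stale 50-CPU jobs that never received an end time.
stale_jobs = [(datetime(2022, 7, 1), None, 50)] * 5
used = sum(charged_cpu_seconds(start, end, cpus, win_start, win_end)
           for start, end, cpus in stale_jobs)
percent = 100.0 * used / capacity
print(f"{secs} s window, {percent:.2f}% of capacity")
```

Under these assumptions, stale records totalling 250 CPUs on a 100-CPU cluster are enough to report roughly 250% for a user who has submitted nothing recently.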
On 7/29/22 11:59, mshubham wrote:
> Dear All,
> I am facing an issue in SLURM (20.11.8) in which sreport cluster
> utilization is 100%, and when I run sreport cluster
> userutilizationbyaccount, some users' utilisation is greater than 100%.
> Three users, including root, show utilisation over 250%, making the overall
> utilisation 500% (though these users have not submitted any job in the past
> week). It was showing some runaway jobs, which we cleared; then it
> showed the same runaway jobs again, and we cleared them again (both
> manually and through the command).
Is oversubscription enabled?
https://slurm.schedmd.com/sreport.html#SECTION_REPORT-TYPES
Do you get similar results with sacct?
> Before that, we had encountered an issue in which, in our
> cluster with primary and backup slurm controllers, we kept a common
> mount point for the "StateSaveLocation", /var/share/slurm/ctld. We then
> observed strange behaviour: "If the mount point is present and
> the service is restarted on the primary controller, then it replaces all
> the StateSaveLocation files."
>
> This resulted in the cancellation of all jobs (running and pending) and
> reservations, and in JobIDs starting again from 1 for newly submitted jobs. If
> the StateSaveLocation is kept on a local file system instead of a shared
> mount point, then everything works fine even after restarting the
> slurmctld service.
>
> After that issue, the reported utilisation is higher than expected, though
> it has not impacted any real job utilisation.
>
> Also, we have removed those users' accounts in SLURM, yet it is still
> showing their utilisation.
>
The database should keep previous utilization records.
> Please help in resolving this issue.
>
> Thanks and Regards,
> Shubham Mehta
> HPC Technology
> CDAC Pune
Thanks and Regards,
Shubham Mehta
HPC Technology
CDAC Pune