OS: CentOS 8.5
Slurm: 22.05
Recently upgraded to 22.05. Upgrade was successful, but after a while I started to see the following messages in the slurmdbd.log file:
error: We have more time than is possible (9344745+7524000+0)(16868745) > 12362400 for cluster CLUSTERNAME(3434) from 2024-09-18T13:00:00 - 2024-09-18T14:00:00 tres 1 (this may happen if oversubscription of resources is allowed without Gang)
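For what it's worth, the numbers in that message appear to be CPU-seconds for the one-hour rollup window. My reading of the format (an assumption, not documented behavior) is that the 3434 after the cluster name is the cluster's CPU count, so the "possible" time is 3434 CPUs × 3600 s, and the reported usage exceeds it:

```python
# Decoding the slurmdbd rollup error, assuming the format is
# (a+b+c)(total_reported) > possible for cluster NAME(cpu_count).
# The component labels below are guesses at what the three addends mean.
a = 9_344_745          # first usage component (CPU-seconds)
b = 7_524_000          # second usage component (CPU-seconds)
c = 0                  # third usage component (CPU-seconds)
reported = a + b + c

cpu_count = 3434       # assumed: figure shown after the cluster name
window_seconds = 3600  # 13:00:00 to 14:00:00 is one hour
possible = cpu_count * window_seconds

print(reported)             # 16868745, matching the log
print(possible)             # 12362400, matching the log
print(reported > possible)  # True, hence the error
```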
We do have partitions with overlapping nodes, but "Suspend,Gang" is not set as the global PreemptMode; it is currently set to requeue.
I have also checked sacct and there are no runaway jobs listed.
Oversubscription is not enabled on any of the queues either.
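One way to double-check both of those settings against the live cluster (a sketch; the grep patterns match the field names as scontrol prints them):

```shell
# Show the global preemption mode (expecting "requeue" per the above)
scontrol show config | grep -i PreemptMode

# Show per-partition oversubscription; OverSubscribe=NO means disabled
scontrol show partition | grep -iE 'PartitionName|OverSubscribe'
```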
Do I need to modify my Slurm config to address this, or is this an error condition caused by the upgrade?
Thank you,
SS
I don’t think you should expect this from overlapping nodes in partitions, but rather when you’re allowing the hardware itself to be oversubscribed.
Was your upgrade in this window?
I would suggest looking for runaway jobs, which you’ve already done; beyond that I’m not sure what else to check.
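For other readers hitting this: the runaway-jobs check mentioned above is done with sacctmgr, run against the live slurmdbd:

```shell
# List jobs that ended on the cluster but were never closed out in the
# accounting database; these can inflate rollup usage past what is possible.
sacctmgr show runawayjobs
# If any are listed, sacctmgr offers to fix them, after which slurmdbd
# re-rolls usage from the earliest affected time.
```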
--
#BlackLivesMatter
Ryan Novosielski - novosirj@rutgers.edu
Sr. Technologist - 973/972.0922 (2x0922) - RBHS Campus
Office of Advanced Research Computing - MSB A555B, Newark
Rutgers, the State University of NJ
The upgrade was a couple of hours prior to the messages appearing in the logs.
SS