[slurm-users] Releasing stale allocated TRES
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Thu Nov 23 11:16:25 UTC 2023
On 11/23/23 11:50, Markus Kötter wrote:
> On 23.11.23 10:56, Schneider, Gerald wrote:
>> I have a recurring problem with allocated TRES, which are not
>> released after all jobs on that node are finished. The TRES are still
>> marked as allocated and no new jobs can be scheduled on that node
>> using those TRES.
>
> Remove the node from slurm.conf and restart slurmctld, re-add, restart.
> Remove from Partition definitions as well.
Just my 2 cents: Do NOT remove a node from slurm.conf just as described!
When adding or removing nodes, both slurmctld and all slurmd daemons must
be restarted! See the SchedMD presentation
https://slurm.schedmd.com/SLUG23/Field-Notes-7.pdf slides 51-56 for the
recommended procedure.
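
A rough sketch of what "restart everything" could look like, as a dry run
in Python that only prints commands. The host names (ctl01, node001, ...)
are hypothetical, and it assumes ssh access, systemd units named
slurmctld and slurmd, and that the edited slurm.conf has already been
copied to every host. The ordering (stop slurmctld, restart the slurmd's,
start slurmctld) is my assumption and is not taken from the slides, so
check it against them before using it.

```python
import shlex


def restart_plan(controller, compute_nodes, ssh="ssh"):
    """Build the ordered list of commands (as argv lists); nothing is executed.

    Assumed ordering, not confirmed by the SchedMD slides:
    stop slurmctld, restart every slurmd, then start slurmctld again.
    """
    plan = []
    # Stop the controller before touching the compute nodes.
    plan.append([ssh, controller, "systemctl", "stop", "slurmctld"])
    # Restart slurmd on every node so each one rereads slurm.conf.
    for node in compute_nodes:
        plan.append([ssh, node, "systemctl", "restart", "slurmd"])
    # Bring the controller back with the new node list.
    plan.append([ssh, controller, "systemctl", "start", "slurmctld"])
    return plan


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for argv in restart_plan("ctl01", ["node001", "node002"]):
        print(shlex.join(argv))
```

Printing the plan first makes it easy to review the order before running
anything against a production cluster.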
/Ole