[slurm-users] Releasing stale allocated TRES

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Thu Nov 23 11:16:25 UTC 2023


On 11/23/23 11:50, Markus Kötter wrote:
> On 23.11.23 10:56, Schneider, Gerald wrote:
>> I have a recurring problem with allocated TRES, which are not
>> released after all jobs on that node are finished. The TRES are still
>> marked as allocated and no new jobs can be scheduled on that node
>> using those TRES.
> 
> Remove the node from slurm.conf and restart slurmctld, re-add, restart.
> Remove from Partition definitions as well.

Just my 2 cents:  Do NOT remove a node from slurm.conf the way it is
described above!

When adding or removing nodes, slurmctld and all slurmd daemons must be
restarted!  See slides 51-56 of the SchedMD presentation
https://slurm.schedmd.com/SLUG23/Field-Notes-7.pdf for the recommended
procedure.
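
A rough dry-run sketch of the restart order, in case it helps.  These are
assumptions on my part, not taken from the slides: systemd units named
slurmctld and slurmd, ssh access to the compute nodes, the hypothetical
host names node001 and node002, and the ordering itself (stop slurmctld,
update slurm.conf everywhere, restart every slurmd, start slurmctld), which
is how I remember the Slurm FAQ.  Please check it against the slides before
using it.  By default the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: prints each command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
# Hypothetical compute node names; replace with your own node list.
NODES="${NODES:-node001 node002}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

plan() {
    # 1. Stop the controller before touching the node definitions.
    run systemctl stop slurmctld
    # 2. Edit slurm.conf (NodeName and PartitionName lines) and copy it
    #    to all nodes.  Site-specific, so not shown here.
    # 3. Restart slurmd on every node so all daemons read the same file.
    for n in $NODES; do
        run ssh "$n" systemctl restart slurmd
    done
    # 4. Start the controller again with the updated configuration.
    run systemctl start slurmctld
}

plan
```

Run it once as is to review the printed commands, then with DRY_RUN=0 on
the slurmctld host if they look right for your site.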

/Ole


