[slurm-users] Releasing stale allocated TRES

Schneider, Gerald gerald.schneider at igd-r.fraunhofer.de
Thu Nov 23 09:56:39 UTC 2023


Hi there,

I have a recurring problem with allocated TRES that are not released after all jobs on the node have finished. The TRES remain marked as allocated, so no new jobs that request those TRES can be scheduled on that node.

$ scontrol show node node2
NodeName=node2 Arch=x86_64 CoresPerSocket=64
   CPUAlloc=0 CPUTot=256 CPULoad=0.11
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=gpu:tesla:8
   NodeAddr=node2 NodeHostName=node2 Version=21.08.5
   OS=Linux 5.15.0-89-generic #99-Ubuntu SMP Mon Oct 30 20:42:41 UTC 2023
   RealMemory=1025593 AllocMem=0 FreeMem=1025934 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=AMPERE
   BootTime=2023-11-23T09:01:28 SlurmdStartTime=2023-11-23T09:02:09
   LastBusyTime=2023-11-23T09:03:19
   CfgTRES=cpu=256,mem=1025593M,billing=256,gres/gpu=8,gres/gpu:tesla=8
   AllocTRES=gres/gpu=8
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
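
For reference, the mismatch above (State=IDLE, CPUAlloc=0 and AllocMem=0, yet AllocTRES still lists gres/gpu=8) can be checked mechanically across nodes. Below is a minimal Python sketch. It assumes the whitespace-separated key=value layout that `scontrol show node` prints above, and it uses a hypothetical "idle but still holding GRES" heuristic of my own. The function names (`parse_node_record`, `parse_tres`, `find_stale_gres`, `check_node`) are illustrative only. The sketch only detects the condition; it does not release anything.

```python
from __future__ import annotations

import subprocess


def parse_node_record(text: str) -> dict[str, str]:
    """Collect the key=value tokens of one `scontrol show node` record.

    Tokens without '=' (such as the words of the OS string) are skipped.
    Only the first '=' splits key from value, so
    "AllocTRES=gres/gpu=8" yields {"AllocTRES": "gres/gpu=8"}.
    """
    fields: dict[str, str] = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def parse_tres(value: str) -> dict[str, str]:
    """Split a TRES string such as "cpu=256,gres/gpu=8" into a dict."""
    tres: dict[str, str] = {}
    for item in value.split(","):
        name, sep, count = item.partition("=")
        if sep:
            tres[name] = count
    return tres


def find_stale_gres(text: str) -> dict[str, str]:
    """Return gres/* entries still in AllocTRES on a node that looks idle.

    Hypothetical heuristic: the state starts with IDLE and no CPUs or
    memory are allocated, yet AllocTRES still lists nonzero gres/* entries.
    An empty dict means nothing suspicious was found.
    """
    fields = parse_node_record(text)
    looks_idle = (
        fields.get("State", "").startswith("IDLE")
        and fields.get("CPUAlloc") == "0"
        and fields.get("AllocMem") == "0"
    )
    if not looks_idle:
        return {}
    alloc = parse_tres(fields.get("AllocTRES", ""))
    return {k: v for k, v in alloc.items() if k.startswith("gres/") and v != "0"}


def check_node(node: str) -> dict[str, str]:
    """Run `scontrol show node <node>` and apply find_stale_gres to its output."""
    out = subprocess.run(
        ["scontrol", "show", "node", node],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_stale_gres(out)
```

For the node2 record above, `find_stale_gres` would report `{"gres/gpu": "8"}`. The check keys on CPUAlloc and AllocMem being zero because, in the output shown, those are the fields that disagree with AllocTRES.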

Previously, the allocation was released after the server had been powered off for a couple of hours (for power conservation). The issue has now occurred again, and this time it persists even after the server was off overnight.

Is there any way to release the allocation manually?

Regards,
Gerald Schneider

--
Gerald Schneider

Fraunhofer-Institut für Graphische Datenverarbeitung IGD 
Joachim-Jungius-Str. 11 | 18059 Rostock | Germany 
Tel. +49 6151 155-309 | +49 381 4024-193 | Fax +49 381 4024-199 
gerald.schneider at igd-r.fraunhofer.de | www.igd.fraunhofer.de



