I am running a GPU cluster where nodes are mostly powered off to save electricity. I have run into the problem that if I set 'MinTRES=gres/gpu=1' in the QoS for user-account associations, waking up nodes on demand stops working for these users. Jobs are allocated on all running nodes, but if a user submits a job that would require a node to be woken up, it remains pending with reason 'QOSMinGRES'.
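For reference, the limit was set roughly like this (the QoS name 'gpuqos' here is just a placeholder; the full option name in sacctmgr is MinTRESPerJob):

```shell
# Require at least one GPU per job for jobs running under this QoS.
# 'gpuqos' is an example name, not the actual QoS on our cluster.
sacctmgr modify qos gpuqos set MinTRESPerJob=gres/gpu=1
```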
In the verbose logs, such a job will show this:
debug3: TRES Weight: cpu = 1.000000 * 0.000000 = 0.000000
debug3: TRES Weight: mem = 6000.000000 * 0.000000 = 0.000000
debug3: TRES Weight: energy = 0.000000 * 0.000000 = 0.000000
debug3: TRES Weight: node = 1.000000 * 0.000000 = 0.000000
debug3: TRES Weight: fs/disk = 0.000000 * 0.000000 = 0.000000
debug3: TRES Weight: vmem = 0.000000 * 0.000000 = 0.000000
debug3: TRES Weight: pages = 0.000000 * 0.000000 = 0.000000
debug3: TRES Weight: gres/gpu = 0.000000 * 1.000000 = 0.000000
debug3: TRES Weight: gres/gpu:nvidia_geforce_gtx_1080ti = 0.000000 * 0.000000 = 0.000000
debug3: TRES Weight: gres/gpumem = 0.000000 * 0.000000 = 0.000000
debug3: TRES Weight: gres/gpuutil = 0.000000 * 0.000000 = 0.000000
debug3: TRES Weighted: SUM(TRES) = 0.000000
debug2: JobId=4256 is being held, QOS local818-test min tres(gres/gpu) per job request 0 exceeds min tres limit 1
The job was submitted with '-G 1' or '--gres=gpu:1', but notably the TRES value for 'gres/gpu' becomes 0. For jobs that fit on the running nodes, the TRES weight remains correct:
debug3: TRES Weight: gres/gpu = 1.000000 * 1.000000 = 1.000000
The quick fix was to unset MinTRES for all QoSes, but I would still like to force users in some accounts to request GPUs, because that is what is billed for them.
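The workaround looked roughly like this (again, 'gpuqos' is only an example name; as I understand it, setting a TRES value to -1 clears that limit):

```shell
# Clear the per-job minimum GPU requirement from the QoS.
# A TRES value of -1 removes the previously set limit.
sacctmgr modify qos gpuqos set MinTRESPerJob=gres/gpu=-1
```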
Is this by design? I was unable to find any hint that this is expected behavior.
Thanks,
Stefan