I am running a GPU cluster where nodes are mostly powered off to save electricity. I have run into the problem that if I set 'MinTRES=gres/gpu=1' in the QoS for user-account associations, waking up nodes on demand stops working for these users. Jobs are allocated on all running nodes, but if a user submits a job that would require a node to be woken up, it remains pending with reason 'QOSMinGRES'.
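For reference, the limit was set roughly like this (the QoS name 'gpuqos' here is just a placeholder; the full option name in sacctmgr is MinTRESPerJob):

```shell
# Require at least one GPU per job for jobs running under this QoS.
# 'gpuqos' is an example name, not the actual QoS on our cluster.
sacctmgr modify qos gpuqos set MinTRESPerJob=gres/gpu=1
```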
In the verbose logs, such a job will show this:
debug3: TRES Weight: cpu = 1.000000 * 0.000000 = 0.000000
debug3: TRES Weight: mem = 6000.000000 * 0.000000 = 0.000000
debug3: TRES Weight: energy = 0.000000 * 0.000000 = 0.000000
debug3: TRES Weight: node = 1.000000 * 0.000000 = 0.000000
debug3: TRES Weight: fs/disk = 0.000000 * 0.000000 = 0.000000
debug3: TRES Weight: vmem = 0.000000 * 0.000000 = 0.000000
debug3: TRES Weight: pages = 0.000000 * 0.000000 = 0.000000
debug3: TRES Weight: gres/gpu = 0.000000 * 1.000000 = 0.000000
debug3: TRES Weight: gres/gpu:nvidia_geforce_gtx_1080ti = 0.000000 * 0.000000 = 0.000000
debug3: TRES Weight: gres/gpumem = 0.000000 * 0.000000 = 0.000000
debug3: TRES Weight: gres/gpuutil = 0.000000 * 0.000000 = 0.000000
debug3: TRES Weighted: SUM(TRES) = 0.000000
debug2: JobId=4256 is being held, QOS local818-test min tres(gres/gpu) per job request 0 exceeds min tres limit 1
The job was submitted with '-G 1' or '--gres=gpu:1', but notably the TRES value for 'gres/gpu' becomes 0. For jobs that fit on the running nodes, the TRES weight remains correct:
debug3: TRES Weight: gres/gpu = 1.000000 * 1.000000 = 1.000000
The quick fix was to unset MinTRES for all QoSes, but I would still like to force users in some accounts to request GPUs, because that is what is billed for them.
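The workaround looked roughly like this (again, 'gpuqos' is only an example name; as I understand it, setting a TRES value to -1 clears that limit):

```shell
# Clear the per-job minimum GPU requirement from the QoS.
# A TRES value of -1 removes the previously set limit.
sacctmgr modify qos gpuqos set MinTRESPerJob=gres/gpu=-1
```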
Is this by design? I was unable to find any hint that this is expected behavior.
Thanks,
Stefan