Hi,
because of my real scenario (in my first post I explained my testing scenario), with several different users of different types (researchers, university students and/or teachers, etc.), I have distributed my GPUs across 3 different partitions:
* PartitionName=cuda-staff.q Nodes=gpu-[1-4] OverSubscribe=No MaxTime=INFINITE State=UP AllocNodes=node[0-22],node-login,node-login-bak AllowGroups=caos,profesor
* PartitionName=cuda-int.q Nodes=gpu-[2,4] OverSubscribe=No MaxTime=30:00 State=UP AllocNodes=node[0-22]
* PartitionName=cuda-ext.q Nodes=gpu-[1,3] OverSubscribe=No MaxTime=30:00 State=UP AllocNodes=node[0-22],node-login,node-login-bak
Explanation:
* In “cuda-staff.q”, only teachers can submit, and they can submit from any lab node or any login node.
* In “cuda-int.q” everybody can submit, but only from lab nodes.
* In “cuda-ext.q” everybody can also submit, but in this case from both lab nodes and login nodes.
Until now, I have not used “QoS”... but I’m going to install a new data/user server and I want to reconfigure SLURM. If I distributed the GPUs the way Gerhard Strangar explains (the two similar RTX 3080s in a partition with a restricted QoS and all other GPUs in another partition), some GPUs that are now restricted to “inside lab” users would become accessible to “outside lab” users. So I think (all the teachers want it this way ☹) I must keep this partition distribution. However, if I apply a QoS limiting users to only one GPU in each partition, it would still be possible for a user to use one RTX 3080 from outside the lab and the other RTX 3080 from inside the lab… and this is what I want to deny.
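Just to make it concrete, this is the kind of setup I mean by “a QoS limiting one GPU in each partition” (the QoS names gpu1-int and gpu1-ext are only placeholders, not real config):

    # create one QoS per partition, each limiting a user to 1 GPU
    sacctmgr add qos gpu1-int set MaxTRESPerUser=gres/gpu=1
    sacctmgr add qos gpu1-ext set MaxTRESPerUser=gres/gpu=1

    # attach them as partition QoS in slurm.conf
    PartitionName=cuda-int.q Nodes=gpu-[2,4] OverSubscribe=No MaxTime=30:00 State=UP AllocNodes=node[0-22] QOS=gpu1-int
    PartitionName=cuda-ext.q Nodes=gpu-[1,3] OverSubscribe=No MaxTime=30:00 State=UP AllocNodes=node[0-22],node-login,node-login-bak QOS=gpu1-ext

As far as I understand, MaxTRESPerUser is counted per QoS, so with two separate QoS names the limits are independent and a user could still hold one GPU in each partition at the same time, which is exactly the case I want to deny. (If I am reading the docs right, pointing both partitions at the same QoS would make the limit count across both, but then I lose the separate per-partition behaviour.)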
I will reread the documentation.
All help will be appreciated, of course!!!!
Thanks.