Hello,
First of all, sorry if this is an easy question, but this is the first time I have needed to do this in my environment. I have one server that has 2 identical GPUs, and two different partitions: one is accessible
from some allocation nodes and the other from other allocation nodes. My question is whether it is possible to share one GPU with one partition and the other GPU with the other partition. For example, something like this:
NodeName=gpu-node AutoDetect=off Name=gpu Type=RTX5070 File=/dev/nvidia0 Cores=0-127
NodeName=gpu-node AutoDetect=off Name=gpu Type=RTX5070 File=/dev/nvidia1 Cores=128-255
NodeName=gpu-node CPUs=256 SocketsPerBoard=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=260000 TmpDisk=47000 Gres=gpu:RTX5070:2
PartitionName=gpu-int.q Nodes=gpu-node OverSubscribe=No State=UP AllocNodes=submit-node-1 PriorityTier=3 GraceTime=900
PartitionName=gpu-ext.q Nodes=gpu-node OverSubscribe=No State=UP AllocNodes=submit-node-1 PriorityTier=3 GraceTime=900
With this configuration, if two different users simultaneously try to run a job on one of the RTX5070s (one in gpu-int.q and the other in gpu-ext.q), can both of them run without problems? If the user from gpu-int.q
gets device #0 and the user from gpu-ext.q gets device #1, would SLURM still work without problems if a second simultaneous execution assigned device #1 to the user from gpu-int.q and device #0 to the user from gpu-ext.q?
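To make the scenario concrete, this is roughly how I would expect the two simultaneous jobs to be submitted. It is only a sketch: gpu_job_cmd is a hypothetical helper I made up for this post, and nvidia-smi -L is just a placeholder job that lists the GPU(s) visible to it. The helper only prints the srun commands, so nothing is submitted.

```shell
#!/bin/sh
# Hypothetical helper: prints the srun command for one partition.
# Each job asks for exactly one GPU of type RTX5070 via --gres.
# nvidia-smi -L is only a placeholder job that lists the visible GPU(s).
gpu_job_cmd() {
    echo "srun --partition=$1 --gres=gpu:RTX5070:1 nvidia-smi -L"
}

# The two jobs that would run at the same time, one per partition.
# This only prints the commands; run the printed lines to really submit.
gpu_job_cmd gpu-int.q
gpu_job_cmd gpu-ext.q
```

Each job requests a single GPU and does not name a device, so which of /dev/nvidia0 or /dev/nvidia1 each job receives would be up to SLURM.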
Thanks.