<div dir="ltr"><div>Compared with the Slurm <a href="https://slurm.schedmd.com/gres.html#MPS_config_example_2" target="_blank">MPS configuration example</a>, our gres.conf has this:</div><div><font face="monospace">NodeName=node[001-003] Name=mps Count=400</font><br></div><div><br></div><div>What does "Count" really mean, and how is this number used?</div><div><br></div><div>From that <a href="https://slurm.schedmd.com/gres.html#MPS_Management">web page</a>:</div><div>"<font face="monospace">MPS configuration includes only the Name and Count parameters: The count of gres/mps elements will be evenly distributed across all GPUs configured on the node. This is similar to case 1, but places duplicate configuration in the gres.conf file.</font>"<br></div><div><br></div><div>Also on that page there is this:</div><div><font face="monospace"># Example 1 of gres.conf<br># Configure support for four GPUs (with MPS)<br>AutoDetect=nvml<br>Name=gpu Type=gp100 File=/dev/nvidia0 Cores=0,1<br>Name=gpu Type=gp100 File=/dev/nvidia1 Cores=0,1<br>Name=gpu Type=p6000 File=/dev/nvidia2 Cores=2,3<br>Name=gpu Type=p6000 File=/dev/nvidia3 Cores=2,3<br># Set gres/mps Count value to 100 on each of the 4 available GPUs<br>Name=mps Count=400</font><br><br></div><div>And then this (side note: "<b>different</b>" is misspelled in the example):</div><div><br></div><div><font face="monospace"># Example 2 of gres.conf<br># Configure support for four <b>differernt </b>GPU types (with MPS)<br>AutoDetect=nvml<br>Name=gpu Type=gtx1080 File=/dev/nvidia0 Cores=0,1<br>Name=gpu Type=gtx1070 File=/dev/nvidia1 Cores=0,1<br>Name=gpu Type=gtx1060 File=/dev/nvidia2 Cores=2,3<br>Name=gpu Type=gtx1050 File=/dev/nvidia3 Cores=2,3<br>Name=mps Count=1300   File=/dev/nvidia0<br>Name=mps Count=1200   File=/dev/nvidia1<br>Name=mps Count=1100   File=/dev/nvidia2<br>Name=mps Count=1000   File=/dev/nvidia3</font><br></div><div><br></div><div>Lower on the page, I am not sure what "to a 
job of step" means:</div><div><font face="monospace">The percentage will be calculated based upon the portion of the configured Count on the Gres is allocated to a job of step. For example, a job requesting "--gres=gpu:200" and using configuration example 2 above would be allocated<br>15% of the gtx1080 (File=/dev/nvidia0, 200 x 100 / 1300 = 15), or<br>16% of the gtx1070 (File=/dev/nvidia0, 200 x 100 / 1200 = 16), or<br>18% of the gtx1060 (File=/dev/nvidia0, 200 x 100 / 1100 = 18), or<br>20% of the gtx1050 (File=/dev/nvidia0, 200 x 100 / 1000 = 20).</font><br></div><div><br></div><div>How were the Count values of 1300, 1200, 1100 and 1000 determined?</div><div><br></div><div>Now segueing to TensorFlow 2 and PyTorch memory greediness.</div><div><br></div><div>Using the same "<a href="https://github.com/aymericdamien/TensorFlow-Examples/blob/master/tensorflow_v2/notebooks/3_NeuralNetworks/dcgan.ipynb" target="_blank">Deep Convolutional Generative Adversarial Networks</a>" sample script, I added this to my sbatch file:</div><div><font face="monospace">#SBATCH --gres=mps:35</font><br></div><div><font face="monospace">echo here is value of TF_FORCE_GPU_ALLOW_GROWTH $TF_FORCE_GPU_ALLOW_GROWTH<br>echo here is the CUDA-MPS-ActiveThread-Percentage $CUDA_MPS_ACTIVE_THREAD_PERCENTAGE</font><br></div><div><br></div><div>The job log file showed this:</div><div><font face="monospace">here is value of TF_FORCE_GPU_ALLOW_GROWTH true<br>here is the CUDA-MPS-ActiveThread-Percentage 17</font><br></div><div><br></div><div>So that 17 is about half of the 35 I requested with the MPS option. The description from the SchedMD page reads:</div><div>"The percentage will be calculated based upon the portion of the configured Count on the Gres is allocated to a job of step."</div><div><br></div><div>So how does Count=400 from the gres.conf file factor in? Does it mean the job is using 17% of the available threads of the GPU? 
From nvidia-smi on this Slurm job:</div><div><font face="monospace">+-----------------------------------------------------------------------------+<br>| Processes:                                                       GPU Memory |<br>|  GPU       PID   Type   Process name                             Usage      |<br>|=============================================================================|<br>|    0     59793      C   python3.6                                   1135MiB |</font><br></div><div><br></div><div>The GPU has 32 GB:</div><div><font face="monospace">|   0  Tesla V100-PCIE...  On   | 00000000:3B:00.0 Off |                    0 |<br>| N/A   49C    P0   128W / 250W |   3417MiB / 32510MiB |     96%      Default |</font><br></div><div><br></div><p class="MsoNormal">So MPS and the Count option do not help with GPU memory, and I'm trying to find ways to tell our users how to avoid OOMs. 
The most common advice is to <a href="https://stackoverflow.com/questions/37736071/tensorflow-out-of-memory">use smaller batches</a>, but the complaint we get is that doing so really slows down their jobs.</p><p class="MsoNormal"><span style="color:rgb(31,56,100)"><br></span></p><p class="MsoNormal"><span style="color:rgb(31,56,100)">So I just found the section </span><a href="https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth">2 Physical GPUs, 2 Logical GPUs</a> in the TensorFlow 2 docs. It works by setting a hard limit, in this case 2048 MB, by adding the code below after <span style="font-family:monospace">import tensorflow as tf</span>:</p><p class="MsoNormal"><font face="monospace"><br>gpus = tf.config.experimental.list_physical_devices('GPU')<br>if gpus:<br>  # Restrict TensorFlow to only allocate 2GB (memory_limit=2048 MB) of memory on the first GPU<br>  try:<br>    tf.config.experimental.set_virtual_device_configuration(<br>        gpus[0],<br>        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])<br>    logical_gpus = tf.config.experimental.list_logical_devices('GPU')<br>    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")<br>  except RuntimeError as e:<br>    # Virtual devices must be set before GPUs have been initialized<br>    print(e)</font></p><p class="MsoNormal"><br></p><p class="MsoNormal">I know this is outside the scope of Slurm, but I was hoping someone had a more graceful way to achieve this than a hard memory limit. The TF docs describe the first option this way: "The first option is to turn on memory growth by calling <font face="monospace">tf.config.experimental.set_memory_growth</font>, which attempts to allocate only as much GPU memory as needed for the runtime allocations: it starts out allocating very little memory, and as the program gets run and more GPU memory is needed, we extend the GPU memory region allocated to the TensorFlow process. 
Note we do not release memory, since it can lead to memory fragmentation." I've found that with the <a href="https://github.com/aymericdamien/TensorFlow-Examples/blob/master/tensorflow_v2/notebooks/3_NeuralNetworks/recurrent_network.ipynb">Recurrent Neural Network Example</a>, it jumps up to 30 GB:</p><p class="MsoNormal"><font face="monospace"> I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30486 MB memory) -&gt; physical GPU (device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:3b:00.0, compute capability: 7.0)<br></font><br></p><p class="MsoNormal">But at least we have a way to deal with this for our users, as we have many TF and PyTorch CNN jobs.</p></div>
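<div><br></div><div>P.S. To make the Count arithmetic from the SchedMD description concrete, here is a rough sketch of how I read the formula (requested x 100 / Count). It is not Slurm's actual code: the function names are made up for illustration, the rounding down is inferred from the four quoted examples, and the two-GPUs-per-node split of Count=400 is purely an assumption that happens to reproduce the 17 seen in the job log.</div>

```python
def mps_thread_percentage(requested: int, count_per_gpu: int) -> int:
    """Percentage implied by the quoted formula: requested x 100 / Count."""
    if count_per_gpu <= 0:
        raise ValueError("count_per_gpu must be positive")
    # Integer division reproduces the rounded-down values in the quoted examples.
    return requested * 100 // count_per_gpu


def per_gpu_count(node_count: int, gpus_on_node: int) -> int:
    """Even split of a node-wide Count across its GPUs, as the docs describe."""
    return node_count // gpus_on_node


# Count values quoted from configuration example 2, with a request of 200.
for count in (1300, 1200, 1100, 1000):
    print(count, mps_thread_percentage(200, count))

# ASSUMPTION: two GPUs per node, so Count=400 becomes 200 per GPU.
print(mps_thread_percentage(35, per_gpu_count(400, 2)))
```

<div>If the node actually had four GPUs, the same formula would give 35 rather than 17, so the number in the job log depends on how the node-wide Count is split.</div>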