<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Dear all,</p>
We have upgraded our cluster from Slurm 13 to Slurm 17.11 and have run
into a problem with the GPU configuration: although I request no GPUs,
the system still lets me use the GPU cards.<br>
<br>
Let me explain.<br>
<b>Slurm.conf:<br>
</b>SelectType=select/cons_res<br>
SelectTypeParameters=CR_CPU_Memory <br>
TaskPlugin=task/cgroup<br>
PreemptType=preempt/none<br>
<br>
NodeName=cudanode[1-20] Procs=40 Sockets=2 CoresPerSocket=20
ThreadsPerCore=2 RealMemory=384000 Gres=gpu:2<br>
PartitionName=cuda Nodes=cudanode[1-20] Default=no
MaxTime=15-00:00:00 defaulttime=00:02:00 State=UP DefMemPerCPU=8500
MaxMemPerNode=380000 Shared=NO Priority=1000<br>
<br>
<b>Gres.conf:</b><br>
Name=gpu File=/dev/nvidia0
CPUs=0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78<br>
Name=gpu File=/dev/nvidia1
CPUs=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79<br>
<br>
<br>
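Since TaskPlugin=task/cgroup is set, cgroup.conf is also in play on these
nodes; I am not pasting our actual file, but as a minimal sketch such a
file looks like the lines below, with ConstrainDevices being the option
that controls whether a job step is confined to the devices it was
allocated:<br>
<b>cgroup.conf (sketch, not our actual file):</b><br>
CgroupAutomount=yes<br>
ConstrainCores=yes<br>
ConstrainRAMSpace=yes<br>
# controls whether job steps are restricted to their allocated GRES devices<br>
ConstrainDevices=yes<br>
<br>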
I am testing the configuration with the deviceQuery app that comes with
the CUDA 9 package.<br>
<br>
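The cuda.sh script is only a small wrapper around deviceQuery; as a
sketch (the exact sample path is an assumption), it looks like this:<br>
<b>cuda.sh (sketch):</b><br>
#!/bin/bash<br>
# print the GPU devices Slurm exposed to this job step<br>
echo "CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES"<br>
# run the deviceQuery binary built from the CUDA samples (path is an assumption)<br>
/usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery<br>
<br>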
When I submit a job with 2 GPUs, the system reserves the right number of
GPUs:<br>
<b>srun -n 1 -p cuda --nodelist=cudanode1 --gres=gpu:2 ./cuda.sh</b><br>
CUDA_VISIBLE_DEVICES: 0,1<br>
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA
Runtime Version = 7.5, NumDevs = 2, Device0 = Tesla P100-PCIE-16GB,
Device1 = Tesla P100-PCIE-16GB Result = PASS
<br>
When I submit a job with 1 GPU, the system again reserves the right
number of GPUs:<br>
<br>
<b>srun -n 1 -p cuda --nodelist=cudanode1 --gres=gpu:1 ./cuda.sh</b><br>
CUDA_VISIBLE_DEVICES: 0<br>
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA
Runtime Version = 7.5, NumDevs = 1, Device0 = Tesla P100-PCIE-16GB<br>
Result = PASS<br>
<br>
<font color="#ff0000"><b>But when I send a job without any GPUS,
system also let me use GPUS, that I dont expect.<br>
srun -n 1 -p cuda --nodelist=cudanode1 ./cuda.sh <br>
CUDA_VISIBLE_DEVICES: <br>
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1,
CUDA Runtime Version = 7.5, NumDevs = 2, Device0 = Tesla
P100-PCIE-16GB, Device1 = Tesla P100-PCIE-16GB<br>
Result = PASS</b></font><br>
<br>
This way, I am able to run 40 jobs on one server at the same time, all
of them using the GPUs. Is this a bug, or did I miss something? With
previous versions of Slurm, GPU allocation behaved as I expected. I also
tried CUDA-enabled NAMD, which uses higher-level hardware access
methods, and I got the same result.<br>
<br>
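As an illustration, a loop like the one below (just a sketch of how such
steps can be launched in parallel) starts 40 of these jobs at once, and
every one of them sees both cards:<br>
<b>Launching 40 concurrent jobs without --gres (sketch):</b><br>
for i in $(seq 1 40); do<br>
  # no --gres=gpu:... requested, yet each step still sees both GPUs<br>
  srun -n 1 -p cuda --nodelist=cudanode1 ./cuda.sh &amp;<br>
done<br>
wait<br>
<br>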
Another problem I hit: when I change the GPU configuration from
Gres=gpu:2 to Gres=gpu:no_consume:2, so that the cards can be used
simultaneously by many jobs, the system lets me use all the cards
regardless of how many I request.<br>
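For reference, the changed node definition in slurm.conf looks like this
(everything else is left as above):<br>
<b>Changed node line (sketch):</b><br>
NodeName=cudanode[1-20] Procs=40 Sockets=2 CoresPerSocket=20
ThreadsPerCore=2 RealMemory=384000 Gres=gpu:no_consume:2<br>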
<br>
<br>
Regards,<br>
Sefa ARSLAN<br>
</body>
</html>