<div dir="ltr">Hi, I am using Slurm 17.11.3-2 on a small ROCKS 7 cluster. I
have two GPU nodes with NVIDIA driver 384.111 and the OpenCL library
installed. The /etc/OpenCL/vendors directory contains two files
(nvidia.icd and intel.icd); both are attached below. When I am
submitting a Slurm job script, I get the output in the attached
output_intel_and_nvidia.icd.log. However, when I move nvidia.icd to
another location and leave only intel.icd in the /etc/OpenCL/vendors
directory, everything works fine. Interestingly, my OpenCL code also
runs fine when I launch it directly on the GPU node, and in every case
it does not use any GPU memory at all. My Slurm script is also attached to this
message. Can someone explain how Slurm interacts with .icd files, and
why the code cannot complete (it stops at ~10 GB of allocated RAM
instead of ~16-20 GB) when nvidia.icd is present?</div>