[slurm-users] Reduced memory allocation limits using slurm

Edvinas Sulžickis edvinas31 at gmail.com
Tue Feb 12 09:25:55 UTC 2019


Hi, I am using Slurm version 17.11.3-2 on a small ROCKS 7 cluster. I have
two GPU nodes with NVIDIA driver 384.111 and the OpenCL library installed.
The /etc/OpenCL/vendors directory contains two files (nvidia.icd and
intel.icd); both are attached below. When I submit my Slurm script, I get
the output in output_intel_and_nvidia.icd.log. However, when I move
nvidia.icd to another location and leave only intel.icd in
/etc/OpenCL/vendors, everything works fine. Interestingly, my OpenCL code
also works fine when I run it directly on a GPU node, and in no case does
it actually use GPU memory. My Slurm script is attached as well. Can
someone explain how Slurm interacts with .icd files, and why the code
cannot complete (it stops at ~10 GB of allocated RAM instead of the
expected ~16-20 GB) when nvidia.icd is present?
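One way to narrow this down might be to compare what the OpenCL ICD loader sees, and which per-process memory limits apply, when the code runs directly on the node versus inside a Slurm job. The sketch below is a hypothetical diagnostic, not a known fix. It assumes the usual ICD-loader convention that each `*.icd` file in /etc/OpenCL/vendors holds the name or path of one vendor library. Treating `RLIMIT_AS` (virtual address space) and `RLIMIT_DATA` as the limits worth checking is also an assumption about where the ~10 GB ceiling could come from. The script name and helper functions are illustrative only.

```python
"""Hypothetical diagnostic: run once directly on a GPU node and once inside
a Slurm job, then compare the two outputs."""
import glob
import os
import resource

# Directory the OpenCL ICD loader scans for vendor files (from the post above).
VENDORS_DIR = "/etc/OpenCL/vendors"


def list_icd_files(vendors_dir=VENDORS_DIR):
    """Return {icd filename: text inside it}, i.e. the vendor library it names."""
    entries = {}
    for path in sorted(glob.glob(os.path.join(vendors_dir, "*.icd"))):
        with open(path) as fh:
            entries[os.path.basename(path)] = fh.read().strip()
    return entries


def fmt_limit(value):
    """Render an rlimit value in GiB, or 'unlimited' for RLIM_INFINITY."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 2**30:.1f} GiB"


def memory_limits():
    """Soft/hard limits on address space (RLIMIT_AS) and data segment (RLIMIT_DATA)."""
    limits = {}
    for name in ("RLIMIT_AS", "RLIMIT_DATA"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        limits[name] = (fmt_limit(soft), fmt_limit(hard))
    return limits


if __name__ == "__main__":
    for icd, lib in list_icd_files().items():
        print(f"{icd}: {lib}")
    for name, (soft, hard) in memory_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
    # SLURM_JOB_ID is set by Slurm inside a job; absent when run directly.
    print("SLURM_JOB_ID:", os.environ.get("SLURM_JOB_ID", "(not in a Slurm job)"))
```

Saving this as, say, check_env.py (a hypothetical name) and running it both directly on the node and via `srun python3 check_env.py` with the same resource request as the failing job would show whether the vendor list or the limits differ between the two environments.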
Attachments (scrubbed by the list archive):
- nvidia.icd (application/octet-stream, 22 bytes): <http://lists.schedmd.com/pipermail/slurm-users/attachments/20190212/fd72019f/attachment-0002.obj>
- output_only_intel.icd.log (text/x-log, 9827 bytes): <http://lists.schedmd.com/pipermail/slurm-users/attachments/20190212/fd72019f/attachment-0003.bin>
- output_intel_and_nvidia.icd.log (text/x-log, 15533 bytes): <http://lists.schedmd.com/pipermail/slurm-users/attachments/20190212/fd72019f/attachment-0004.bin>
- intel.icd (application/octet-stream, 60 bytes): <http://lists.schedmd.com/pipermail/slurm-users/attachments/20190212/fd72019f/attachment-0003.obj>
- CPU-K5000_EDD.sh (application/x-shellscript, 400 bytes): <http://lists.schedmd.com/pipermail/slurm-users/attachments/20190212/fd72019f/attachment-0005.bin>

