<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body>
<p>Do you have a line like this in your
cgroup_allowed_devices_file.conf?<br>
/dev/nvidia*<br>
</p>
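<p>One way to check is a minimal sketch like the following, assuming the
file holds one glob pattern per line (as in the /dev/nvidia* line above).
The function name allows_device and the sample contents are illustrative
only, not taken from any real cluster.</p>
<pre>
```python
from fnmatch import fnmatchcase


def allows_device(conf_text, device):
    """Return True if any non-blank line in conf_text glob-matches device."""
    for raw in conf_text.splitlines():
        pattern = raw.strip()
        if not pattern:
            continue  # ignore blank lines
        if fnmatchcase(device, pattern):
            return True
    return False


# Hypothetical file contents, for illustration only.
sample = "/dev/null\n/dev/sda*\n/dev/nvidia*\n"
print(allows_device(sample, "/dev/nvidia0"))
```
</pre>
<p>To use it on a node, read the real file into a string and pass it in
place of sample.</p>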
<p>Relu<br>
</p>
<div class="moz-cite-prefix">On 2020-10-08 16:32, Sajesh Singh
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:BL0PR14MB36522DEC4C3F8E4E77243F70AC0B0@BL0PR14MB3652.namprd14.prod.outlook.com">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
<div class="WordSection1">
<p class="MsoNormal">The modules do appear to be loaded. When I
run lsmod I get the following:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">nvidia_drm 43714 0<o:p></o:p></p>
<p class="MsoNormal">nvidia_modeset 1109636 1 nvidia_drm<o:p></o:p></p>
<p class="MsoNormal">nvidia_uvm 935322 0<o:p></o:p></p>
<p class="MsoNormal">nvidia 20390295 2
nvidia_modeset,nvidia_uvm<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Also the nvidia-smi command returns the
following:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<pre>nvidia-smi
Thu Oct  8 16:31:57 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M5000        Off  | 00000000:02:00.0 Off |                  Off |
| 33%   21C    P0    45W / 150W |      0MiB /  8126MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro M5000        Off  | 00000000:82:00.0 Off |                  Off |
| 30%   17C    P0    45W / 150W |      0MiB /  8126MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
</pre>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">--<o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">-SS-<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> slurm-users
<a class="moz-txt-link-rfc2396E" href="mailto:slurm-users-bounces@lists.schedmd.com">&lt;slurm-users-bounces@lists.schedmd.com&gt;</a>
<b>On Behalf Of </b>Relu Patrascu<br>
<b>Sent:</b> Thursday, October 8, 2020 4:26 PM<br>
<b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:slurm-users@lists.schedmd.com">slurm-users@lists.schedmd.com</a><br>
<b>Subject:</b> Re: [slurm-users] CUDA environment
variable not being set<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p>That usually means you don't have the nvidia kernel module
loaded, probably because there's no driver installed.<o:p></o:p></p>
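<p>A quick way to confirm is to look for nvidia in the module list. This
is a minimal sketch, assuming lsmod-style text as input. The function name
loaded_modules and the sample text are only illustrative.</p>
<pre>
```python
def loaded_modules(lsmod_text):
    """Return the set of module names found in lsmod-style output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header row.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


# Sample text in lsmod's column layout (name, size, use count, users).
sample = """Module                  Size  Used by
nvidia_uvm            935322  0
nvidia              20390295  2 nvidia_modeset,nvidia_uvm
"""
print("nvidia" in loaded_modules(sample))
```
</pre>
<p>On a real node the input could come from running lsmod or from reading
/proc/modules, which lists the module name in the first column as well.</p>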
<p>Relu<o:p></o:p></p>
<div>
<p class="MsoNormal">On 2020-10-08 14:57, Sajesh Singh
wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">Slurm 18.08<o:p></o:p></p>
<p class="MsoNormal">CentOS 7.7.1908<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">I have 2 M5000 GPUs in a compute node
that is defined in the slurm.conf and gres.conf of the
cluster, but if I launch a job requesting GPUs, the
environment variable CUDA_VISIBLE_DEVICES is never set and
I see the following message in the slurmd.log file:<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">debug: common_gres_set_env: unable to
set env vars, no device files configured<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Has anyone encountered this before?<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Thank you,<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">SS<o:p></o:p></p>
</blockquote>
</div>
</div>
</blockquote>
</body>
</html>