<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:等线;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
@font-face
{font-family:"\@等线";
panose-1:2 1 6 0 3 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;
color:black;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman",serif;
color:black;}
pre
{mso-style-priority:99;
mso-style-link:"HTML 预设格式 字符";
margin:0cm;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";
color:black;}
tt
{mso-style-priority:99;
font-family:"Courier New";}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman",serif;
color:black;}
span.HTML
{mso-style-name:"HTML 预设格式 字符";
mso-style-priority:99;
mso-style-link:"HTML 预设格式";
font-family:Consolas;
color:black;}
span.EmailStyle22
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body bgcolor="white" lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><tt><span style="font-size:10.0pt">$ export CUDA_VISIBLE_DEVICES=0,1; srun -N 1 -n 1 --gres=none -p GPU /usr/bin/env |grep CUDA</span></tt><span style="font-size:10.0pt;font-family:"Courier New""><br>
<tt><b>CUDA_VISIBLE_DEVICES=0,1<o:p></o:p></b></tt></span></p>
<p class="MsoNormal"><tt><b><span style="font-size:10.0pt"><o:p> </o:p></span></b></tt></p>
<p class="MsoNormal"><tt><b><span style="font-size:10.0pt">The expected result is CUDA_VISIBLE_DEVICES=</span></b></tt>NoDevFiles, which is what Slurm 17.02 actually reports, so this must be a bug in 17.11.7.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:windowtext">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:windowtext"> slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt;
<b>On Behalf Of </b>Brian W. Johanson<br>
<b>Sent:</b> Thursday, August 30, 2018 11:23 PM<br>
<b>To:</b> slurm-users@lists.schedmd.com<br>
<b>Subject:</b> Re: [slurm-users] [External] Re: serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p><tt><span style="font-size:10.0pt">And to answer "CUDA_VISIBLE_DEVICES can't be set to NoDevFiles in Slurm 17.11.7":</span></tt><o:p></o:p></p>
<p><tt><span style="font-size:10.0pt">With --gres=none, Slurm leaves CUDA_VISIBLE_DEVICES unset; if it is already set in the user's environment, it keeps whatever value it had. If you really want to see NoDevFiles, set it in /etc/profile.d; it will be overwritten when GPUs are actually allocated.</span></tt><o:p></o:p></p>
<p><o:p> </o:p></p>
<p><tt><span style="font-size:10.0pt">$ export CUDA_VISIBLE_DEVICES=0,1; srun -N 1 -n 1 --gres=none -p GPU /usr/bin/env |grep CUDA</span></tt><span style="font-size:10.0pt;font-family:"Courier New""><br>
<tt><b>CUDA_VISIBLE_DEVICES=0,1</b></tt><br>
<tt>$ export CUDA_VISIBLE_DEVICES=0,1; srun -N 1 -n 1 --gres=none -p GPU nvidia-smi</tt><br>
<tt><b>No devices were found</b></tt></span><o:p></o:p></p>
<p><o:p> </o:p></p>
<p class="MsoNormal" style="margin-bottom:12.0pt"><tt><span style="font-size:10.0pt">$ export CUDA_VISIBLE_DEVICES=0,1; srun -N 1 -n 1 --gres=gpu:1 -p GPU /usr/bin/env |grep CUDA</span></tt><b><span style="font-size:10.0pt;font-family:"Courier New""><br>
<tt>CUDA_VISIBLE_DEVICES=0</tt></span></b><br>
<tt><span style="font-size:10.0pt">$ export CUDA_VISIBLE_DEVICES=0,1; srun -N 1 -n 1 --gres=gpu:1 -p GPU nvidia-smi |grep Tesla | wc</span></tt><br>
<tt><span style="font-size:10.0pt"> <b> 1 11 80</b></span></tt><span style="font-size:10.0pt;font-family:"Courier New""><br>
<tt>$ </tt></span><br>
<br>
<o:p></o:p></p>
<p><tt><span style="font-size:10.0pt">$ export CUDA_VISIBLE_DEVICES=0,1; srun -N 1 -n 1 --gres=gpu:2 -p GPU /usr/bin/env |grep CUDA</span></tt><span style="font-size:10.0pt;font-family:"Courier New""><br>
<tt><b>CUDA_VISIBLE_DEVICES=0,1</b></tt><br>
<tt>$ export CUDA_VISIBLE_DEVICES=0,1; srun -N 1 -n 1 --gres=gpu:2 -p GPU nvidia-smi |grep Tesla | wc</tt><br>
<tt><b> 2 22 160</b></tt><br>
<tt>$ </tt></span><o:p></o:p></p>
<p><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">On 08/30/2018 10:48 AM, Renfro, Michael wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>Chris’ method will set CUDA_VISIBLE_DEVICES like you’re used to, and it will help keep you or your users from picking conflicting devices.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>My cgroup/GPU settings from slurm.conf:<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>=====<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>[renfro@login ~]$ egrep -i '(cgroup|gpu)' /etc/slurm/slurm.conf | grep -v '^#'<o:p></o:p></pre>
<pre>ProctrackType=proctrack/cgroup<o:p></o:p></pre>
<pre>TaskPlugin=task/affinity,task/cgroup<o:p></o:p></pre>
<pre>NodeName=gpunode[001-004] CoresPerSocket=14 RealMemory=126000 Sockets=2 ThreadsPerCore=1 Gres=gpu:2<o:p></o:p></pre>
<pre>PartitionName=gpu Default=NO MinNodes=1 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=4000 AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpunode[001-004]<o:p></o:p></pre>
<pre>PartitionName=gpu-debug Default=NO MinNodes=1 MaxTime=00:30:00 AllowGroups=ALL PriorityJobFactor=2 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=4000 AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpunode[001-004]<o:p></o:p></pre>
<pre>PartitionName=gpu-interactive Default=NO MinNodes=1 MaxNodes=2 MaxTime=02:00:00 AllowGroups=ALL PriorityJobFactor=3 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=4000 AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpunode[001-004]<o:p></o:p></pre>
<pre>GresTypes=gpu,mic<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>=====<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Example (where hpcshell is a function that runs “srun --pty $SHELL -I”), with CUDA_VISIBLE_DEVICES unset on the submit host but set correctly once GPUs are reserved:<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>=====<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>[renfro@login ~]$ echo $CUDA_VISIBLE_DEVICES<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>[renfro@login ~]$ hpcshell --partition=gpu-interactive --gres=gpu:1<o:p></o:p></pre>
<pre>[renfro@gpunode003 ~]$ echo $CUDA_VISIBLE_DEVICES<o:p></o:p></pre>
<pre>0<o:p></o:p></pre>
<pre>[renfro@login ~]$ hpcshell --partition=gpu-interactive --gres=gpu:2<o:p></o:p></pre>
<pre>[renfro@gpunode004 ~]$ echo $CUDA_VISIBLE_DEVICES<o:p></o:p></pre>
<pre>0,1<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>=====<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>On Aug 30, 2018, at 4:18 AM, Chaofeng Zhang <a href="mailto:zhangcf1@lenovo.com">&lt;zhangcf1@lenovo.com&gt;</a> wrote:<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>CUDA_VISIBLE_DEVICES is used by many AI frameworks, such as TensorFlow, to determine which GPU to use, so this environment variable is critical to us.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>-----Original Message-----<o:p></o:p></pre>
<pre>From: slurm-users <a href="mailto:slurm-users-bounces@lists.schedmd.com">&lt;slurm-users-bounces@lists.schedmd.com&gt;</a> On Behalf Of Chris Samuel<o:p></o:p></pre>
<pre>Sent: Thursday, August 30, 2018 4:42 PM<o:p></o:p></pre>
<pre>To: <a href="mailto:slurm-users@lists.schedmd.com">slurm-users@lists.schedmd.com</a><o:p></o:p></pre>
<pre>Subject: [External] Re: [slurm-users] serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>On Thursday, 30 August 2018 6:38:08 PM AEST Chaofeng Zhang wrote:<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>CUDA_VISIBLE_DEVICES can't be set to NoDevFiles in Slurm 17.11.7. <o:p></o:p></pre>
<pre>This worked when we used Slurm 17.02.<o:p></o:p></pre>
</blockquote>
<pre><o:p> </o:p></pre>
<pre>You probably should be using cgroups instead to constrain access to GPUs. <o:p></o:p></pre>
<pre>Then it doesn't matter what you set CUDA_VISIBLE_DEVICES to, as processes will only be able to access what they requested.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Hope that helps!<o:p></o:p></pre>
<pre>Chris<o:p></o:p></pre>
<pre>--<o:p></o:p></pre>
<pre>Chris Samuel : <a href="http://www.csamuel.org/">http://www.csamuel.org/</a> : Melbourne, VIC<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
</blockquote>
<pre><o:p> </o:p></pre>
</blockquote>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>