<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0cm;
margin-right:0cm;
margin-bottom:0cm;
margin-left:36.0pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
span.EmailStyle20
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:455219181;
mso-list-type:hybrid;
mso-list-template-ids:989911356 134807567 134807577 134807579 134807567 134807577 134807579 134807567 134807577 134807579;}
@list l0:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1
{mso-list-id:483590435;
mso-list-template-ids:2050510642;}
@list l2
{mso-list-id:641159585;
mso-list-type:hybrid;
mso-list-template-ids:-1357485246 134807567 134807577 134807579 134807567 134807577 134807579 134807567 134807577 134807579;}
@list l2:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l2:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l2:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l2:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l2:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l2:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l2:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l2:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l2:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
ol
{margin-bottom:0cm;}
ul
{margin-bottom:0cm;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head><body lang="EN-GB" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">For the benefit of anyone else who comes across this, I’ve managed to resolve the issue.<o:p></o:p></p>
<ol style="margin-top:0cm" start="1" type="1">
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo4">Remove the affected node entries from slurm.conf on the slurmctld host<o:p></o:p></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo4">Restart slurmctld<o:p></o:p></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo4">Re-add the nodes to slurm.conf on the slurmctld host<o:p></o:p></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo4">Restart slurmctld again<o:p></o:p></li></ol>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Following this, the Gres= lines in `scontrol show node …` display the new type. I guess this means slurmctld was persisting some state about the previous gres type somewhere (I’m not sure where), and that removing the nodes from slurm.conf
and restarting caused it to be flushed.<o:p></o:p></p>
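<p class="MsoNormal">To check whether a node is in the mismatched state described in this thread, the GPU types on the Gres= and CfgTRES= lines of `scontrol show node` can be compared. The following is a minimal Python sketch, not part of Slurm: the helper names `gpu_types` and `types_agree` are hypothetical, and the regular expressions assume the line formats seen in the output quoted further down.<o:p></o:p></p>
<pre>
```python
import re


def gpu_types(node_text):
    """Extract GPU type names from the Gres= and CfgTRES= lines of scontrol show node output."""
    gres, cfg = set(), set()
    for line in node_text.splitlines():
        line = line.strip()
        if line.startswith("Gres="):
            # Typed entries look like gpu:TYPE:COUNT, optionally followed by (S:...).
            gres.update(re.findall(r"gpu:([^:,()]+):\d+", line))
        elif line.startswith("CfgTRES="):
            # Typed entries look like gres/gpu:TYPE=COUNT; the untyped gres/gpu=N is skipped.
            cfg.update(re.findall(r"gres/gpu:([^=,]+)=\d+", line))
    return gres, cfg


def types_agree(node_text):
    """True when the Gres= and CfgTRES= lines name the same set of GPU types."""
    gres, cfg = gpu_types(node_text)
    return gres == cfg


# Two lines copied from the scontrol show node gpu2 excerpt in the original message.
before = """
   Gres=gpu:tesla:0(S:0)
   CfgTRES=cpu=8,mem=331301M,billing=8,gres/gpu=2,gres/gpu:v100s-pcie-32gb=2
"""
print(types_agree(before))  # prints False
```
</pre>
<p class="MsoNormal">For the output quoted in the original message this reports a mismatch, since Gres= names tesla while CfgTRES= names v100s-pcie-32gb. After the steps above, the two sets would be expected to agree.<o:p></o:p></p>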
<div>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">--<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">Regards,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">Ben Roberts<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="mso-fareast-language:EN-GB">From:</span></b><span lang="EN-US" style="mso-fareast-language:EN-GB"> Ben Roberts
<br>
<b>Sent:</b> 19 June 2023 11:57<br>
<b>To:</b> slurm-users@lists.schedmd.com<br>
<b>Subject:</b> GPU Gres Type inconsistencies<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hi all,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I’m trying to set up GPU Gres Types to correctly identify the installed hardware (generation and memory size). I’m using a mix of explicit configuration (to set a friendly type name) and autodetection (to handle the cores and links detection).
I’m seeing two related issues which I don’t understand.<o:p></o:p></p>
<ol style="margin-top:0cm" start="1" type="1">
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l2 level1 lfo3">The output of `scontrol show node` references `Gres=gpu:tesla:2` instead of the type I’m specifying in the config file (`v100s-pcie-32gb`)<o:p></o:p></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l2 level1 lfo3">Attempts to schedule jobs using generic `--gpus 1` are working fine, but attempts to specify the gpu type (with `--gres gpu:v100s-pcie-32gb:1`)
fail with `error: Unable to allocate resources: Requested node configuration is not available`<o:p></o:p></li></ol>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">If I’ve understood the documentation (<a href="https://slurm.schedmd.com/gres.conf.html#OPT_Type">https://slurm.schedmd.com/gres.conf.html#OPT_Type</a>), I should be able to use any substring of what nvml detects the card as (`<span style="mso-fareast-language:EN-GB">tesla_v100s-pcie-32gb`</span>)
as the Type string. With gres debug flag set, I can see the GPUs are detected, and matched up with the static entries in gres.conf correctly. I don’t see any mention of Type=tesla in the logs, so I’m at a loss as to why scontrol show node is reporting `gpu:tesla`
instead of `gpu:v100s-pcie-32gb` as configured. I presume this mismatch is the cause of the failure to schedule, because while the job spec matches the configured gpu type and should be schedulable, the scheduler doesn’t actually see any resources of this
type available to run.<o:p></o:p></p>
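<p class="MsoNormal">As a quick sanity check of that reading of the documentation, the substring relationship can be tested directly. This is a minimal Python sketch: the helper name `type_matches_detected` is hypothetical, and the plain case-insensitive substring comparison is an assumption about the rule, not Slurm’s actual matching code.<o:p></o:p></p>
<pre>
```python
def type_matches_detected(configured_type, detected_name):
    """Assumed rule: the configured Type is a case-insensitive substring of the detected name."""
    return configured_type.lower() in detected_name.lower()


# Detected name as reported by gpu/nvml in slurmd.log.
detected = "tesla_v100s-pcie-32gb"

# Candidate Type strings are the ones tried in this message.
for candidate in ("v100s-pcie-32gb", "tesla", "tesla_v100s-pcie-32gb"):
    print(candidate, type_matches_detected(candidate, detected))
```
</pre>
<p class="MsoNormal">All three candidates pass this check, which is consistent with slurmd logging the GPUs as matched between system and configuration. Under this assumption, the substring rule on its own would not explain the `gpu:tesla` report.<o:p></o:p></p>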
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The “tesla” string is the first “word” of the autodetected type, but I can’t see why it would be truncated to just this rather than using the whole string. I did previously use the type “tesla” in the config, which worked fine since
everything matched up, but this does not adequately describe the hardware, so I need to change it to be more specific. Is there anywhere other than slurm.conf or gres.conf where the old gpu type might be persisted and need purging?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I’ve tried using `scontrol update node=gpu2 gres=gpu:v100s-pcie-32gb:0` to manually change the gres type (trying to set the number of GPUs to 2 here is rejected, but 0 is accepted). `scontrol reconfig` then causes the `scontrol show node`
output to update to `Gres=gpu:v100s-pcie-32gb:2` as expected, but removes the gpus from CfgTRES. After restarting slurmctld, the Gres and CfgTRES entries briefly match up for all nodes, but very shortly after, the Gres entries revert to `Gres=gpu:tesla:0` again,
so back to square one.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I’ve also tried using the full tesla_v100s-pcie-32gb string as the type, but this has no effect: the gres type is still reported as gpu:tesla only. This is all with Slurm 23.02.3, on Rocky Linux 8.8, using cuda-nvml-devel-12-0-12.0.140-1.x86_64.
Excerpts from configs and logs are shown below.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Can anyone point me in the right direction on how to solve this? Thanks,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"># /etc/slurm/gres.conf<o:p></o:p></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">Name=gpu Type=v100s-pcie-32gb File=/dev/nvidia0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">Name=gpu Type=v100s-pcie-32gb File=/dev/nvidia1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">AutoDetect=nvml<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"># /etc/slurm/slurm.conf (identical on all nodes)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">AccountingStorageTRES=gres/gpu,gres/gpu:v100s-pcie-32gb,gres/gpu:v100-pcie-32gb<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">EnforcePartLimits=ANY<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">GresTypes=gpu<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">NodeName=gpu2 CoresPerSocket=8 CPUs=8 Gres=gpu:v100s-pcie-32gb:2 Sockets=1 ThreadsPerCore=1<o:p></o:p></span></p>
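<p class="MsoNormal">To rule out a plain mismatch between the two files, the Type= values in gres.conf can be compared with the Gres= field of the node line in slurm.conf. The following is a minimal Python sketch: the helper names `gres_conf_types` and `node_gres_types` are hypothetical, and the parsing assumes simple whitespace-separated key=value lines with a literal node name (no hostlist ranges or line continuations).<o:p></o:p></p>
<pre>
```python
import re


def parse_fields(line):
    """Split a config line into a dict of its key=value fields; other tokens are ignored."""
    return dict(f.split("=", 1) for f in line.split() if "=" in f)


def gres_conf_types(gres_conf):
    """Collect Type= values from gres.conf lines that declare Name=gpu."""
    types = set()
    for line in gres_conf.splitlines():
        fields = parse_fields(line)
        if fields.get("Name") == "gpu" and "Type" in fields:
            types.add(fields["Type"])
    return types


def node_gres_types(slurm_conf, node):
    """Collect GPU types from the Gres= field of the NodeName line for one node."""
    types = set()
    for line in slurm_conf.splitlines():
        fields = parse_fields(line)
        if fields.get("NodeName") == node:
            # Typed entries look like gpu:TYPE:COUNT.
            types.update(re.findall(r"gpu:([^:,]+):\d+", fields.get("Gres", "")))
    return types


# Config lines copied from the excerpts in this message.
gres_conf = """
Name=gpu Type=v100s-pcie-32gb File=/dev/nvidia0
Name=gpu Type=v100s-pcie-32gb File=/dev/nvidia1
AutoDetect=nvml
"""
slurm_conf = """
GresTypes=gpu
NodeName=gpu2 CoresPerSocket=8 CPUs=8 Gres=gpu:v100s-pcie-32gb:2 Sockets=1 ThreadsPerCore=1
"""
print(gres_conf_types(gres_conf) == node_gres_types(slurm_conf, "gpu2"))  # prints True
```
</pre>
<p class="MsoNormal">For the excerpts shown here the two files agree on the type, so the discrepancy appears to come from somewhere other than these config lines.<o:p></o:p></p>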
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"># scontrol show node gpu2<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">NodeName=gpu2 Arch=x86_64 CoresPerSocket=8
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"> CPUAlloc=0 CPUEfctv=8 CPUTot=8 CPULoad=0.02<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"> AvailableFeatures=…<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"> Gres=gpu:tesla:0(S:0)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"> NodeAddr=gpu2.example.com NodeHostName=gpu2 Version=23.02.3<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"> OS=Linux 4.18.0-477.13.1.el8_8.x86_64 #1 SMP Tue May 30 22:15:39 UTC 2023
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"> RealMemory=331301 AllocMem=0 FreeMem=334102 Sockets=1 Boards=1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"> MemSpecLimit=500<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"> State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"> Partitions=gpu <o:p>
</o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"> BootTime=2023-06-14T23:03:05 SlurmdStartTime=2023-06-18T23:25:21<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"> LastBusyTime=2023-06-18T23:23:23 ResumeAfterTime=None<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"> CfgTRES=cpu=8,mem=331301M,billing=8,gres/gpu=2,gres/gpu:v100s-pcie-32gb=2<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"> AllocTRES=<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"># /var/log/slurm/slurmd.log (trimmed to only relevant lines for brevity)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:25.629] GRES: Global AutoDetect=nvml(1)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:25.629] debug: gres/gpu: init: loaded<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:25.629] debug: gpu/nvml: init: init: GPU NVML plugin loaded<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.265] debug2: gpu/nvml: _nvml_init: Successfully initialized NVML<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.265] debug: gpu/nvml: _get_system_gpu_list_nvml: Systems Graphics Driver Version: 525.105.17<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.265] debug: gpu/nvml: _get_system_gpu_list_nvml: NVML Library Version: 12.525.105.17<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.265] debug2: gpu/nvml: _get_system_gpu_list_nvml: NVML API Version: 11<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.265] debug2: gpu/nvml: _get_system_gpu_list_nvml: Total CPU count: 8<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.265] debug2: gpu/nvml: _get_system_gpu_list_nvml: Device count: 2<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: GPU index 0:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: Name: tesla_v100s-pcie-32gb<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: UUID: GPU-1ef493da-bf08-60a4-8afb-4db79646f86e<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: PCI Domain/Bus/Device: 0:11:0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: PCI Bus ID: 00000000:0B:00.0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: NVLinks: -1,0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: Device File (minor number): /dev/nvidia0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: CPU Affinity Range - Machine: 0-7<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: Core Affinity Range - Abstract: 0-7<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: MIG mode: disabled<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: Possible GPU Memory Frequencies (1):<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: -------------------------------<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: *1107 MHz [0]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: Possible GPU Graphics Frequencies (196):<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: ---------------------------------<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: *1597 MHz [0]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: *1590 MHz [1]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: ...<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: *870 MHz [97]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: ...<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: *142 MHz [194]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: *135 MHz [195]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: GPU index 1:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: Name: tesla_v100s-pcie-32gb<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: UUID: GPU-0e7d20b1-5a0f-8ef6-5120-970bd26210bb<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: PCI Domain/Bus/Device: 0:19:0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: PCI Bus ID: 00000000:13:00.0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: NVLinks: 0,-1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: Device File (minor number): /dev/nvidia1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: CPU Affinity Range - Machine: 0-7<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: Core Affinity Range - Abstract: 0-7<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: MIG mode: disabled<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: Possible GPU Memory Frequencies (1):<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: -------------------------------<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: *1107 MHz [0]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: Possible GPU Graphics Frequencies (196):<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: ---------------------------------<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: *1597 MHz [0]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: *1590 MHz [1]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: ...<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: *870 MHz [97]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: ...<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: *142 MHz [194]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: *135 MHz [195]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] gpu/nvml: _get_system_gpu_list_nvml: 2 GPU system device(s) detected<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] Gres GPU plugin: Merging configured GRES with system GPUs<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: gres/gpu: _merge_system_gres_conf: gres_list_conf:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):(null) Links:(null) Flags:HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT File:/dev/nvidia0
UniqueId:(null)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):(null) Links:(null) Flags:HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT File:/dev/nvidia1
UniqueId:(null)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug: gres/gpu: _merge_system_gres_conf: Including the following GPU matched between system and configuration:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug: GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):0-7 Links:-1,0 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia0 UniqueId:(null)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug: gres/gpu: _merge_system_gres_conf: Including the following GPU matched between system and configuration:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug: GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):0-7 Links:0,-1 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia1 UniqueId:(null)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: gres/gpu: _merge_system_gres_conf: gres_list_gpu<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):0-7 Links:-1,0 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia0 UniqueId:(null)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] debug2: GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):0-7 Links:0,-1 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia1 UniqueId:(null)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] Gres GPU plugin: Final merged GRES list:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):0-7 Links:-1,0 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia0 UniqueId:(null)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] GRES[gpu] Type:v100s-pcie-32gb Count:1 Cores(8):0-7 Links:0,-1 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia1 UniqueId:(null)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] GRES: _set_gres_device_desc : /dev/nvidia0 major 195, minor 0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] GRES: _set_gres_device_desc : /dev/nvidia1 major 195, minor 1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] GRES: gpu device number 0(/dev/nvidia0):c 195:0 rwm<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] GRES: gpu device number 1(/dev/nvidia1):c 195:1 rwm<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] Gres Name=gpu Type=v100s-pcie-32gb Count=1 Index=0 ID=7696487 File=/dev/nvidia0 Cores=0-7 CoreCnt=8 Links=-1,0 Flags=HAS_FILE,HAS_TYPE,ENV_NVML<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] Gres Name=gpu Type=v100s-pcie-32gb Count=1 Index=1 ID=7696487 File=/dev/nvidia1 Cores=0-7 CoreCnt=8 Links=0,-1 Flags=HAS_FILE,HAS_TYPE,ENV_NVML<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.303] CPU frequency setting not configured for this node<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.304] slurmd version 23.02.3 started<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.306] slurmd started on Mon, 19 Jun 2023 11:29:26 +0100<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.307] CPUs=8 Boards=1 Sockets=1 Cores=8 Threads=1 Memory=338063 TmpDisk=2048 Uptime=390381 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">[2023-06-19T11:29:26.310] debug: _handle_node_reg_resp: slurmctld sent back 14 TRES.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">--<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">Regards,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-GB">Ben Roberts<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body></html>