<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof ContentPasted0">
I've encountered that many times, and for me it was always related to AutoDetect and the nvidia-ml library. Does your slurmd log contain a line like "debug: skipping GRES for NodeName=t-gc-1202 AutoDetect=nvml"? I see that you didn't explicitly set AutoDetect
to nvml in gres.conf, but it may be worth setting AutoDetect=off there just to be sure.</div>
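<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
<br>
For what it's worth, here is a minimal gres.conf sketch of what I mean. The GPU count and the /dev/nvidia device paths are my assumptions, not something taken from your setup, so adjust them to match the node:
<pre>
# gres.conf (sketch only; device paths and GPU count are assumed)
# Disable automatic GPU detection so the NVML library is not consulted
AutoDetect=off
# Describe the GPUs by hand instead (hypothetical: 4 GPUs on this node)
NodeName=t-gc-1202 Name=gpu File=/dev/nvidia[0-3]
</pre>
As far as I know, the number of devices listed here should agree with the Gres= value for that node in slurm.conf, and slurmd needs a restart to pick up the change.</div>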
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof ContentPasted0">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof ContentPasted0">
If "sinfo" shows a node as "inval", then setting it to Resume (not Idle) won't work until you figure out why Slurm thinks the node configuration is invalid.</div>
</body>
</html>