<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Hello Slurm Admins,</div>
<div class="elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div class="elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
I have set up Slurm for a GPU cluster. The basic installation without</div>
<div class="elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
gres/gpu works well. Now I am trying to add the GPUs to the Slurm configuration.</div>
<div class="elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
All attempts have failed so far, and sinfo -R always reports the message</div>
<div class="elementToProof ContentPasted1 ContentPasted2 ContentPasted3 ContentPasted4" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div class="elementToProof">
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
gres/gpu count reported lower than configured ( 0 &lt; 2 )</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
With nvidia-smi the GPUs are found and running jobs on them works well.</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
I have tried to get rid of the above error by updating the state to IDLE with</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
scontrol. That attempt also failed with the error message</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
slurm_update error: Invalid node state specified</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
I ran slurmd on the GPU node at the debug5 level. From slurmd.log I see that</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
gres.conf is found and gres_gpu.so / gpu_generic.so are loaded. <br>
</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
My Slurm configuration is as follows:</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
slurm.conf:</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8 ContentPasted9" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
GresTypes=gpu</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8 ContentPasted9 ContentPasted10" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
NodeName=hpc-node14 CPUs=128 RealMemory=515815 Sockets=2 CoresPerSocket=64 ThreadsPerCore=1 Gres=gpu:2 State=UNKNOWN</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8 ContentPasted9 ContentPasted10" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8 ContentPasted9 ContentPasted10" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
gres.conf:<br>
</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8 ContentPasted9 ContentPasted10 ContentPasted11" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
NodeName=hpc-node[01-14] Name=gpu File=/dev/nvidia[0-1]</div>
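<div class="elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div class="elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
In case it helps, here is a minimal sketch of how the device files named in the gres.conf line above could be compared with the configured GPU count on the node. It is only an illustration: the helper names expand_device_range and count_present are my own and not part of Slurm, Python 3 is assumed on the node, and only a single trailing prefix[first-last] range is handled, not the full Slurm range syntax.</div>
<pre>
```python
import os
import re


def expand_device_range(spec):
    """Expand a spec such as '/dev/nvidia[0-1]' into a list of paths.

    Only a single trailing 'prefix[first-last]' range is handled; any other
    spec is returned unchanged as a one-element list.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\]", spec)
    if match is None:
        return [spec]
    prefix = match.group(1)
    first = int(match.group(2))
    last = int(match.group(3))
    return [f"{prefix}{index}" for index in range(first, last + 1)]


def count_present(paths):
    """Count how many of the given paths exist on this machine."""
    return sum(1 for path in paths if os.path.exists(path))


if __name__ == "__main__":
    # File= value from gres.conf and the count from Gres=gpu:2 in slurm.conf.
    devices = expand_device_range("/dev/nvidia[0-1]")
    configured = 2
    present = count_present(devices)
    print(f"{present} of {configured} configured device files exist: {devices}")
```
</pre>
<div class="elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Run directly on hpc-node14, this only shows whether the device files exist; it says nothing about how slurmd itself counts the GPUs.</div>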
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8 ContentPasted9 ContentPasted10 ContentPasted11" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8 ContentPasted9 ContentPasted10 ContentPasted11" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Does anyone know what is wrong and how to fix this problem?</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8 ContentPasted9 ContentPasted10 ContentPasted11" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Thank you.</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8 ContentPasted9 ContentPasted10 ContentPasted11" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8 ContentPasted9 ContentPasted10 ContentPasted11" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8 ContentPasted9 ContentPasted10 ContentPasted11" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Best wishes</div>
<div class="ContentPasted0 ContentPasted5 ContentPasted6 ContentPasted7 ContentPasted8 ContentPasted9 ContentPasted10 ContentPasted11" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Achim<br>
</div>
</div>
</body>
</html>