<div style="color:black;font: 10pt arial;"><span style="font-size: 13.3333px;">Hello,</span>
<div style="font-size: 13.3333px;"><br>
</div>
<div style="font-size: 13.3333px;">Slurm Power Saving (19.05) was configured successfully in our cloud environment. Jobs can be submitted, and nodes are provisioned and deprovisioned as expected. Unfortunately, there seems to be an edge case (or a config issue :-D).</div>
<div style="font-size: 13.3333px;">After a job (jobA) is submitted to partition A, node provisioning starts. During that phase, another job (jobB) is submitted to that partition and requests the same node (-w); I am not sure whether requesting the node explicitly is really a must-have right now. The edge case arises from the way our application schedules jobs.</div>
<div style="font-size: 13.3333px;"><br>
</div>
<div style="font-size: 13.3333px;">Unfortunately, jobB runs before jobA and fails, but a few seconds later jobA finishes successfully. The configuration should therefore be OK overall.</div>
<div style="font-size: 13.3333px;"><br>
</div>
<div style="font-size: 13.3333px;">
<div><span style="font-size: 13.3333px;">srun: error: Unable to resolve "mynodename": Host name lookup failure</span></div>
<div><span style="font-size: 13.3333px;">srun: error: fwd_tree_thread: can't find address for host mynodename check slurm.conf</span></div>
<div><span style="font-size: 13.3333px;">srun: error: Task launch for 123456.0 failed on node mynodename: Can't find an address, check slurm.conf</span></div>
<div><span style="font-size: 13.3333px;">srun: error: Application launch failed: Can't find an address, check slurm.conf</span></div>
<div><span style="font-size: 13.3333px;">srun: Job step aborted: Waiting up to 188 seconds for job step to finish.</span></div>
<div><span style="font-size: 13.3333px;">srun: error: Timed out waiting for job step to complete</span></div>
</div>
<div style="font-size: 13.3333px;"><br>
</div>
<div style="font-size: 13.3333px;">It looks like slurmctld applies some magic to jobA (Resetting JobId=jobidA start time for node power up) but not to jobB.</div>
<div style="font-size: 13.3333px;"><br>
</div>
<div style="font-size: 13.3333px;">
<div><span style="font-size: 13.3333px;">update_node: node mynodename state set to ALLOCATED</span></div>
<div><span style="font-size: 13.3333px;">Node mynodename2 now responding</span></div>
<div><span style="font-size: 13.3333px;">Node mynodename now responding</span></div>
<div><span style="font-size: 13.3333px;">update_node: node mynodename state set to ALLOCATED</span></div>
<div><span style="font-size: 13.3333px;">_pick_step_nodes: Configuration for JobId=jobidB is complete</span></div>
<div><span style="font-size: 13.3333px;">job_step_signal: JobId=jobidB StepId=0 not found</span></div>
<div><span style="font-size: 13.3333px;">_pick_step_nodes: Configuration for JobId=jobidA is complete</span></div>
<div><span style="font-size: 13.3333px;">Resetting JobId=jobidA start time for node power up</span></div>
<div><span style="font-size: 13.3333px;">_job_complete: JobId=jobidA WEXITSTATUS 0</span></div>
<div><span style="font-size: 13.3333px;">_job_complete: JobId=jobidA done</span></div>
<div><span style="font-size: 13.3333px;">job_step_signal: JobId=jobidB StepId=0 not found</span></div>
<div><span style="font-size: 13.3333px;">_job_complete: JobId=jobidB WTERMSIG 116</span></div>
<div><span style="font-size: 13.3333px;">_job_complete: JobId=jobidB done</span></div>
</div>
<div style="font-size: 13.3333px;"><br>
</div>
<div style="font-size: 13.3333px;">Has anyone seen this before, or does anyone have an idea how to fix it?</div>
<div style="font-size: 13.3333px;"><br>
</div>
<div style="font-size: 13.3333px;">Thanks &amp; Best</div>
<div style="font-size: 13.3333px;"><br>
</div>
<div style="font-size: 13.3333px;">Eg. Bo.</div>
</div>