<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Double-check the account info on that node (c0801).</p>
<p>It could be that the node does not recognize the uid being assigned
to the user/job.</p>
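<p>A rough way to compare, assuming you can ssh to the node: capture
the <code>getent passwd</code> line for the user on the login node and
on c0801, then check that the uid field matches. The helper names
below are only for illustration; they are not Slurm commands.</p>
<pre>
```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing how two hosts resolve one user.

# Print the numeric uid from one line of `getent passwd` output.
# Fields are colon-separated and the uid is field 3.
passwd_uid() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Succeed only if both passwd lines are non-empty and carry the same uid.
# An empty line means the host could not resolve the user at all.
same_uid() {
    if [ -z "$1" ]; then return 1; fi
    if [ -z "$2" ]; then return 1; fi
    [ "$(passwd_uid "$1")" = "$(passwd_uid "$2")" ]
}

# Example usage (ssh access to c0801 is an assumption):
#   login_entry=$(getent passwd "$USER")
#   node_entry=$(ssh c0801 getent passwd "$USER")
#   same_uid "$login_entry" "$node_entry" || echo "uid differs or user unknown on c0801"
```
</pre>
<p>If the uid differs, or the user does not resolve on c0801 at all,
that would be the first thing to look at.</p>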
<p>Brian Andrus<br>
</p>
<div class="moz-cite-prefix">On 5/13/2022 2:31 PM, Williams, Jenny
Avis wrote:<br>
</div>
<blockquote type="cite"
cite="mid:BL1PR03MB60722968B9D7901441D65B589BCA9@BL1PR03MB6072.namprd03.prod.outlook.com">
<style>@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}div.WordSection1
{page:WordSection1;}</style>
<div class="WordSection1">
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Yesterday I upgraded slurmdbd and slurmctld
nodes from RHEL7 / Slurm v. 20.11.8 to RHEL8.5 / Slurm v.
21.08.6 on our production cluster.<o:p></o:p></p>
<p class="MsoNormal">I also updated Slurm on the RHEL7 login
nodes to 21.08.6.<o:p></o:p></p>
<p class="MsoNormal">Sbatch jobs run fine.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Srun, however, fails from the updated login
node with "Invalid job credential" errors. Srun from nodes that
have not been updated runs fine.<o:p></o:p></p>
<p class="MsoNormal">I am hoping this looks familiar to you.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span
style="font-size:8.0pt;font-family:Consolas">$ srun
--slurmd-debug=verbose -n 1 -t 8:00:00 --mem=3g -p interact
-w c0801 --pty /bin/bash<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:8.0pt;font-family:Consolas">srun: job
45281066 queued and waiting for resources<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:8.0pt;font-family:Consolas">srun: job
45281066 has been allocated resources<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:8.0pt;font-family:Consolas">srun: error:
Task launch for StepId=45281066.0 failed on node c0801:
Invalid job credential<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:8.0pt;font-family:Consolas">srun: error:
Application launch failed: Invalid job credential<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:8.0pt;font-family:Consolas">srun: Job step
aborted: Waiting up to 32 seconds for job step to finish.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:8.0pt;font-family:Consolas">srun: error:
Timed out waiting for job step to complete<o:p></o:p></span></p>
</div>
</blockquote>
</body>
</html>