<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Dear <font size="2"><span style="font-size:11pt">Ahmet M.</span></font>,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I've tried your recommendation, but unfortunately it didn't work.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
However, I noticed that when I restart slurmctld.service the job starts, and I don't know why.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Before the restart, the job was stuck in CF (configuring), but when I restart slurmctld it changes to R (running).</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Do you have any ideas?<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Thanks!<br>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> mercan <ahmet.mercan@uhem.itu.edu.tr><br>
<b>Sent:</b> Wednesday, March 30, 2022 10:29<br>
<b>To:</b> Slurm User Community List <slurm-users@lists.schedmd.com>; Nicolas Sonoda <nicolas.sonoda@versatushpc.com.br>; slurm-users@schedmd.com <slurm-users@schedmd.com><br>
<b>Subject:</b> Re: [slurm-users] Problem with job allocation</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">Hi;<br>
<br>
The Slurm log says that your prolog did not finish within 300 seconds.<br>
<br>
<br>
The only possible cause that I can see is the line starting with "sudo <br>
/usr/bin/beeond start -F -P -b /usr/bin/pdsh".<br>
<br>
<br>
You can put a timeout command at the beginning of the sudo line to test:<br>
<br>
timeout 150 sudo /usr/bin/beeond start -F -P -b /usr/bin/pdsh ......<br>
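<br>
As a rough sketch of how the result of that test could be read, assuming GNU coreutils timeout (which exits with status 124 when the limit is reached): the helper name run_with_limit and the sleep stand-in below are hypothetical and are not part of Slurm or BeeOND.<br>

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit and report
# whether it finished on its own or was stopped by timeout(1).
run_with_limit() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with 124 when the time limit was reached.
        echo "timed out after ${limit}s"
    else
        echo "finished with exit status ${status}"
    fi
    return "$status"
}

# Stand-in for the slow prolog step; substitute the real sudo line here.
run_with_limit 1 sleep 3 || true
```

If the real command reports a timeout here as well, the hang is in that line rather than elsewhere in the prolog.<br>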
<br>
<br>
If the problem is solved with the timeout command, you should check <br>
that the sudoers permission is correctly set for the password-less sudo <br>
command. You can check the permission by executing this sudo line as the <br>
slurm user.<br>
<br>
<br>
If the sudoers permission is correct but the command takes too long, you <br>
can increase this 300-second threshold.<br>
<br>
<br>
Regards,<br>
<br>
<br>
Ahmet M.<br>
<br>
<br>
<br>
<br>
<br>
On 30.03.2022 15:59, Nicolas Sonoda wrote:<br>
> Hi!<br>
><br>
> I'm getting the following error with the prolog when I try to allocate more <br>
> than 2 nodes with sbatch:<br>
><br>
> [2022-03-28T07:40:17.016] backfill: Started JobId=19825 in intel_large <br>
> on n[01-05]<br>
> [2022-03-28T07:45:17.310] _run_prolog: timeout after 300s: killing <br>
> pgid 45004<br>
> [2022-03-28T07:45:17.310] error: prolog_slurmctld JobId=19825 prolog <br>
> exit status 0:9<br>
><br>
> I have this configuration for my queue:<br>
><br>
> PartitionName=intel_large Nodes=n[01-10] Default=NO MaxTime=72:00:00 <br>
> MaxNodes=5 OverSubscribe=EXCLUSIVE State=UP<br>
><br>
> And I'm attaching my slurmctld.prolog<br>
><br>
> Can you help me with that?<br>
><br>
> Thanks!<br>
</div>
</span></font></div>
</body>
</html>