<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body>
<p>Sounds like a race condition where slurmd is starting before the
node is truly ready.</p>
<p>You can try adding dependencies for slurmd so it will not start
until some other needed service is running.</p>
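    <p>A minimal sketch of what that could look like, as a systemd
      drop-in, assuming the unit is named <code>slurmd.service</code>.
      The file name and the choice of prerequisites are assumptions:
      <code>network-online.target</code> and <code>remote-fs.target</code>
      are standard systemd targets, picked here only because the
      application is installed under <code>/shared</code>, which is
      presumably a shared filesystem. Substitute whichever units the
      node actually needs.</p>
    <pre># /etc/systemd/system/slurmd.service.d/wait-for-deps.conf  (hypothetical file name)
[Unit]
# Order slurmd after these units so it is not started before them (assumed prerequisites).
After=network-online.target remote-fs.target
# Pull network-online.target into the same transaction so the ordering takes effect.
Wants=network-online.target
</pre>
    <p>After creating the file, run <code>systemctl daemon-reload</code>
      so systemd picks up the drop-in; <code>systemctl cat slurmd</code>
      should then list it below the main unit file.</p>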
<p><br>
</p>
<p>The benefits of systemd :)</p>
<p><br>
</p>
<p>Brian Andrus<br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 6/9/2020 10:53 AM, Dumont, Joey
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:1591725189118.44307@nrc-cnrc.gc.ca">
<style type="text/css" style="display:none"><!-- p { margin-top: 0px; margin-bottom: 0px; }--></style>
<p>Hi, <br>
</p>
<p><br>
</p>
<p>I am encountering a weird issue, and I'm not sure where it is
coming from.<br>
</p>
<p><br>
</p>
      <p>I have set up a Slurm-based cluster using AWS ParallelCluster. I
        have tweaked the Slurm configuration to enable X forwarding by
        setting PrologFlags=X11. The ParallelCluster portion is
        relevant, as basically every time a user queues a job, a brand
        new compute node is provisioned and added to the default queue.
        Users want to run a Qt5-based GUI application. To run
        it, they issue something like:<br>
</p>
<p><br>
</p>
<blockquote style="margin: 0 0 0 40px; border: none; padding:
0px;">
        <p> <span style="font-family: 'Courier New',
            monospace;">salloc --nodes=1 --ntasks=1 --cpus-per-task=48
            --x11=all srun run_lsf.sh</span></p>
</blockquote>
<div id="Signature">
<div name="divtagdefaultwrapper"
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:; margin:0">
<br>
</div>
<div name="divtagdefaultwrapper"
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:; margin:0">
<font style=""><font>However, if there are no nodes available,
a new one is provisioned and the job is run on the new
node. Every time this job is the first job on the compute
node, the application crashes. If I issue the exact same
command a second time (it usually gets allocated to the
same node), then it runs without any issues. I was able to
retrieve this from the core dump:</font></font></div>
</div>
<blockquote style="margin: 0 0 0 40px; border: none; padding:
0px;">
<div name="divtagdefaultwrapper"
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:; margin:0">
          <pre style="font-family: Consolas, Menlo, 'Liberation Mono', Courier, monospace; margin: 1em 1em 1em 1.6em; padding: 8px; background-color: rgb(250, 250, 250); border: 1px solid rgb(226, 226, 226); border-radius: 3px; width: auto; overflow: auto hidden; color: rgb(51, 51, 51); font-size: 12px;">(gdb) bt
#0 0x00007fffdfced337 in raise () from /lib64/libc.so.6
#1 0x00007fffdfceea28 in abort () from /lib64/libc.so.6
#2 0x00007fffe2e699db in QMessageLogger::fatal(char const*, ...) const ()
from /shared/opt/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/lumerical-2020a-r5-mt7ihfs2o3wfpxrn2ciw2oqfoqvo34dl/opt/lumerical/2020a/bin/../lib/libQt5Core.so.5
#3 0x00007fffe44ce28b in QGuiApplicationPrivate::createPlatformIntegration() ()
from /shared/opt/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/lumerical-2020a-r5-mt7ihfs2o3wfpxrn2ciw2oqfoqvo34dl/opt/lumerical/2020a/bin/../lib/libQt5Gui.so.5
#4 0x00007fffe44ce72d in QGuiApplicationPrivate::createEventDispatcher() ()
from /shared/opt/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/lumerical-2020a-r5-mt7ihfs2o3wfpxrn2ciw2oqfoqvo34dl/opt/lumerical/2020a/bin/../lib/libQt5Gui.so.5
#5 0x00007fffe30579f5 in QCoreApplicationPrivate::init() ()
from /shared/opt/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/lumerical-2020a-r5-mt7ihfs2o3wfpxrn2ciw2oqfoqvo34dl/opt/lumerical/2020a/bin/../lib/libQt5Core.so.5
#6 0x00007fffe44cfcec in QGuiApplicationPrivate::init() ()
from /shared/opt/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/lumerical-2020a-r5-mt7ihfs2o3wfpxrn2ciw2oqfoqvo34dl/opt/lumerical/2020a/bin/../lib/libQt5Gui.so.5
#7 0x00007fffe4cfcca9 in QApplicationPrivate::init() ()
from /shared/opt/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/lumerical-2020a-r5-mt7ihfs2o3wfpxrn2ciw2oqfoqvo34dl/opt/lumerical/2020a/bin/../lib/libQt5Widgets.so.5
#8 0x0000000001f17345 in ?? ()
#9 0x00000000005286bb in ?? ()
#10 0x00007fffdfcd9505 in __libc_start_main () from /lib64/libc.so.6
#11 0x0000000000522201 in ?? ()
#12 0x00007fffffff3928 in ?? ()
#13 0x000000000000001c in ?? ()
#14 0x0000000000000004 in ?? ()
#15 0x00007fffffff3c5e in ?? ()
#16 0x00007fffffff3cfd in ?? ()
#17 0x00007fffffff3d01 in ?? ()
#18 0x00007fffffff3d06 in ?? ()
#19 0x0000000000000000 in ?? ()
</pre>
</div>
</blockquote>
<div>
<div name="divtagdefaultwrapper"
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:; margin:0">
<font style=""><font><br>
</font></font></div>
<div name="divtagdefaultwrapper"
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:; margin:0">
<font style=""><font>So it seems that the Qt5 application
cannot initialize, possibly due to the X server not being
              ready? I tried adding a delay before starting the
GUI application, but that didn't seem to help.</font></font></div>
<div name="divtagdefaultwrapper"
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:; margin:0">
<font style=""><font><br>
</font></font></div>
<div name="divtagdefaultwrapper"
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:; margin:0">
<font style=""><font>Do you have any idea of where to look for
relevant errors? /var/log/messages indicates that the app
crashed, without any additional information.</font></font></div>
<div name="divtagdefaultwrapper"
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:; margin:0">
<font style=""><font><br>
</font></font></div>
<div name="divtagdefaultwrapper"
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:; margin:0">
The nodes are running on CentOS 7. <br>
<br>
</div>
<div name="divtagdefaultwrapper"
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:; margin:0">
Let me know if additional info is needed.<br>
</div>
<div name="divtagdefaultwrapper"
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:; margin:0">
<br>
</div>
<div name="divtagdefaultwrapper"
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:; margin:0">
Cheers,<br>
</div>
<div name="divtagdefaultwrapper"
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:; margin:0">
<font style=""><font><br>
</font></font></div>
<div name="divtagdefaultwrapper"
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:; margin:0">
<font style=""><font>Joey Dumont</font></font></div>
<div name="divtagdefaultwrapper"
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:; margin:0">
<div style=""><br>
</div>
<div style="">Technical Advisor, Knowledge, Information, and
Technology Services</div>
<div style="">National Research Council Canada / Governement
of Canada</div>
<div style=""><a tabindex="0"
href="mailto:joey.dumont@nrc-cnrc.gc.ca" id="NoLP"
moz-do-not-send="true">joey.dumont@nrc-cnrc.gc.ca</a> /
Tel: 613-990-8152 / Cell: 438-340-7436</div>
<div style=""><br>
</div>
<div style="">Conseiller technique, Services du savoir, de
l'information et de la technologie</div>
<div style="">Conseil national de recherches Canada /
Gouvernement du Canada</div>
<div style=""><a tabindex="0"
href="mailto:joey.dumont@nrc-cnrc.gc.ca" id="NoLP"
moz-do-not-send="true">joey.dumont@nrc-cnrc.gc.ca</a> /
Tél.: 613-990-8152 / Tél. cell.: 438-340-7436</div>
</div>
</div>
</blockquote>
</body>
</html>