<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html;
      charset=windows-1252">
  </head>
  <body>
    <p>Sounds like a race condition where slurmd is starting before the
      node is truly ready.</p>
    <p>You can try adding dependencies for slurmd so it will not start
      until some other needed service is running.</p>
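    <p>A minimal sketch of that idea follows. It assumes slurmd is started by a systemd unit named <code>slurmd.service</code>, and that the services it has to wait for are the network and the remote filesystem mounts, such as the shared <code>/shared</code> volume on ParallelCluster. Both assumptions, and the drop-in file name <code>/etc/systemd/system/slurmd.service.d/wait-for-node.conf</code>, are hypothetical:</p>
    <pre># Hypothetical drop-in: the targets below are assumptions, not a confirmed fix.
[Unit]
# Pull in network-online.target and wait until the network is actually up,
# not just until networking has been started.
Wants=network-online.target
After=network-online.target
# Also wait for remote (e.g. NFS) mounts before starting slurmd.
After=remote-fs.target
</pre>
    <p>After adding the file, run <code>systemctl daemon-reload</code> so systemd picks it up. <code>systemctl edit slurmd</code> creates the same kind of drop-in interactively. If the node needs some other service first, list that unit in <code>Wants=</code>/<code>After=</code> instead.</p>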
    <p><br>
    </p>
    <p>The benefits of systemd :)</p>
    <p><br>
    </p>
    <p>Brian Andrus<br>
    </p>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 6/9/2020 10:53 AM, Dumont, Joey
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:1591725189118.44307@nrc-cnrc.gc.ca">
      <meta http-equiv="Content-Type" content="text/html;
        charset=windows-1252">
      <style type="text/css" style="display:none"><!-- p { margin-top: 0px; margin-bottom: 0px; }--></style>
      <p>Hi, <br>
      </p>
      <p><br>
      </p>
      <p>I am encountering a weird issue, and I'm not sure where it is
        coming from.<br>
      </p>
      <p><br>
      </p>
      <p>I have set up a Slurm-based cluster using AWS ParallelCluster. I
        have tweaked the Slurm configuration to enable X11 forwarding by
        setting PrologFlags=X11. The ParallelCluster part is relevant
        because essentially every time a user queues a job, a brand-new
        compute node is provisioned and added to the default queue.
        Users want to run a GUI application built on Qt5. To run it,
        they issue something like:<br>
      </p>
      <p><br>
      </p>
      <blockquote style="margin: 0 0 0 40px; border: none; padding:
        0px;">
        <p> <span style="font-family: "Courier New",
            monospace;">salloc --nodes=1 --ntasks=1 --cpus-per-task=48
            --x11=all srun run_lsf.sh</span></p>
      </blockquote>
      <div id="Signature">
        <div name="divtagdefaultwrapper"
          style="font-family:Calibri,Arial,Helvetica,sans-serif;
          font-size:; margin:0">
          <br>
        </div>
        <div name="divtagdefaultwrapper"
          style="font-family:Calibri,Arial,Helvetica,sans-serif;
          font-size:; margin:0">
          <font style=""><font>However, if there are no nodes available,
              a new one is provisioned and the job is run on the new
              node. Every time this job is the first job on the compute
              node, the application crashes. If I issue the exact same
              command a second time (it usually gets allocated to the
              same node), then it runs without any issues. I was able to
              retrieve this from the core dump:</font></font></div>
      </div>
      <blockquote style="margin: 0 0 0 40px; border: none; padding:
        0px;">
        <div name="divtagdefaultwrapper"
          style="font-family:Calibri,Arial,Helvetica,sans-serif;
          font-size:; margin:0">
          <pre style="font-family: Consolas, Menlo, "Liberation Mono", Courier, monospace; margin: 1em 1em 1em 1.6em; padding: 8px; background-color: rgb(250, 250, 250); border: 1px solid rgb(226, 226, 226); border-radius: 3px; width: auto; overflow: auto hidden; color: rgb(51, 51, 51); font-size: 12px;">(gdb) bt
#0  0x00007fffdfced337 in raise () from /lib64/libc.so.6
#1  0x00007fffdfceea28 in abort () from /lib64/libc.so.6
#2  0x00007fffe2e699db in QMessageLogger::fatal(char const*, ...) const ()
   from /shared/opt/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/lumerical-2020a-r5-mt7ihfs2o3wfpxrn2ciw2oqfoqvo34dl/opt/lumerical/2020a/bin/../lib/libQt5Core.so.5
#3  0x00007fffe44ce28b in QGuiApplicationPrivate::createPlatformIntegration() ()
   from /shared/opt/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/lumerical-2020a-r5-mt7ihfs2o3wfpxrn2ciw2oqfoqvo34dl/opt/lumerical/2020a/bin/../lib/libQt5Gui.so.5
#4  0x00007fffe44ce72d in QGuiApplicationPrivate::createEventDispatcher() ()
   from /shared/opt/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/lumerical-2020a-r5-mt7ihfs2o3wfpxrn2ciw2oqfoqvo34dl/opt/lumerical/2020a/bin/../lib/libQt5Gui.so.5
#5  0x00007fffe30579f5 in QCoreApplicationPrivate::init() ()
   from /shared/opt/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/lumerical-2020a-r5-mt7ihfs2o3wfpxrn2ciw2oqfoqvo34dl/opt/lumerical/2020a/bin/../lib/libQt5Core.so.5
#6  0x00007fffe44cfcec in QGuiApplicationPrivate::init() ()
   from /shared/opt/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/lumerical-2020a-r5-mt7ihfs2o3wfpxrn2ciw2oqfoqvo34dl/opt/lumerical/2020a/bin/../lib/libQt5Gui.so.5
#7  0x00007fffe4cfcca9 in QApplicationPrivate::init() ()
   from /shared/opt/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/lumerical-2020a-r5-mt7ihfs2o3wfpxrn2ciw2oqfoqvo34dl/opt/lumerical/2020a/bin/../lib/libQt5Widgets.so.5
#8  0x0000000001f17345 in ?? ()
#9  0x00000000005286bb in ?? ()
#10 0x00007fffdfcd9505 in __libc_start_main () from /lib64/libc.so.6
#11 0x0000000000522201 in ?? ()
#12 0x00007fffffff3928 in ?? ()
#13 0x000000000000001c in ?? ()
#14 0x0000000000000004 in ?? ()
#15 0x00007fffffff3c5e in ?? ()
#16 0x00007fffffff3cfd in ?? ()
#17 0x00007fffffff3d01 in ?? ()
#18 0x00007fffffff3d06 in ?? ()
#19 0x0000000000000000 in ?? ()
</pre>
        </div>
      </blockquote>
      <div>
        <div name="divtagdefaultwrapper"
          style="font-family:Calibri,Arial,Helvetica,sans-serif;
          font-size:; margin:0">
          <font style=""><font><br>
            </font></font></div>
        <div name="divtagdefaultwrapper"
          style="font-family:Calibri,Arial,Helvetica,sans-serif;
          font-size:; margin:0">
          <font style=""><font>So it seems that the Qt5 application
              cannot initialize, possibly due to the X server not being
              ready? I tried adding a delay before starting starting the
              GUI application, but that didn't seem to help.</font></font></div>
        <div name="divtagdefaultwrapper"
          style="font-family:Calibri,Arial,Helvetica,sans-serif;
          font-size:; margin:0">
          <font style=""><font><br>
            </font></font></div>
        <div name="divtagdefaultwrapper"
          style="font-family:Calibri,Arial,Helvetica,sans-serif;
          font-size:; margin:0">
          <font style=""><font>Do you have any idea of where to look for
              relevant errors? /var/log/messages indicates that the app
              crashed, without any additional information.</font></font></div>
        <div name="divtagdefaultwrapper"
          style="font-family:Calibri,Arial,Helvetica,sans-serif;
          font-size:; margin:0">
          <font style=""><font><br>
            </font></font></div>
        <div name="divtagdefaultwrapper"
          style="font-family:Calibri,Arial,Helvetica,sans-serif;
          font-size:; margin:0">
          The nodes are running on CentOS 7. <br>
          <br>
        </div>
        <div name="divtagdefaultwrapper"
          style="font-family:Calibri,Arial,Helvetica,sans-serif;
          font-size:; margin:0">
          Let me know if additional info is needed.<br>
        </div>
        <div name="divtagdefaultwrapper"
          style="font-family:Calibri,Arial,Helvetica,sans-serif;
          font-size:; margin:0">
          <br>
        </div>
        <div name="divtagdefaultwrapper"
          style="font-family:Calibri,Arial,Helvetica,sans-serif;
          font-size:; margin:0">
          Cheers,<br>
        </div>
        <div name="divtagdefaultwrapper"
          style="font-family:Calibri,Arial,Helvetica,sans-serif;
          font-size:; margin:0">
          <font style=""><font><br>
            </font></font></div>
        <div name="divtagdefaultwrapper"
          style="font-family:Calibri,Arial,Helvetica,sans-serif;
          font-size:; margin:0">
          <font style=""><font>Joey Dumont</font></font></div>
        <div name="divtagdefaultwrapper"
          style="font-family:Calibri,Arial,Helvetica,sans-serif;
          font-size:; margin:0">
          <div style=""><br>
          </div>
          <div style="">Technical Advisor, Knowledge, Information, and
            Technology Services</div>
          <div style="">National Research Council Canada / Governement
            of Canada</div>
          <div style=""><a tabindex="0"
              href="mailto:joey.dumont@nrc-cnrc.gc.ca" id="NoLP"
              moz-do-not-send="true">joey.dumont@nrc-cnrc.gc.ca</a> /
            Tel: 613-990-8152 / Cell: 438-340-7436</div>
          <div style=""><br>
          </div>
          <div style="">Conseiller technique, Services du savoir, de
            l'information et de la technologie</div>
          <div style="">Conseil national de recherches Canada /
            Gouvernement du Canada</div>
          <div style=""><a tabindex="0"
              href="mailto:joey.dumont@nrc-cnrc.gc.ca" id="NoLP"
              moz-do-not-send="true">joey.dumont@nrc-cnrc.gc.ca</a> /
            Tél.: 613-990-8152 / Tél. cell.: 438-340-7436</div>
        </div>
      </div>
    </blockquote>
  </body>
</html>