<div dir="ltr">Sorry, you are right, the documentation is clear about it being available only in EpilogSlurmctld. I'm quite new to SLURM and I've read some of the documentation, but obviously I haven't grasped it all. I don't quite understand the difference between --epilog vs --task-epilog, EpilogSlurm vs. EpilogSlurmctld, so this is probably a case of me just doing it wrong!<div><br></div><div>Here's my example use case and how I'm trying to run and clean things up:<br><div><br></div><div><font face="monospace, monospace"> srun --nodes 3 --epilog path/to/<b>cleanup.sh</b> <b>myProgram.exe</b></font></div><div><ul><li><font face="monospace, monospace"><b>myProgram.exe</b></font> reads the SLURM_JOB_NODELIST and spawns "slave processes" on those machines accordingly. It's all done through ssh communication internally within the <a href="https://github.com/nasa/trick/wiki/UserGuide-Monte-Carlo">Trick simulation architecture</a>. I point this out because that means I'm <i>not </i>launching these slave processes with srun or any other SLURM mechanism once Trick "takes control". I simply "ask SLURM" for 3 nodes and Trick handles the distribution of processes based on SLURM_JOB_NODELIST information.<br></li><li><font face="monospace, monospace"><b>cleanup.sh</b></font> is designed to read the same SLURM_JOB_NODELIST and then clean up things on those machines in the event something kills the program early (scancel for example). This way, the epilog script is only cleaning up on the machines that <b><font face="monospace, monospace">myProgram.exe</font></b> used during it's execution.</li></ul><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:left;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Calling srun with </span><span style="color:rgb(34,34,34);font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:left;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><font face="monospace, monospace">--epilog cleanup.sh</font></span><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:left;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> works just fine, but that environment variable doesn't exist so it can't go clean up the other nodes, and it doesn't appear to be executed once per node, which was another thought I'd had</span><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:left;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">. Indeed, cleanup.sh appears to run </span><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:left;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><i>on the machine I launched srun from</i></span><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:left;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">, not any of the machines given to me in my SLURM_JOB_NODELIST!</span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:left;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:left;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">What is the </span><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:left;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><i>correct </i></span><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:left;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">way to clean up processes across the nodes given to my program by SLURM_JOB_NODELIST?</span></div></div></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:left;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:left;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Thanks!</span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:left;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:left;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">P.S. I had originally attempted to use sbatch instead of srun, but found that it doesn't support an --epilog switch at all.</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Mar 4, 2018 at 6:01 PM, Christopher Samuel <span dir="ltr"><<a href="mailto:chris@csamuel.org" target="_blank">chris@csamuel.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 05/03/18 10:16, Dan Jordan wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
In my particular case, I need SLURM_JOB_NODELIST, which should be available but it is not.<br>
</blockquote>
<br></span>
This is only available in PrologSlurmctld, not Prolog, according to<br>
those docs. Does that match what you're trying?<br>
<br>
cheers,<br>
Chris<br>
<br>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature">Dan Jordan<br></div>
</div>