<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>I think this is exactly the type of use case that heterogeneous job
support (available since Slurm 17.11) is designed for:</p>
<p>
<blockquote type="cite">Slurm version 17.11 and later supports the
ability to submit and manage
heterogeneous jobs, in which each component has virtually all
job options
available including partition, account and QOS (Quality Of
Service).
For example, part of a job might require four cores and 4 GB for
each of 128
tasks while another part of the job would require 16 GB of
memory and one CPU.</blockquote>
<br>
<a class="moz-txt-link-freetext" href="https://slurm.schedmd.com/heterogeneous_jobs.html">https://slurm.schedmd.com/heterogeneous_jobs.html</a></p>
<p>Using this, you should be able to use a single core for the
transfer from NFS, all the cores/GPUs you need for the computation,
and then a single core to transfer the results back to NFS.<br>
</p>
<p>Disclaimer: I've never used this feature myself. <br>
</p>
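<p>That said, a heterogeneous batch script for this pattern might look
roughly like the sketch below. It is untested; the job name, resource
numbers, and paths are placeholders, and the directives use the newer
spelling (Slurm 20.02 and later), whereas 17.11 through 19.05 used
"packjob" and "--pack-group" instead of "hetjob" and "--het-group": <br>
</p>
<pre>#!/bin/bash
#SBATCH --job-name=stage-compute-unstage
# Component 0: one core to stage data in from NFS
#SBATCH --ntasks=1 --cpus-per-task=1 --mem=2G
#SBATCH hetjob
# Component 1: the actual computation, with the GPUs
#SBATCH --ntasks=1 --cpus-per-task=16 --gres=gpu:2 --mem=64G
#SBATCH hetjob
# Component 2: one core to copy results back to NFS and clean up
#SBATCH --ntasks=1 --cpus-per-task=1 --mem=2G

# Placeholder paths and commands follow. As far as I understand, all
# components of a heterogeneous job are allocated at the same time,
# and for node-local scratch the staging and compute components would
# also have to land on the same node, which may need extra care.
SCRATCH=/local/scratch/$SLURM_JOB_ID

srun --het-group=0 bash -c "mkdir -p $SCRATCH ; cp -r /nfs/project/dataset $SCRATCH/"
srun --het-group=1 ./process_data "$SCRATCH"/dataset
srun --het-group=2 bash -c "cp -r $SCRATCH/results /nfs/project ; rm -rf $SCRATCH"</pre>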
<pre class="moz-signature" cols="72">Prentice</pre>
<div class="moz-cite-prefix">On 4/3/21 5:31 PM, Fulcomer, Samuel
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAOORAuF_g8ZnipafqFE-73jpfwybd0y86bwm7a2P5xL_mSdYLw@mail.gmail.com">
<div dir="ltr">
<div>inline below...</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sat, Apr 3, 2021 at 4:50
PM Will Dennis <<a href="mailto:wdennis@nec-labs.com"
moz-do-not-send="true">wdennis@nec-labs.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div style="overflow-wrap: break-word;" lang="EN-US">
<div class="gmail-m_7889209934540133168WordSection1">
<p class="MsoNormal">Sorry, obvs wasn’t ready to send
that last message yet…</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Our issue is the shared storage is
via NFS, and the “fast storage in limited supply” is
only local on each node. Hence the need to copy it
over from NFS (and then remove it when finished with
it.)<br>
<br>
I also wanted the copy & remove to be different
jobs, because the main processing job usually requires
GPU gres, which is a time-limited resource on the
partition. I don’t want to tie up the allocation of
GPUs while the data is staged (and removed), and if
the data copy fails, I don’t want to even progress to
the job where the compute happens (so, like,
copy_data_locally && process_data).</p>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>...yup... this is the problem. We've invested in GPFS and
an NVMe Excelero pool (for initial placement); however, we
still have the problem of having users pull down data from
community repositories before running useful computation.</div>
<div><br>
</div>
<div>Your question has gotten me thinking about this more. In
our case all of our nodes are diskless (we do have fast
GPFS, though), so this wouldn't really work for us. But if
your fast storage is only local to your nodes, the
subsequent compute jobs will need to request those specific
nodes, so you'll need a mechanism to increase the SLURM
scheduling "weight" of the nodes after staging, so that the
scheduler won't pick them over nodes with a lower weight
for other work. That could be done in a job epilog.</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div style="overflow-wrap: break-word;" lang="EN-US">
<div class="gmail-m_7889209934540133168WordSection1">
<p class="MsoNormal"> </p>
<div>
<blockquote
style="border-top:none;border-right:none;border-bottom:none;border-left:1pt
solid rgb(204,204,204);padding:0in 0in 0in
6pt;margin:5pt 0in 5pt 4.8pt">
<div>
<div>
<div>
<div>
<p class="MsoNormal"
style="margin-right:0in;margin-bottom:5pt;margin-left:0in">
<span style="color:black">If you've got
other fast storage in limited supply
that can be used for data that can be
staged, then by all means use it, but
consider whether you want batch cpu
cores tied up with the wall time of
transferring the data. This could easily
be done on a time-shared frontend login
node from which the users could then
submit (via script) jobs after the data
was staged. Most of the transfer
wallclock is in network wait, so don't
waste dedicated cores for it.</span></p>
<p class="MsoNormal"
style="margin-right:0in;margin-bottom:5pt;margin-left:0in">
<span
style="font-size:13.5pt;font-family:-webkit-standard,serif;color:black"> </span></p>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
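<p>And on the node-weight idea in the reply quoted above: a job epilog
along these lines might do it. Again only an untested sketch; the
weight value is arbitrary, it assumes your Slurm version lets Weight
be changed with scontrol at run time and that the epilog has the
privileges to do so, and how staging jobs are identified (and when the
weight gets reset) is left open. <br>
</p>
<pre>#!/bin/bash
# Epilog fragment (sketch): after a data-staging job finishes on this
# node, raise the node's scheduling weight so the scheduler prefers
# other, lower-weight nodes for unrelated work, keeping this node free
# for the follow-up compute job that must run where the data was staged.
scontrol update NodeName="$SLURMD_NODENAME" Weight=1000</pre>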
</body>
</html>