[slurm-users] Staging data on the nodes one will be processing on via sbatch

Fulcomer, Samuel samuel_fulcomer at brown.edu
Sat Apr 3 21:31:17 UTC 2021


inline below...

On Sat, Apr 3, 2021 at 4:50 PM Will Dennis <wdennis at nec-labs.com> wrote:

> Sorry, obvs wasn’t ready to send that last message yet…
>
>
>
> Our issue is that the shared storage is on NFS, and the “fast storage in
> limited supply” is only local to each node. Hence the need to copy the data
> over from NFS (and then remove it when finished with it).
>
> I also wanted the copy & remove to be different jobs, because the main
> processing job usually requires GPU gres, which is a time-limited resource
> on the partition. I don’t want to tie up the allocation of GPUs while the
> data is staged (and removed), and if the data copy fails, I don’t want to
> even progress to the job where the compute happens (so like,
> copy_data_locally && process_data).
>
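One way to get the copy_data_locally && process_data behaviour out of separate jobs is Slurm's job dependencies. The Python sketch below only builds and submits sbatch command lines. The script names (stage.sh, compute.sh, cleanup.sh), the node name and the "gpu:1" GRES request are hypothetical. The compute job waits in the queue on an afterok dependency, so it holds no GPUs while the copy runs. The cleanup job uses afterany so the local copy is removed however the compute job ends. We assume this includes the case where the compute job is cancelled because the copy failed.

```python
# Sketch: stage -> compute -> cleanup as three dependent Slurm jobs on one node.
# Hypothetical: script names, node name and the "gpu:1" GRES request.
import subprocess
from typing import Callable, List, Tuple

Submit = Callable[[List[str]], str]


def stage_cmd(node: str, script: str) -> List[str]:
    # Pin the copy job to the node whose local disk will hold the data.
    return ["sbatch", "--parsable", f"--nodelist={node}", script]


def compute_cmd(node: str, script: str, stage_id: str, gres: str = "gpu:1") -> List[str]:
    # afterok: start only if the staging job exits 0 (copy && process).
    # --kill-on-invalid-dep=yes cancels this job if the copy fails instead of
    # leaving it pending forever. While pending it holds no GPUs.
    return ["sbatch", "--parsable", f"--nodelist={node}", f"--gres={gres}",
            f"--dependency=afterok:{stage_id}", "--kill-on-invalid-dep=yes", script]


def cleanup_cmd(node: str, script: str, compute_id: str) -> List[str]:
    # afterany: remove the staged data however the compute job ended
    # (assumed to include the compute job being cancelled above).
    return ["sbatch", "--parsable", f"--nodelist={node}",
            f"--dependency=afterany:{compute_id}", script]


def run_sbatch(argv: List[str]) -> str:
    # With --parsable, sbatch prints "jobid" or "jobid;cluster".
    out = subprocess.run(argv, check=True, capture_output=True, text=True).stdout
    return out.strip().split(";")[0]


def submit_chain(node: str, stage: str, compute: str, cleanup: str,
                 submit: Submit = run_sbatch) -> Tuple[str, str, str]:
    # Each job ID feeds the next job's --dependency option.
    stage_id = submit(stage_cmd(node, stage))
    compute_id = submit(compute_cmd(node, compute, stage_id))
    cleanup_id = submit(cleanup_cmd(node, cleanup, compute_id))
    return stage_id, compute_id, cleanup_id
```

From a login node this would be called as submit_chain("node07", "stage.sh", "compute.sh", "cleanup.sh"). The submit parameter is injectable so the chain can be checked without a cluster.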

...yup... this is the problem. We've invested in GPFS and an NVMe Excelero
pool (for initial placement); however, we still have the problem of having
users pull down data from community repositories before running useful
computation.

Your question has gotten me thinking about this more. In our case, all of
our nodes are diskless, so this wouldn't really work for us (but we do have
fast GPFS). But if your fast storage is only local to your nodes, the
subsequent compute jobs will need to request those specific nodes, so
you'll need a mechanism to increase the SLURM scheduling "weight" of the
nodes after staging. SLURM allocates the lowest-weight eligible nodes
first, so a higher weight steers other jobs toward lower-weight nodes and
keeps the staged nodes free for the jobs that need them. That could be
done in a job epilog.




>
> If you've got other fast storage in limited supply that can be used for
> data that can be staged, then by all means use it, but consider whether you
> want batch CPU cores tied up for the wall time of transferring the data.
> This could easily be done on a time-shared frontend login node, from which
> the users could then submit jobs (via script) after the data was staged.
> Most of the transfer wallclock is spent in network wait, so don't waste
> dedicated cores on it.
>
>
>
>
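The login-node approach (copy first, submit only if the copy succeeded) can be sketched in Python as well. This assumes the destination is shared fast storage visible from the compute nodes, not node-local disk. The paths, the job script and the STAGED_DATA variable name are hypothetical.

```python
# Sketch: stage data on a time-shared login node, then submit the batch job.
# Hypothetical: paths, job script name and the STAGED_DATA variable.
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List, Optional


def sbatch(args: List[str]) -> str:
    # With --parsable, sbatch prints "jobid" or "jobid;cluster".
    argv = ["sbatch", "--parsable", *args]
    out = subprocess.run(argv, check=True, capture_output=True, text=True).stdout
    return out.strip().split(";")[0]


def stage_then_submit(src: Path, dst: Path, job_script: str,
                      submit: Callable[[List[str]], str] = sbatch) -> Optional[str]:
    try:
        # Network-bound copy: runs on the login node, not on batch cores.
        shutil.copytree(src, dst, dirs_exist_ok=True)
    except OSError as exc:  # shutil.Error is a subclass of OSError
        print(f"staging failed, not submitting: {exc}")
        return None
    # Only reached when the copy succeeded (copy && submit).
    return submit([f"--export=ALL,STAGED_DATA={dst}", job_script])
```

The job script can then read $STAGED_DATA to find its input. Nothing is queued if the copy fails.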