[slurm-users] Slurm and shared file systems

Alex Chekholko alex at calicolabs.com
Fri Jun 19 17:55:32 UTC 2020


Hi David,

There are several approaches to having a shared filesystem namespace without
an actual shared filesystem. One issue you will have to contend with is
filesystem caching: how much room to allocate for the local cache and how
to handle cache inconsistencies.

Examples:
gcsfuse for fronting GCS buckets (see https://github.com/SchedMD/slurm-gcp)
CVMFS for Open Science Grid

One thing you could ask is how big the datasets are for this particular
research group.  If they are small, maybe they can get away with copying
files around all the time; example slide deck that discusses this:
https://opensciencegrid.org/user-school-2018/materials/day4/files/osgus18-day4-part4-output-shared-fs.pdf
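
For small datasets, the "copy files around" pattern can be sketched roughly
as below. This is only an illustration, not anything Slurm provides: the
function name run_staged and its parameters (inputs, command, results_dir,
scratch_root) are hypothetical, and it assumes the job can reach both the
input files and the results location from the compute node by some means.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_staged(inputs, command, results_dir, scratch_root=None):
    """Stage inputs into node-local scratch, run a command there, stage outputs out.

    All names here are hypothetical; scratch_root=None means the system
    default temporary directory is used.
    """
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)

    # The scratch directory is removed automatically when the block exits.
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        work = Path(scratch)

        # Stage in: copy every input file into the scratch directory.
        for src in inputs:
            shutil.copy2(src, work / Path(src).name)
        staged_in = {p.name for p in work.iterdir()}

        # Run the job with the scratch directory as its working directory.
        subprocess.run(command, cwd=work, check=True)

        # Stage out: copy back only the files the command created.
        copied = []
        for path in sorted(work.iterdir()):
            if path.is_file() and path.name not in staged_in:
                shutil.copy2(path, results_dir / path.name)
                copied.append(path.name)
    return copied
```

A batch script could call something like this with scratch_root pointing at
whatever node-local disk the site provides, and results_dir pointing at a
location that is reachable from outside the cluster. Files that the command
modifies in place are not copied back in this sketch.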

But all of that adds complexity, and if it's a local physical cluster it is
easiest to just have shared storage.

Regards,
Alex


On Fri, Jun 19, 2020 at 7:23 AM Brian Andrus <toomuchit at gmail.com> wrote:

> It sounds like you are asking if there should be a shared /home, which you
> do not need. You do need to ensure a user can access the environment for
> the node (a home directory, ssh keys, etc).
>
>
> If you are asking about the job binary and the data it will be processing,
> again, you do not. You could, for example, install the binary on all the
> nodes.
>
> If your job fetches its own data to work on (say a script that will
> download/prep .grib files and then run wrf) then there is no need for a
> shared filesystem.
>
> You will, of course, need to stage the results out somewhere as well to
> access them outside the cluster.
>
>
> Brian Andrus
>
>
> On 6/19/2020 5:04 AM, David Baker wrote:
>
> Hello,
>
> We are currently helping a research group to set up their own Slurm
> cluster. They have asked a very interesting question about Slurm and file
> systems. That is, they are posing the question -- do you need a shared user
> file store on a Slurm cluster?
>
> So, in the extreme case where there is no shared file store for users, can
> Slurm operate properly over a cluster? I have seen commands like sbcast to
> move a file from the submission node to a compute node, however that
> command can only transfer one file at a time. Furthermore what would happen
> to the standard output files? I'm going to guess that there must be a
> shared file system, however it would be good if someone could please
> confirm this.
>
> Best regards,
> David
>
>
>