[slurm-users] Kinda Off-Topic: data management for Slurm clusters

Ansgar Esztermann-Kirchner aeszter at mpibpc.mpg.de
Tue Feb 26 08:36:28 UTC 2019


I'd like to share our set-up as well, even though it's very
specialized and thus probably won't work in most places. However, it's
also very efficient in terms of budget when it does.

Our users don't usually have shared data sets, so we don't need high
bandwidth at any particular point -- the aggregate bandwidth is what's
important. All our users' computers are equipped with large disks (typically
4x10TB on new installations), set up as an mdraid RAID 5 array (no
additional cost for a controller, plus almost unlimited recovery options
in case of multi-disk failure) and formatted with XFS[1]. So, in effect, every user has
their own NFS server. These have a VLAN interface into the internal cluster 
network, and the cluster nodes mount the user homes via autofs.
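To illustrate the idea, here is a minimal sketch of an autofs indirect map in which each user's home comes from that user's own machine, generated in Python. The host naming scheme (`ws-<user>.cluster`), the export path `/home/<user>` and the `-rw,hard` mount options are assumptions for illustration only, not the actual configuration described above.

```python
# Hypothetical sketch: build the text of an autofs indirect map where each
# user's home directory is NFS-mounted from that user's own workstation.
# Host names, export root and mount options are illustrative assumptions.
from typing import Mapping


def autofs_home_map(user_hosts: Mapping[str, str],
                    export_root: str = "/home",
                    options: str = "-rw,hard") -> str:
    """Return one map line per user: <key> <options> <server>:<path>."""
    lines = []
    for user, host in sorted(user_hosts.items()):
        lines.append(f"{user}\t{options}\t{host}:{export_root}/{user}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical users and their workstations on the cluster VLAN.
    table = {"alice": "ws-alice.cluster", "bob": "ws-bob.cluster"}
    print(autofs_home_map(table), end="")
```

The generated text would go into an indirect map file such as `/etc/auto.home`, referenced from `auto.master` with a line like `/home /etc/auto.home`, so that a node mounts a user's home only when it is first accessed.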

This provides sufficient bandwidth in most cases, but we have local
scratch as well (just a smallish SSD per node that's also used for
stateful provisioning), and some users resort to that since copying
everything in one go up front is usually faster than accessing a lot of
files via NFS.
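The stage-in/stage-out pattern some users follow can be sketched roughly as below: one bulk copy from the NFS-mounted home to node-local scratch, work on the local copy, then copy results back. The directory layout and function names are hypothetical, and `dirs_exist_ok` requires Python 3.8 or newer.

```python
# Minimal sketch of staging data to node-local scratch (hypothetical paths).
import shutil
import tempfile
from pathlib import Path


def stage_in(src: Path, scratch_root: Path) -> Path:
    """Copy src (e.g. a directory in an NFS home) into a fresh scratch dir."""
    workdir = Path(tempfile.mkdtemp(prefix="job-", dir=scratch_root))
    dest = workdir / src.name
    # One bulk copy up front instead of many small NFS reads during the job.
    shutil.copytree(src, dest)
    return dest


def stage_out(results: Path, dest_dir: Path) -> Path:
    """Copy a results directory from scratch back to (NFS) storage."""
    target = dest_dir / results.name
    shutil.copytree(results, target, dirs_exist_ok=True)
    return target


if __name__ == "__main__":
    # Temporary directories stand in for the NFS home and the local SSD.
    with tempfile.TemporaryDirectory() as home, \
            tempfile.TemporaryDirectory() as scratch:
        inputs = Path(home) / "inputs"
        inputs.mkdir()
        (inputs / "frame0001.dat").write_text("data")
        local = stage_in(inputs, Path(scratch))
        results = local.parent / "results"
        results.mkdir()
        (results / "out.txt").write_text("done")
        print(stage_out(results, Path(home)))
```

Whether this beats direct NFS access depends on file count and size; it tends to pay off when a job would otherwise open many small files over the network.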


[1] Back when I tested it in 2006, XFS provided by far the best
performance for our application: multiple nodes appending to large
files via NFS. That's actually not that surprising given that XFS was
developed with video processing in mind.

Ansgar Esztermann
Sysadmin Dep. Theoretical and Computational Biophysics
