[slurm-users] Python and R installation in a SLURM cluster

Eric F. Alemany ealemany at stanford.edu
Fri May 11 22:30:01 MDT 2018


Hi John,

> Regarding NFS shares and Python, and plenty of other packages too,
> pay attention to where the NFS server is located on your network.
> The NFS server should be part of your cluster, or at least have a network interface on your cluster fabric.
>
> If you perhaps have a home directory server which is a campus NFS server and you are NATting via your head node,
> then every time a parallel multi-node job starts up you will pull in libraries multiple times, and this will be a real
> performance bottleneck.
The NFS server is part of the cluster - same IP subnet/VLAN. I know that in the networking world that alone is a wrong assumption to go by, but the NFS server is also physically in the same rack as the rest of the cluster.
NFS server / headnode:        inet 10.112.0.25  netmask 255.255.255.192  broadcast 10.112.0.63
Execute node (one example):   inet 10.112.0.5   netmask 255.255.255.192  broadcast 10.112.0.63



> You do have to have a home directory mounted on the nodes - either the users' real home directories or something
> which looks like a home directory.  Oodles of software packages depend on dot files in the home directory,
> and you won't get far without one.
Right now each node has its own local user home directories.
Do you suggest that I move/create the users' home directories on the NFS share instead?
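
For example, I imagine something like this (the /export/home path and export options below are just a guess on my part, not my current setup):

# On the headnode (10.112.0.25): export a home area next to /media/cluster
sudo mkdir -p /export/home
echo '/export/home 10.112.0.0/26(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -ra

# On each execute node: mount it as /home so jobs see the same dot files everywhere
echo '10.112.0.25:/export/home  /home  nfs  defaults,_netdev  0 0' | sudo tee -a /etc/fstab
sudo mount -a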

> Eric, my advice would be to definitely learn the Modules system and implement modules for your users.
I definitely have to learn more about the Modules system and how to implement it. My work is heading more in that direction.
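
From what I have read so far, a minimal setup might look something like this (the /nfs/cluster paths and the python/3.6.5 version are only placeholders):

# Software lives on the NFS share, one directory per version:
#   /nfs/cluster/apps/python/3.6.5/{bin,lib,...}
# A matching Tcl modulefile, written here from the shell for convenience:
sudo mkdir -p /nfs/cluster/modulefiles/python
sudo tee /nfs/cluster/modulefiles/python/3.6.5 >/dev/null <<'EOF'
#%Module1.0
prepend-path PATH            /nfs/cluster/apps/python/3.6.5/bin
prepend-path LD_LIBRARY_PATH /nfs/cluster/apps/python/3.6.5/lib
EOF

# Users (and job scripts) then pick what they need:
module use /nfs/cluster/modulefiles
module avail
module load python/3.6.5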

> Also if you could give us some idea of your storage layout this would be good.
I hope this is what you meant:

Headnode:
eric@radoncmaster:/$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            7.8G     0  7.8G   0% /dev
tmpfs           1.6G  740K  1.6G   1% /run
/dev/sda1       902G  3.3G  853G   1% /
tmpfs           7.9G     0  7.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           7.9G     0  7.9G   0% /sys/fs/cgroup
/dev/sdb1       3.6T  572M  3.4T   1% /media/cluster
tmpfs           1.6G     0  1.6G   0% /run/user/1000

Execute node (one example):
eric@radonc01:~$ df -h
Filesystem                     Size  Used Avail Use% Mounted on
udev                            32G     0   32G   0% /dev
tmpfs                          6.3G  984K  6.3G   1% /run
/dev/mapper/radonc01--vg-root  2.7T  2.5G  2.5T   1% /
tmpfs                           32G     0   32G   0% /dev/shm
tmpfs                          5.0M     0  5.0M   0% /run/lock
tmpfs                           32G     0   32G   0% /sys/fs/cgroup
/dev/sda2                      473M  128M  321M  29% /boot
10.112.0.25:/media/cluster     3.6T  571M  3.4T   1% /nfs/cluster
tmpfs                          6.3G     0  6.3G   0% /run/user/1000




_____________________________________________________________________________________________________

Eric F.  Alemany
System Administrator for Research

Division of Radiation & Cancer  Biology
Department of Radiation Oncology

Stanford University School of Medicine
Stanford, California 94305

Tel: 1-650-498-7969  (No Texting)
Fax: 1-650-723-7382



On May 11, 2018, at 12:11 AM, John Hearns <hearnsj at googlemail.com> wrote:

Regarding NFS shares and Python, and plenty of other packages too,
pay attention to where the NFS server is located on your network.
The NFS server should be part of your cluster, or at least have a network interface on your cluster fabric.

If you perhaps have a home directory server which is a campus NFS server and you are NATting via your head node,
then every time a parallel multi-node job starts up you will pull in libraries multiple times, and this will be a real
performance bottleneck.

You do have to have a home directory mounted on the nodes - either the users' real home directories or something
which looks like a home directory.  Oodles of software packages depend on dot files in the home directory,
and you won't get far without one.

Eric, my advice would be to definitely learn the Modules system and implement modules for your users.
Also if you could give us some idea of your storage layout this would be good.


On 11 May 2018 at 08:55, Miguel Gutiérrez Páez <mgutierrez at gmail.com> wrote:
Hi,

I install all my apps on shared storage and change the environment variables (PATH and so on) with Lmod. It's very useful.
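
For example, something along these lines (the prefix and the Python version are only an illustration):

# Build Python once into a versioned prefix on the shared filesystem,
# so every node sees exactly the same install:
wget https://www.python.org/ftp/python/3.6.5/Python-3.6.5.tgz
tar xf Python-3.6.5.tgz && cd Python-3.6.5
./configure --prefix=/shared/apps/python/3.6.5 --enable-optimizations
make -j"$(nproc)" && sudo make install
# A small Lmod modulefile then just prepends /shared/apps/python/3.6.5/bin to PATH.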

Regards.

On Fri, May 11, 2018 at 6:19, Eric F. Alemany (<ealemany at stanford.edu>) wrote:
Hi Lachlan,

Thank you for sharing your environment. Everyone has their own set of rules and I appreciate everyone’s input.
It seems as if the NFS share is a great place to start.

Best,
Eric
_____________________________________________________________________________________________________

Eric F.  Alemany
System Administrator for Research

Division of Radiation & Cancer  Biology
Department of Radiation Oncology

Stanford University School of Medicine
Stanford, California 94305

Tel: 1-650-498-7969  (No Texting)
Fax: 1-650-723-7382



On May 10, 2018, at 4:23 PM, Lachlan Musicman <datakid at gmail.com> wrote:

On 11 May 2018 at 01:35, Eric F. Alemany <ealemany at stanford.edu> wrote:
Hi All,

I know this might sound like a very basic question: where in the cluster should I install Python and R?
Headnode?
Execute nodes?

And is there a particular directory (path) where I need to install Python and R?

Background:
SLURM on Ubuntu 18.04
1 headnode
4 execute nodes
NFS shared drive among all nodes.


Eric,

To echo the others: we have a /binaries nfs share that utilises the standard Environment Modules software so that researchers can manipulate their $PATH on the fly with module load/module unload. That share is mounted on all the nodes.

For Python, I use virtualenvs, but instead of activating them, the path is changed by the Module file. Personally, I find conda doesn't work very well in a shared environment; it's fine on a personal level.
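
Roughly, it looks like this (the paths are made up for the example):

# Create the virtualenv once, on the shared /binaries area:
virtualenv -p python3 /binaries/python/envs/genomics-tools
# The Module file then just does the equivalent of:
#   prepend-path PATH /binaries/python/envs/genomics-tools/bin
# which is all that "activate" really changes as far as PATH is concerned.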

For R, we have resorted to only installing the main point release because we have >700 libraries installed within R and I don't want to reinstall them every time. We do also have packrat installed so researchers can install their own libraries locally as well.
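
For example (paths and package names here are only illustrative):

# Site-wide libraries go into a shared library directory:
Rscript -e 'install.packages("data.table", lib = "/binaries/R/3.5/site-library", repos = "https://cloud.r-project.org")'

# A researcher who wants their own versions runs, inside their project directory:
Rscript -e 'packrat::init()'
Rscript -e 'install.packages("dplyr")'   # now lands in the packrat library of that project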


Cheers
L.





