[slurm-users] Slurm and available libraries

Vanzo, Davide davide.vanzo at vanderbilt.edu
Wed Jan 17 07:26:16 MST 2018


Hi Bill!

Always glad to contribute to the Lmod cause! ;)

Back to the discussion: I simply described how we set up our system. In no way did I intend to say that this is the only way to deploy software. Yours is definitely a valid alternative, although it requires deeper experience in software packaging and deployment.

To solve the problem of users overloading the login nodes, we are experimenting with cgroups, but that takes us a bit too far off topic.

PS: Now that I am in San Antonio, I have no more excuses not to come and visit you guys at TACC.

--
Davide Vanzo, PhD
Application Developer
Adjunct Assistant Professor of Chemical and Biomolecular Engineering
Advanced Computing Center for Research and Education (ACCRE)
Vanderbilt University - Hill Center 201
(615)-875-9137
www.accre.vanderbilt.edu


On 2018-01-17 08:01:10-06:00 slurm-users wrote:

I do appreciate the Lmod shout-out! But I’d go slightly further: in some cases, you may not even want the software on the frontend nodes (hear me out before I retract it).

If it’s a library that has to be linked against before it can be used, then you probably need it on the frontend nodes, unless you require users to submit interactive jobs to dedicated build nodes to do their compilation. You’ll find that when users have all their software in one place on the frontend nodes, they sometimes try to run it there, taking resources away from others. A quick test run to make sure a build is correct is probably no big deal, but some users will run their full-on science experiments (or pre- and post-processing steps) on the login nodes! We like to encourage those folks to submit jobs to the compute nodes instead. To prevent this, you could cripple some libraries on the login nodes or not install them there at all (though users probably wouldn’t like it), but since we do want users to be able to build their programs on the login nodes, we just watch those systems like a hawk.
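Watching login nodes "like a hawk" can be partly automated. The following is a minimal sketch, assuming a Linux login node and a hypothetical policy that is not taken from this thread: report any process that has used more than 10 minutes (600 s) of CPU time unless its command name is on an allow-list of compilers and build tools. The threshold, the BUILD_TOOLS list, and the function names are all illustrative.

```python
"""Hypothetical login-node report: list long-running, non-build processes.

Assumptions (illustrative, not from this thread): Linux /proc, a 10-minute
CPU-time threshold, and the BUILD_TOOLS allow-list below.
"""
import os
import pwd
from typing import Iterable, List, NamedTuple


class Proc(NamedTuple):
    user: str
    pid: int
    comm: str           # kernel command name (truncated to 15 chars)
    cpu_seconds: float  # user + system CPU time consumed so far


# Hypothetical allow-list: compiling on login nodes is expected, so builds
# are never flagged no matter how long they run.
BUILD_TOOLS = frozenset({"gcc", "g++", "gfortran", "cc1", "cc1plus", "f951",
                         "ld", "make", "cmake", "icc", "icpc", "ifort"})


def flag_heavy(procs: Iterable[Proc], cpu_limit_s: float = 600.0,
               allowed: frozenset = BUILD_TOOLS) -> List[Proc]:
    """Return processes above the CPU limit that are not build tools, worst first."""
    heavy = [p for p in procs if p.cpu_seconds > cpu_limit_s and p.comm not in allowed]
    return sorted(heavy, key=lambda p: p.cpu_seconds, reverse=True)


def read_proc() -> List[Proc]:
    """Snapshot every readable process from /proc (Linux only)."""
    ticks = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    procs = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as fh:
                stat = fh.read()
            uid = os.stat(f"/proc/{entry}").st_uid
            # The command name sits in parentheses and may contain spaces.
            rpar = stat.rindex(")")
            comm = stat[stat.index("(") + 1:rpar]
            fields = stat[rpar + 2:].split()  # fields[0] is stat field 3 (state)
            utime, stime = int(fields[11]), int(fields[12])  # stat fields 14, 15
        except (OSError, ValueError, IndexError):
            continue  # process exited, unreadable, or unexpected format
        try:
            user = pwd.getpwuid(uid).pw_name
        except KeyError:
            user = str(uid)  # uid with no passwd entry
        procs.append(Proc(user, int(entry), comm, (utime + stime) / ticks))
    return procs


if __name__ == "__main__":
    for p in flag_heavy(read_proc()):
        print(f"{p.user:<12} {p.pid:>7} {p.comm:<16} {p.cpu_seconds:>10.0f}s")
```

Run periodically (for example from cron), this only reports; a cgroups setup like the one Davide mentions is what would actually enforce limits.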

We don’t use EB (EasyBuild), but we do collaborate with its developers to make it and Lmod compatible. We use something like OpenHPC to push RPMs we build in-house to manage software on our login and compute nodes. Sometimes, we also just install a binary package (like an ISV code like ANSYS or MATLAB) into a shared filesystem (one of our Lustre filesystems, usually) when making our own RPM is too cumbersome, and then use Lmod to make it available and visible to our users. There are more strategies for this than you can imagine, so settle on a few and keep it simple for yourself!

Best,
Bill.

--
Bill Barth, Ph.D., Director, HPC
bbarth at tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.435            |   Fax:   (512) 475-9445



On 1/17/18, 7:48 AM, "slurm-users on behalf of Vanzo, Davide" <slurm-users-bounces at lists.schedmd.com on behalf of davide.vanzo at vanderbilt.edu> wrote:

    Ciao Elisabetta,

    I second John's reply.
    On our cluster we install software on the shared parallel filesystem with EasyBuild and use Lmod as a module front-end. Then users will simply load software in the job's environment by using the module command.
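The last step of that workflow, loading the software inside the job, is sketched below as a small Python helper that prints a Slurm batch script. The #SBATCH options and the module purge / module load commands are standard Slurm and Lmod usage, but the helper itself and the module names (Python/3.6.3, SciPy-bundle/2018.01) are hypothetical placeholders; substitute whatever module avail lists on your cluster.

```python
"""Sketch: print a Slurm batch script that loads software with Lmod.

Assumption: the module names used below are hypothetical placeholders.
"""
from typing import Sequence


def build_batch_script(command: str, modules: Sequence[str],
                       job_name: str = "py-job", ntasks: int = 1,
                       time_limit: str = "00:10:00") -> str:
    """Return the text of a batch script that loads `modules`, then runs `command`."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --time={time_limit}",
        "",
        "# Start clean, then load builds from the shared filesystem.",
        "module purge",
    ]
    lines += [f"module load {m}" for m in modules]
    lines += ["", command, ""]  # trailing "" gives the file a final newline
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_batch_script(
        "python my_script.py",
        ["Python/3.6.3", "SciPy-bundle/2018.01"],  # hypothetical module names
    ), end="")
```

Redirect the output to a file (python make_job.py > job.sh) and submit it with sbatch job.sh. The module purge line is a deliberate choice: by default sbatch copies the submitting shell's environment into the job, so purging first means the job sees only the modules it loads explicitly.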

    Feel free to ping me directly if you need specific help.




    On 2018-01-17 07:28:31-06:00 slurm-users wrote:

    Hi,
    let's say I need to execute a Python script with Slurm. The script requires a particular library installed on the system, like numpy.
    If the library is not installed on the system, it has to be installed on the master AND the nodes, right? Does this have to be done on each machine separately, or is there a way to install it once for all the machines (master and nodes)?
    Elisabetta
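One way to confirm that a single installation on a shared filesystem is visible from every node, rather than installed machine by machine, is to have each task report where Python would import the library from. A minimal sketch using only the standard library; the package name defaults to numpy, and the script name where.py used below is hypothetical.

```python
"""Report the host and the path a package would be imported from.

Run one copy per node to check that a shared-filesystem install is visible.
"""
import importlib.util
import socket
import sys


def locate(package: str = "numpy") -> dict:
    """Find (without importing) a top-level package and describe where it lives."""
    spec = importlib.util.find_spec(package)
    return {
        "host": socket.gethostname(),
        "python": sys.executable,
        "package": package,
        "found": spec is not None,
        "origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    info = locate(sys.argv[1] if len(sys.argv) > 1 else "numpy")
    print("{host}: {package} -> {origin} (python: {python})".format(**info))
```

For example, srun --nodes=2 --ntasks-per-node=1 python where.py numpy prints one line per node. If every line shows the same path on the shared filesystem, one installation serves all the nodes; a node that reports None cannot see it.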






