[slurm-users] lmod and slurm

Yair Yarom irush at cs.huji.ac.il
Tue Dec 19 04:13:34 MST 2017


Hi list,

We use Lmod[1] here for software/version management. We have run into two
issues so far:

1. The submission node can have different software than the execution
   nodes (different CPU, different GPU if any, InfiniBand, etc.). When a
   user runs 'module load something' on the submission node, sbatch
   passes that environment on to the task on the execution node, where
   it may be wrong. For example, "module load tensorflow" can load a
   different version depending on the node.

2. There are some modules we want to load by default, and again these can
   differ between nodes (we do this by sourcing /etc/lmod/lmodrc and
   ~/.lmodrc).

For issue 1, we instruct users to run the "module load" in their batch
script and not before running sbatch, but issue 2 is more problematic.

My current solution is a TaskProlog script that runs "module purge" and
"module load" and then exports/unsets the environment variables that
changed. Has anyone else run into this, and is there a less cumbersome
solution?
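A minimal sketch of that TaskProlog approach, written in Python rather than shell. Slurm reads the TaskProlog's standard output and applies lines of the form "export NAME=value" and "unset NAME" to the task's environment; the script assumes it inherits the task's environment, purges and reloads modules in a child bash, and prints the difference. The Lmod init path /etc/profile.d/lmod.sh, the helper names parse_env0 and taskprolog_lines, and the IGNORED set are assumptions for illustration, not part of the setup described above.

```python
#!/usr/bin/env python3
"""Hypothetical TaskProlog: re-initialise Lmod on the execution node.

Assumptions (adjust per site): Lmod is initialised by sourcing
/etc/profile.d/lmod.sh, and node-local defaults live in /etc/lmod/lmodrc and
~/.lmodrc. Slurm applies the 'export'/'unset' lines this prints on stdout.
"""
import os
import subprocess
import sys

# Hypothetical bash snippet: drop modules inherited from the submission node,
# load this node's defaults, then dump the resulting environment NUL-separated.
# Anything Lmod or the rc files print goes to stderr so it cannot corrupt stdout.
LMOD_SNIPPET = (
    "source /etc/profile.d/lmod.sh >&2 || exit 1; "
    "module purge >&2 || exit 1; "
    "source /etc/lmod/lmodrc >&2; "
    "if [ -f ~/.lmodrc ]; then source ~/.lmodrc >&2; fi; "
    "env -0"
)

# Variables that bash itself changes; they are not Lmod's doing.
IGNORED = {"_", "SHLVL", "PWD", "OLDPWD"}


def parse_env0(raw: bytes) -> dict:
    """Parse the NUL-separated output of `env -0` into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue
        name, sep, value = entry.decode(errors="replace").partition("=")
        if sep:
            env[name] = value
    return env


def taskprolog_lines(before: dict, after: dict) -> list:
    """Turn an environment diff into TaskProlog 'unset'/'export' lines."""
    lines = []
    for name in sorted(before.keys() - after.keys()):
        if name not in IGNORED:
            lines.append(f"unset {name}")
    for name in sorted(after):
        value = after[name]
        if name in IGNORED or before.get(name) == value:
            continue
        # The TaskProlog protocol is line based, so multi-line values
        # (e.g. exported bash functions) cannot be passed and are skipped.
        if "\n" in value:
            continue
        lines.append(f"export {name}={value}")
    return lines


def main() -> None:
    try:
        result = subprocess.run(["bash", "-c", LMOD_SNIPPET], capture_output=True)
    except OSError as err:
        print(f"TaskProlog: cannot run bash: {err}", file=sys.stderr)
        return
    if result.returncode != 0:
        # Leave the task environment untouched if Lmod could not be set up.
        sys.stderr.write(result.stderr.decode(errors="replace"))
        return
    for line in taskprolog_lines(dict(os.environ), parse_env0(result.stdout)):
        print(line)


if __name__ == "__main__":
    main()
```

To try it, point the TaskProlog parameter in slurm.conf at the script (e.g. TaskProlog=/etc/slurm/lmod_taskprolog.py, a hypothetical path). Diffing the whole environment, rather than listing Lmod's variables by name, means anything the site's lmodrc sets is carried over as well; the cost is one extra bash start-up per task.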

Thanks in advance,
    Yair.

[1] https://www.tacc.utexas.edu/research-development/tacc-projects/lmod