Greetings!
We're using Slurm 23.11.8 on a small cluster. The control node exports the directory /clusterprograms to the compute nodes via NFS (mounted under the same path) to provide access to the available software. Users are instructed to use Lmod (module load/avail/purge; https://lmod.readthedocs.io/en/latest/060_locating.html) to set up their job's runtime environment. We have the software and modules at /clusterprograms distributed like this:
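For context, a typical job script under this setup looks roughly like the following sketch (the job name, module name, and application are placeholders, not real modules on our cluster):

```shell
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --constraint=cuda10
# Placeholder module name below; users run `module avail` to see what
# the current MODULEPATH actually exposes on the allocated node.
module purge
module load cuda
srun ./my_app
```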
/clusterprograms/
├── common
├── gpu/
│   ├── cuda8/
│   └── cuda10/
└── cpu/
    ├── intel/
    └── amd/
We configured the /etc/profile.d/ directory on each node so that, when we access a node directly (via ssh), the paths used by module (MODULEPATH) are only these:
* /home/<user>/modulefiles
* /clusterprograms/common/modules
* /clusterprograms/<processor vendor or cuda version>/modules
The exception is the control/login node, which only has /home/<user>/modulefiles and /clusterprograms/common/modules.
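For reference, the per-node configuration can be sketched roughly like this (the file name and the ARCH_DIR variable are illustrative assumptions, not our actual script):

```shell
#!/bin/bash
# Hypothetical /etc/profile.d/ snippet (names are illustrative): each
# compute node prepends its architecture-specific module tree; the
# login node would simply omit the ARCH_DIR entry.
ARCH_DIR="gpu/cuda10"   # placeholder; set per node type
export MODULEPATH="${HOME}/modulefiles:/clusterprograms/common/modules:/clusterprograms/${ARCH_DIR}/modules"
```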
The issue we have now is this: we are looking for a safe and practical way to automate the reconfiguration of MODULEPATH (https://lmod.readthedocs.io/en/latest/077_ref_counting.html) on the compute nodes when a job is submitted. For now, the only method that is "working" is the following TaskProlog script, but it forces the users to load the modules inside the scripts called by srun steps, rather than being able to load them in the job's base step:
#!/bin/bash
if [[ -n "${SLURM_JOB_CONSTRAINTS}" ]]; then
    echo "export ORIGINAL_MPATH=${MODULEPATH}"
    IFS=',' read -ra JOB_ARCH_CONSTRAINTS <<< "$SLURM_JOB_CONSTRAINTS"
    for constraint in "${JOB_ARCH_CONSTRAINTS[@]}"; do
        case "${constraint}" in
            cuda*)
                echo "export MODULEPATH=/clusterprograms/gpu/${constraint}:${MODULEPATH}" ;;
            intel-*)
                echo "export MODULEPATH=/clusterprograms/cpu/intel/${constraint}:${MODULEPATH}" ;;
            amd-*)
                echo "export MODULEPATH=/clusterprograms/cpu/amd/${constraint}:${MODULEPATH}" ;;
        esac
    done
fi
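To illustrate what this produces, here is a standalone simulation of the case logic for a job submitted with e.g. `sbatch --constraint=cuda10` (SLURM_JOB_CONSTRAINTS and MODULEPATH are set by hand here; in the real prolog Slurm provides them):

```shell
#!/bin/bash
# Standalone simulation of the TaskProlog logic above; the two variables
# below are set manually for illustration only.
SLURM_JOB_CONSTRAINTS="cuda10"
MODULEPATH="/clusterprograms/common/modules"
IFS=',' read -ra JOB_ARCH_CONSTRAINTS <<< "$SLURM_JOB_CONSTRAINTS"
for constraint in "${JOB_ARCH_CONSTRAINTS[@]}"; do
    case "${constraint}" in
        cuda*)   MODULEPATH="/clusterprograms/gpu/${constraint}:${MODULEPATH}" ;;
        intel-*) MODULEPATH="/clusterprograms/cpu/intel/${constraint}:${MODULEPATH}" ;;
        amd-*)   MODULEPATH="/clusterprograms/cpu/amd/${constraint}:${MODULEPATH}" ;;
    esac
done
echo "$MODULEPATH"
# prints: /clusterprograms/gpu/cuda10:/clusterprograms/common/modules
```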
Is there any better way to do this? And why does it not work if we use Prolog instead of TaskProlog? I also tried the --export option, but that doesn't work either, since the control node has MODULEPATH set via the main shell profiles.
Many thanks!
--
Daniel Garrapucho Lévy
IT Technician
Departament de Física de la Matèria Condensada, Facultat de Física
Martí i Franquès, 1, 08028 Barcelona, SPAIN. Despatx V302. Email: daniel.garrapucho@ub.edu
This email message and any attachments it carries may contain confidential or legally protected material and are intended solely for the individual or organization to whom they are addressed. If you are not the intended recipient of this message or the person responsible for processing it, then you are not authorized to read, save, modify, send, copy or disclose any part of it. If you have received the message by mistake, please inform the sender of this and eliminate the message and any attachments it carries from your account.