In fact I am more worried about how the users would benefit from such a mixture of execution environments ... [snip]
So what is an ideal setup? Keep the same .deb distro on all machines and use apt to install slurm on every machine?
On Thu, May 23, 2024 at 10:20 AM Shunran Zhang < szhang@ngs.gen-info.osaka-u.ac.jp> wrote:
Hi Arnuld,
What I would probably do is build one for each distro and install it either directly into /usr/local or as a deb package.
The DEBIAN/control is used by apt to manage a couple of things, such as indexing so apt search shows what this package is for, which package it could replace, which packages are recommended to be installed with it, and which packages need to be installed before this can work.
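To make the role of DEBIAN/control more concrete, here is a minimal sketch in Python that renders such a file. The field names (Package, Depends, Recommends, Replaces, and so on) are standard Debian control fields; the package name, version, maintainer, and the dependency lists are placeholders chosen for illustration, not values taken from an actual Slurm package.

```python
# Sketch: render a minimal DEBIAN/control stanza for a locally built Slurm package.
# All values below are illustrative assumptions, not an official package definition.

def render_control(fields: dict[str, str]) -> str:
    """Return the control file text, one 'Field: value' line per entry."""
    return "".join(f"{name}: {value}\n" for name, value in fields.items())

control = render_control({
    "Package": "slurm-local",                      # hypothetical package name
    "Version": "23.11.7-1",
    "Architecture": "amd64",
    "Maintainer": "Cluster Admin <admin@example.org>",
    "Depends": "munge, libmunge2",                 # must be installed first
    "Recommends": "libyaml-0-2",                   # suggested alongside
    "Replaces": "slurm-wlm",                       # package this one can replace
    "Description": "Slurm workload manager (site-local build)",
})
print(control)
```

The Depends line is where a GPU driver package would go for the GPU-enabled build, so apt can refuse to install the package on a machine that lacks it.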
For machines with a particular brand of GPU, you would need a Slurm build that is configured and compiled with the matching option turned on, and the corresponding device driver listed in DEBIAN/control so that apt can check that the driver on the machine meets the requirements of your deb package. You can forget about the second part if you are not using deb packages and just compile and run Slurm on the client machine.
The last thing he mentioned is about the Slurm versions. A Slurm client of a lower version (say 23.02) should be able to talk to a slurmctld of a higher version (say 23.11) just fine, though the reverse does not apply. As for dependency management, it is complex enough that maintaining a Linux distribution is quite some work. I know this because I maintain a Linux distro that uses dpkg packages but has no Debian root and uses a different CLI tool, etc.
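The version rule above can be sketched as a small check. This is an illustration under stated assumptions, not Slurm's own logic: the release list is hard-coded by hand, and the limit of two major releases behind the controller is an assumption that should be confirmed against the Slurm upgrade documentation for the version in use.

```python
# Sketch: can a client (slurmd or command) of one version talk to a given slurmctld?
# Assumption: ordered list of Slurm major releases, oldest first; extend as needed.
RELEASES = ["21.08", "22.05", "23.02", "23.11"]

def major(version: str) -> str:
    """Reduce '23.11.7' to its major release '23.11'."""
    return ".".join(version.split(".")[:2])

def client_can_talk(client: str, controller: str, max_releases_behind: int = 2) -> bool:
    """True if the client is the same release as the controller or at most
    max_releases_behind major releases older; never true if it is newer."""
    client_idx = RELEASES.index(major(client))      # ValueError if unknown release
    controller_idx = RELEASES.index(major(controller))
    return 0 <= controller_idx - client_idx <= max_releases_behind

print(client_can_talk("23.02.7", "23.11.7"))  # older client, newer controller
print(client_can_talk("23.11.7", "23.02.7"))  # newer client, older controller
print(client_can_talk("21.08.8", "23.11.7"))  # client three releases behind
```

Under these assumptions the first call returns True and the other two return False.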
In fact I am more worried about how the users would benefit from such a mixture of execution environments. A misstep in configuration, or a user submitting a job without specifying enough information about what they are asking for, would probably make the job work or fail purely by chance, depending on which node it ran on and which environment its executables were built against. You probably need a couple of "similar" nodes so that users can benefit from the job queue sending their jobs to wherever capacity is available.
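One way to reduce the "works by chance" problem is to have users state which environment their job needs. A minimal sketch, assuming the admin has tagged each node with a feature naming its environment (the feature name "debian12" and the script name are hypothetical), builds an sbatch command that uses the --constraint option so the job can only land on matching nodes:

```python
# Sketch: build an sbatch command pinned to nodes that advertise a given feature.
# Assumption: node features such as "debian12" have been defined by the admin.

def sbatch_command(script: str, feature: str) -> list[str]:
    """Return the argv list for submitting `script` only to nodes with `feature`."""
    return ["sbatch", f"--constraint={feature}", script]

cmd = sbatch_command("job.sh", "debian12")
print(" ".join(cmd))
```

The list form can be passed directly to subprocess.run on a machine where sbatch is installed.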
Good luck with your setup
Sincerely,
S. Zhang
On 2024/05/23 13:04, Arnuld via slurm-users wrote:
Not that I recommend it much, but you can build them for each environment and install the ones needed in each.
Oh cool, I will download the latest version, 23.11.7, and build Debian packages on every machine then.
A simple example is when you have nodes with and without GPUs. You can build slurmd packages without GPU support for the nodes that lack GPUs and with it for the ones that have them.
I do have non-GPU machines. I guess I need to learn to modify the Debian control files for this.
Generally, so long as versions are compatible, they can work together. You will need to be aware of differences for jobs and configs, but it is possible.
You mean the versions of the dependencies are compatible? That is true for most (like munge) but might not be true for others (like yaml or http-parser). I need to check on that.
On Thu, May 23, 2024 at 1:07 AM Brian Andrus via slurm-users < slurm-users@lists.schedmd.com> wrote:
Not that I recommend it much, but you can build them for each environment and install the ones needed in each.
A simple example is when you have nodes with and without GPUs. You can build slurmd packages without GPU support for the nodes that lack GPUs and with it for the ones that have them.
Generally, so long as versions are compatible, they can work together. You will need to be aware of differences for jobs and configs, but it is possible.
Brian Andrus
On 5/22/2024 12:45 AM, Arnuld via slurm-users wrote:
We have several nodes, most of which have different Linux distributions (distro for short). The controller has a different distro as well. The only thing the controller and all the nodes have in common is that all of them are x86_64.
I can install Slurm using the package manager on all the machines, but this will not work because the controller will have a different version of Slurm than the nodes (21.08 vs 23.11).
If I build from source then I see two solutions:
- build a deb package
- build a custom package (./configure, make, make install)
Building a Debian package on the controller and then distributing the binaries to the nodes won't work either, because the binaries will look for the shared libraries they were built against, and those don't exist on the nodes.
So the only solution I have is to build a static binary using a custom package. Am I correct or is there another solution here?
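The shared-library problem described above can be checked on a node before deciding on a static build. A minimal sketch, assuming a glibc-based Linux system where ldd reports unresolved libraries as "=> not found"; the sample output is made up for illustration:

```python
import subprocess

def missing_libraries(ldd_output: str) -> list[str]:
    """Return the names of libraries that ldd reported as '=> not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing

def check_binary(path: str) -> list[str]:
    """Run ldd on `path` (ldd must be installed) and list unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)

# Illustrative ldd output, not captured from a real node.
sample = (
    "\tlibmunge.so.2 => not found\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)\n"
)
print(missing_libraries(sample))
```

Running check_binary on a copied slurmd binary on each node would show which libraries that node lacks.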