We have several nodes, most of which have different Linux distributions (distro for short). The controller has a different distro as well. The only thing the controller and all the nodes have in common is that all of them are x86_64.
I can install Slurm using the package manager on all the machines, but that will not work because the controller would end up with a different version of Slurm than the nodes (21.08 vs 23.11).
If I build from source then I see two solutions:
- build a deb package
- build a custom package (./configure, make, make install)
Building a Debian package on the controller and then distributing the binaries to the nodes won't work either, because those binaries will look for the shared libraries they were built against, and those don't exist on the nodes.
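You can see this concretely with ldd, which lists the shared libraries a dynamically linked binary expects. (The path below uses /bin/sh purely for illustration; on a node you would point it at the copied Slurm binaries instead.)

```shell
# Show which shared libraries a binary was linked against.
# /bin/sh is just a stand-in here; on a compute node you would
# run this against the Slurm binaries you copied over, e.g.
#   ldd /usr/sbin/slurmd
ldd /bin/sh
```

Any line reporting "not found" on a node is a library the binary was built against but that node's distro does not provide.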
So the only solution I have is to build a static binary using a custom package. Am I correct or is there another solution here?
Hi Arnuld,
It is most important to keep the Slurm version the same across the board.
Since you mention "deb" packages, I assume all of your nodes run Debian-based distributions, which should be reasonably close to each other. However, Debian-based distros are not as binary-compatible with one another as RHEL-based distros are (RHEL, Alma, Rocky, CentOS, Oracle, Fedora, etc.), so even though they all use "deb" packages, it is better to avoid sharing a deb across different distros.
If all of your distros ship similar versions of the dependencies (at least at the glibc level) and differ only in how packages are named (e.g. apache2 vs. httpd), you can potentially run the same Slurm build on all of them. In that case, you can work around the naming differences by listing all of the potential names for each dependency in the DEBIAN/control Depends field.
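For example, Depends in DEBIAN/control accepts alternatives separated by "|", so one package can satisfy its dependency on whichever name the local distro uses. The package names below are illustrative, not the actual SchedMD packaging:

```
Package: slurm-custom
Version: 23.11.7-1
Architecture: amd64
Depends: munge, libmunge2, apache2 | httpd
Description: Locally built Slurm (example control fragment only)
```

dpkg/apt will accept the package as long as at least one alternative in each "|" group is installed.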
Static linking, or a conda-like environment, may help more if the distros differ further and would otherwise require a rebuild per distro. Otherwise, it probably makes more sense to just build on each node according to the features it needs (ROCm or NVML support makes no sense on a node without such devices, for example).
A more complex setup does indeed require more maintenance work. I got quite tired of it and decided to just ship a RHEL-family OS on all compute nodes, and let those who are more familiar with some other distro start one up themselves with Singularity or Docker.
Sincerely,
S. Zhang
On Wed, May 22, 2024 at 17:11, Arnuld via slurm-users <slurm-users@lists.schedmd.com> wrote:
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-leave@lists.schedmd.com
Not that I recommend it much, but you can build them for each environment and install the ones needed in each.
A simple example is when you have nodes with and without GPUs. You can build slurmd packages without for those nodes and with for the ones that have them.
Generally, so long as versions are compatible, they can work together. You will need to be aware of differences for jobs and configs, but it is possible.
Brian Andrus
On 5/22/2024 12:45 AM, Arnuld via slurm-users wrote:
Not that I recommend it much, but you can build them for each environment and install the ones needed in each.
Oh cool, I will download the latest version (23.11.7) and build Debian packages on every machine then.
A simple example is when you have nodes with and without GPUs. You can build slurmd packages without for those nodes and with for the ones that have them.
I do have non-GPU machines. I guess I need to learn to modify the Debian control files for this.
Generally, so long as versions are compatible, they can work together. You will need to be aware of differences for jobs and configs, but it is possible.
You mean the versions of the dependencies are compatible? That is true for most (like munge) but might not be true for others (like yaml or http-parser). I need to check on that.
On Thu, May 23, 2024 at 1:07 AM Brian Andrus via slurm-users <slurm-users@lists.schedmd.com> wrote:
Hi Arnuld,
What I would probably do is build one for each distro and install it either directly into /usr/local or via a deb package.
The DEBIAN/control file is used by apt to manage a number of things, such as indexing (so apt search shows what the package is for), which packages it can replace, which packages are recommended alongside it, and which packages must be installed before it can work.
For machines with a certain brand of GPU, you need a Slurm that is configured and compiled with that option ON, and the corresponding device driver listed in DEBIAN/control so that apt can check that the driver on the machine meets the requirements of your deb package. You can forget about the second part if you are not using deb packages and just compile and run Slurm on the client machine.
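As a sketch, such a driver requirement could look like this in DEBIAN/control (the package and driver names are illustrative, not the actual SchedMD packaging):

```
Package: slurmd-gpu
Architecture: amd64
Depends: nvidia-driver-535 | nvidia-driver, munge
Description: slurmd built with NVML support (example fragment only)
```

With this in place, apt refuses to install the GPU-enabled slurmd on a node that has no suitable driver package.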
The last thing he mentioned is the Slurm versions. A Slurm client of a lower version (say 23.02) should be able to talk to a slurmctld of a higher version (say 23.11) just fine, though the reverse does not apply. As for dependency management, it is complex enough that maintaining a Linux distribution is quite some work; I know this because I maintain a Linux distro that uses dpkg packages but has no Debian roots and uses a different CLI tool.
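Slurm's documented compatibility window is that slurmd and the client commands may be up to two major releases older than slurmctld, never newer. A rough illustrative sketch of that rule (the release list is just the recent majors, not an official API):

```python
# Ordered list of recent Slurm major releases, oldest first.
RELEASES = ["21.08", "22.05", "23.02", "23.11", "24.05"]

def compatible(ctld: str, node: str) -> bool:
    """slurmctld may be newer than a node's slurmd, but only by up
    to two major releases; slurmd must never be newer than slurmctld."""
    gap = RELEASES.index(ctld) - RELEASES.index(node)
    return 0 <= gap <= 2
```

So a 23.02 slurmd talking to a 23.11 slurmctld is fine, while a 23.11 node under a 21.08 controller (or any node newer than its controller) is not supported.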
In fact I am more worried about how users would cope with such a mixture of execution environments. A misstep in configuration, or a user submitting a job without specifying enough about what it needs, would make the job work or fail purely by chance, depending on which node it lands on and which environment its executables were built against. You probably need at least a couple of "similar" nodes so users can benefit from the job queue sending their work wherever capacity is available.
Good luck with your setup.
Sincerely,
S. Zhang
On 2024/05/23 13:04, Arnuld via slurm-users wrote:
In fact I am more worried about how the users would benefit from such a mixture of execution environments ... <snip>
So what is the ideal setup? Keep the same deb-based distro on all machines and use apt to install Slurm on each?
On Thu, May 23, 2024 at 10:20 AM Shunran Zhang <szhang@ngs.gen-info.osaka-u.ac.jp> wrote:
On 5/22/24 3:33 pm, Brian Andrus via slurm-users wrote:
A simple example is when you have nodes with and without GPUs. You can build slurmd packages without for those nodes and with for the ones that have them.
FWIW we have both GPU and non-GPU nodes but we use the same RPMs we build on both (they all boot the same SLES15 OS image though).
I would guess that for that to work, you either install GPU drivers on the non-GPU nodes or build Slurm without GPU support, due to package dependencies.
Both viable options. I have done installs where we just don't compile GPU support in and that is left to the users to manage.
Brian Andrus
On 5/23/2024 6:16 AM, Christopher Samuel via slurm-users wrote: