Hi Arnuld
It is most important to keep the Slurm version the same across the board.
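As a rough illustration of that check, here is a minimal Python sketch that flags hosts whose Slurm release series differs from the controller's. The host names and version strings are made up for the example; in practice you would collect them yourself, for instance from the output of `slurmd -V` on each machine.

```python
def release_series(version: str) -> str:
    """Return the 'major.minor' release series, e.g. '23.11' from '23.11.4'."""
    parts = version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected version string: {version!r}")
    return ".".join(parts[:2])


def mismatched_hosts(versions: dict[str, str], controller: str) -> dict[str, str]:
    """Return the hosts whose release series differs from the controller's."""
    wanted = release_series(versions[controller])
    return {
        host: ver
        for host, ver in versions.items()
        if release_series(ver) != wanted
    }


# Hypothetical inventory: host name -> installed Slurm version.
versions = {"ctl": "23.11.4", "node1": "23.11.4", "node2": "21.08.5"}
print(mismatched_hosts(versions, "ctl"))
```

Only the first two components are compared here; tighten the comparison if you also want the patch level to match exactly.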
Since you mention the "deb" package, I assume all of your nodes run Debian-based distributions, which should be reasonably close to each other. However, Debian-based distros are not as "binary compatible" with one another as RHEL-based distros (say, RHEL, Alma, Rocky, CentOS, Oracle, Fedora, etc.), so even though they all use "deb" packages, it is better to avoid sharing a single deb across different distros.
If all of your distros ship similar versions of the dependencies (at least at the glibc level) and differ only in how packages are named (e.g. apache2 vs. httpd), you may be able to run the same Slurm build on the other distros. In that case, you can work around the naming differences by listing all of the possible names for each dependency in the Depends field of DEBIAN/control.
Static linking or a conda-like environment may help more if the distros differ enough to require a rebuild per distro. Otherwise, it probably makes more sense to just build Slurm on each node according to the features that node needs (say, ROCm or NVML support makes no sense on a node without such devices).
A more complex setup does indeed require more maintenance work. I got quite tired of it and decided to just ship a RHEL-family OS on all compute nodes and let those who are more familiar with another distro start one up themselves with Singularity or Docker.
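To make the Depends idea concrete: in a Debian control file, alternative package names for one dependency are separated by `|`, and separate dependencies by commas. The sketch below only assembles such a line; the package names are illustrative (the apache2/httpd pair is the naming example from above, not an actual Slurm dependency), so substitute whatever your build really links against.

```python
def depends_field(deps: list[list[str]]) -> str:
    """Build a Depends line; each inner list holds alternative names
    for a single dependency, any one of which satisfies it."""
    groups = [" | ".join(names) for names in deps]
    return "Depends: " + ", ".join(groups)


# Illustrative names only: one dependency with a single name,
# and one dependency known under two different package names.
print(depends_field([["libc6"], ["apache2", "httpd"]]))
# prints: Depends: libc6, apache2 | httpd
```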
Sincerely,
S. Zhang
On Wednesday, May 22, 2024 at 17:11, Arnuld via slurm-users slurm-users@lists.schedmd.com wrote:
We have several nodes, most of which run different Linux distributions (distros for short). The controller runs a different distro as well. The only thing the controller and all the nodes have in common is that all of them are x86_64.
I can install Slurm using the package manager on all the machines, but this will not work because the controller would have a different version of Slurm than the nodes (21.08 vs 23.11).
If I build from source then I see two solutions:
- build a deb package
- build a custom package (./configure, make, make install)
Building a Debian package on the controller and then distributing the binaries to the nodes won't work either, because the binary will look for the shared libraries it was built against, and those don't exist on the nodes.
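One way to see this problem on a given node is to run `ldd` against the copied binary and look for entries marked "not found". The following Python sketch parses such output; the sample text is made up for illustration, and `run_ldd` is a hypothetical helper that simply shells out to `ldd`.

```python
import subprocess


def missing_libraries(ldd_output: str) -> list[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # The library name is everything before the '=>' arrow.
            missing.append(line.split("=>")[0].strip())
    return missing


def run_ldd(binary_path: str) -> str:
    """Hypothetical helper: return the text ldd prints for a binary."""
    result = subprocess.run(["ldd", binary_path], capture_output=True, text=True)
    return result.stdout


# Made-up sample of what ldd might print on a node lacking one library.
sample = (
    "\tlinux-vdso.so.1 (0x00007ffc0a5f2000)\n"
    "\tlibmunge.so.2 => not found\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c2a000000)\n"
)
print(missing_libraries(sample))
```

On a real node you would pass `run_ldd("/path/to/slurmd")` (path is a placeholder) to `missing_libraries` instead of the sample string.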
So the only solution I have is to build a static binary using a custom package. Am I correct or is there another solution here?