[slurm-users] Exclude Slurm packages from the EPEL yum repository

Andy Riebs andy at candooz.com
Mon Jan 25 15:25:08 UTC 2021


See below

On 1/25/2021 9:36 AM, Ole Holm Nielsen wrote:
> On 1/25/21 2:59 PM, Andy Riebs wrote:
>> Several things to keep in mind...
>>
>>  1. Slurm, as a product undergoing frequent, incompatible revisions, is
>>     not well-suited for provisioning from a stable public repository! On
>>     the other hand, it's just not that hard to build a stable version
>>     where you can directly control the upgrade timing.
>
> I agree that Slurm is probably not well suited for a public repository 
> because of the special care that *must* be taken when upgrading 
> between major versions.
>
> I use EPEL for a lot of nice software (Munge, Lmod, ...) AND I build 
> my own Slurm RPMs. Now, suddenly, Slurm RPMs from EPEL upset this 
> stable scenario.
>
Interesting; this does sound like a problem.

>>  2. If you really want a closely managed source for your Slurm RPMs, get
>>     them from the SchedMD website.
>
> All of us get the Slurm source from the SchedMD website.  And all of 
> us have to build our own RPMs from that source (a simple one-liner).  
> SchedMD doesn't provide any RPMs.
Yeah, I was a little embarrassed after sending the previous note when I 
checked to see what they have on the site. I would have sworn that they 
once provided RPMs, as well, but I could be mistaken. (With multiple 
versions of Slurm for multiple versions of RedHat/Fedora, SLES, 
Debian/Ubuntu/..., this would have become a very difficult problem to 
handle.)
>
>>  3. "You could have solicited advice..." -- while this is certainly true,
>>     for many of us in the open source world, the standard is "release
>>     something quickly, and then improve it, based in part on feedback,
>>     over time."
>
> I don't think this trial-and-error-like approach is suitable for 
> Slurm. We're running production HPC clusters that need to stay very 
> stable.
>
>>  4. Slurm packages (and other contributions, including suggestions on this
>>     mailing list) that haven't been provided by SchedMD have probably been
>>     provisioned and tested by a volunteer -- be sure to keep the
>>     conversation civil!
>
> We all have to build our own Slurm RPMs, and we should not get them 
> from a volunteer. IMHO, building Slurm RPMs is very simple. It's the 
> deployment and upgrading which is the hard part of the equation.
> I think my points quoted below deserve careful consideration by the 
> EPEL volunteer, because the results could be potentially harmful.

I think you've raised some good points, but keep in mind that you're in 
a community of thousands of people with multiple diverging requirements 
who are taking advantage of free software. Moe and Danny set up SchedMD 
precisely for users who need more consistent, reliable, and specific 
support than might be available for free.

I'm serious about suggesting a contract with SchedMD. Though I've been 
working with Slurm for nearly 20 years, I've always enjoyed the 
technical challenges, and have managed to avoid needing a contract for 
my own work. Though I'll confess, there have been several times that 
I've taken advantage of the fact that my customer du jour had both a 
problem and a contract with SchedMD. In those cases, SchedMD was always 
responsive and helpful.

Regards,
Andy


>
> Thanks,
> Ole
>
>> Andy Riebs
>>
>> On 1/25/2021 2:47 AM, Ole Holm Nielsen wrote:
>>> On 1/23/21 9:43 PM, Philip Kovacs wrote:
>>>> I can assure you it was easier for you to filter slurm from your 
>>>> repos than it was for me to make them available to both epel7 and 
>>>> epel8.
>>>>
>>>> No good deed goes unpunished I guess.
>>>
>>> I do sympathize with your desire to make the Slurm installation a 
>>> bit easier by providing RPMs via the EPEL repo.  I do not 
>>> underestimate the amount of work it takes to add software to EPEL.
>>>
>>> However, I have several issues with your approach:
>>>
>>> 1. Breaking existing Slurm installations could cause big time 
>>> problems at a lot of sites!  The combined work to repair broken 
>>> installations at many sites might be substantial. Sites who are more 
>>> than two releases behind 20.11 could end up with dysfunctional 
>>> clusters.  You are undoubtedly aware that 20.11.3 fixes a major 
>>> problem in 20.11.2 wrt. OpenMPI, so the upgrade from 20.02 to 
>>> 20.11.2 may cause problems.
>>>
>>> 2. Your EPEL RPMs *must not* upgrade between major Slurm releases, 
>>> like the 20.02 to 20.11 upgrade that almost happened at our site! I 
>>> refer again to the delicate upgrade procedure described in 
>>> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
>>>
>>> 3. You could have solicited advice from the slurm-users list before 
>>> planning your EPEL Slurm packages.
>>>
>>> 4. How do you plan to keep updating future Slurm minor versions on 
>>> EPEL in a timely fashion?
>>>
>>> 5. How did you build your RPM packages?  The built-in options may be 
>>> important, for example, this might be recommended:
>>> $ rpmbuild -ta slurm-xxx.tar.bz2 --with mysql --with slurmrestd
>>>
>>> 6. Building Slurm RPM packages is actually a tiny part of what it 
>>> takes to install Slurm from scratch.  There are quite a number of 
>>> prerequisites and other things to set up besides the RPMs, see
>>> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation
>>> plus configuration of Slurm itself and its database.
>>>
>>> In conclusion, I would urge you to ensure that your EPEL packages 
>>> won't mess up existing Slurm installations!  I agree with Ryan 
>>> Novosielski that you should rename your RPMs so that they don't 
>>> overwrite packages built by SchedMD's rpmbuild system.
>>>
>>> I propose that you add the major version 20.11 right after the 
>>> "slurm" name so that your EPEL RPMs would be named "slurm-20.11-*" 
>>> like in:
>>>
>>> slurm-20.11-20.11.2-2.el7.x86_64
>>>
>>> People with more knowledge of RPM than I have could help you ensure 
>>> that no unwarranted upgrades or double Slurm installations can take 
>>> place.
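
One way to picture the safeguard asked for here is a small check that compares the major release of the installed version with that of the candidate before an update is allowed. The sketch below is only an illustration. It assumes, as this thread does, that the first two dotted components (20.02, 20.11) identify a Slurm major release. The function names are hypothetical and are not part of yum, RPM, or Slurm.

```python
def major_release(version: str) -> str:
    """Return the major release of a Slurm version string.

    Accepts plain versions ('20.11.2') as well as RPM-style strings with an
    optional epoch and release ('0:20.11.2-2.el7').
    """
    upstream = version.split(":", 1)[-1]   # drop an optional 'epoch:' prefix
    upstream = upstream.split("-", 1)[0]   # drop the '-release' suffix
    parts = upstream.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"unexpected Slurm version string: {version!r}")
    return ".".join(parts[:2])


def crosses_major_release(installed: str, candidate: str) -> bool:
    """True when the candidate belongs to a different major release."""
    return major_release(installed) != major_release(candidate)


if __name__ == "__main__":
    # Versions taken from the yum transaction quoted in this thread.
    print(crosses_major_release("0:20.02.6-1.el7", "0:20.11.2-2.el7"))
    # A minor update within the same major release.
    print(crosses_major_release("20.11.2", "20.11.3"))
```

A wrapper around the nightly update job could refuse to proceed whenever crosses_major_release() returns True, leaving major upgrades to a planned maintenance window.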
>>>
>>> Thanks,
>>> Ole
>>>
>>>
>>>> On Saturday, January 23, 2021, 07:03:08 AM EST, Ole Holm Nielsen 
>>>> <ole.h.nielsen at fysik.dtu.dk> wrote:
>>>>
>>>>
>>>> We use the EPEL yum repository on our CentOS 7 nodes.  Today EPEL
>>>> surprisingly delivers Slurm 20.11.2 RPMs, and the daily yum updates
>>>> (luckily) fail with some errors:
>>>>
>>>> --> Running transaction check
>>>> ---> Package slurm.x86_64 0:20.02.6-1.el7 will be updated
>>>> --> Processing Dependency: slurm(x86-64) = 20.02.6-1.el7 for package: slurm-libpmi-20.02.6-1.el7.x86_64
>>>> --> Processing Dependency: libslurmfull.so()(64bit) for package: slurm-libpmi-20.02.6-1.el7.x86_64
>>>> ---> Package slurm.x86_64 0:20.11.2-2.el7 will be an update
>>>> --> Processing Dependency: pmix for package: slurm-20.11.2-2.el7.x86_64
>>>> --> Processing Dependency: libfreeipmi.so.17()(64bit) for package: slurm-20.11.2-2.el7.x86_64
>>>> --> Processing Dependency: libipmimonitoring.so.6()(64bit) for package: slurm-20.11.2-2.el7.x86_64
>>>> --> Processing Dependency: libslurmfull-20.11.2.so()(64bit) for package: slurm-20.11.2-2.el7.x86_64
>>>> ---> Package slurm-contribs.x86_64 0:20.02.6-1.el7 will be updated
>>>> ---> Package slurm-contribs.x86_64 0:20.11.2-2.el7 will be an update
>>>> ---> Package slurm-devel.x86_64 0:20.02.6-1.el7 will be updated
>>>> ---> Package slurm-devel.x86_64 0:20.11.2-2.el7 will be an update
>>>> ---> Package slurm-perlapi.x86_64 0:20.02.6-1.el7 will be updated
>>>> ---> Package slurm-perlapi.x86_64 0:20.11.2-2.el7 will be an update
>>>> ---> Package slurm-slurmdbd.x86_64 0:20.02.6-1.el7 will be updated
>>>> ---> Package slurm-slurmdbd.x86_64 0:20.11.2-2.el7 will be an update
>>>> --> Running transaction check
>>>> ---> Package freeipmi.x86_64 0:1.5.7-3.el7 will be installed
>>>> ---> Package pmix.x86_64 0:1.1.3-1.el7 will be installed
>>>> ---> Package slurm.x86_64 0:20.02.6-1.el7 will be updated
>>>> --> Processing Dependency: slurm(x86-64) = 20.02.6-1.el7 for package: slurm-libpmi-20.02.6-1.el7.x86_64
>>>> --> Processing Dependency: libslurmfull.so()(64bit) for package: slurm-libpmi-20.02.6-1.el7.x86_64
>>>> ---> Package slurm-libs.x86_64 0:20.11.2-2.el7 will be installed
>>>> --> Finished Dependency Resolution
>>>> Error: Package: slurm-libpmi-20.02.6-1.el7.x86_64 (@/slurm-libpmi-20.02.6-1.el7.x86_64)
>>>>              Requires: libslurmfull.so()(64bit)
>>>>              Removing: slurm-20.02.6-1.el7.x86_64 (@/slurm-20.02.6-1.el7.x86_64)
>>>>                  libslurmfull.so()(64bit)
>>>>              Updated By: slurm-20.11.2-2.el7.x86_64 (epel)
>>>>                  Not found
>>>> Error: Package: slurm-libpmi-20.02.6-1.el7.x86_64 (@/slurm-libpmi-20.02.6-1.el7.x86_64)
>>>>              Requires: slurm(x86-64) = 20.02.6-1.el7
>>>>              Removing: slurm-20.02.6-1.el7.x86_64 (@/slurm-20.02.6-1.el7.x86_64)
>>>>                  slurm(x86-64) = 20.02.6-1.el7
>>>>              Updated By: slurm-20.11.2-2.el7.x86_64 (epel)
>>>>                  slurm(x86-64) = 20.11.2-2.el7
>>>>    You could try using --skip-broken to work around the problem
>>>>    You could try running: rpm -Va --nofiles --nodigest
>>>>
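
The transaction log above pairs each "will be updated" line with a "will be an update" line for the same package. As a rough sketch of how such a log could be summarized (for example in a cron job that mails the admin before anything is applied), the following pulls out those pairs. The regular expression only assumes the line shape seen in this particular yum output, and the name pending_updates is made up for this example.

```python
import re

# Matches lines such as:
#   ---> Package slurm.x86_64 0:20.02.6-1.el7 will be updated
#   ---> Package slurm.x86_64 0:20.11.2-2.el7 will be an update
PACKAGE_LINE = re.compile(
    r"---> Package (?P<name>\S+)\.(?P<arch>[^.\s]+) "
    r"(?P<epoch>\d+):(?P<evr>\S+) will be (?P<role>updated|an update)"
)


def pending_updates(yum_output: str) -> dict:
    """Map package name -> (installed version, candidate version)."""
    old, new = {}, {}
    for line in yum_output.splitlines():
        match = PACKAGE_LINE.search(line)  # search() tolerates '>>>> ' quoting
        if match is None:
            continue
        target = old if match["role"] == "updated" else new
        target[match["name"]] = match["evr"]
    # Keep only packages that appear on both sides of the update.
    return {name: (old[name], new[name]) for name in old if name in new}


if __name__ == "__main__":
    sample = """\
---> Package slurm.x86_64 0:20.02.6-1.el7 will be updated
---> Package slurm.x86_64 0:20.11.2-2.el7 will be an update
---> Package pmix.x86_64 0:1.1.3-1.el7 will be installed
"""
    for name, (installed, candidate) in pending_updates(sample).items():
        print(f"{name}: {installed} -> {candidate}")
```

Packages that are only "installed" as new dependencies, such as pmix here, are deliberately ignored because they have no previous version to compare against.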
>>>>
>>>> We still run Slurm 20.02 and don't want EPEL to introduce any Slurm
>>>> updates!!  Slurm must be upgraded with some care, see for example
>>>> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm 
>>>>
>>>> Therefore we must disable EPEL's slurm RPMs permanently. The fix is to
>>>> add an "exclude=slurm*" line to the file /etc/yum.repos.d/epel.repo,
>>>> like the last line in:
>>>>
>>>> [epel]
>>>> name=Extra Packages for Enterprise Linux 7 - $basearch
>>>> #baseurl=http://download.fedoraproject.org/pub/epel/7/$basearch
>>>> metalink=https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=$basearch&infra=$infra&content=$contentdir
>>>> failovermethod=priority
>>>> enabled=1
>>>> gpgcheck=1
>>>> gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
>>>> exclude=slurm*
>>>>
>>>> /Ole
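
For sites that manage many nodes, the edit described above can be scripted. What follows is a rough sketch rather than a tested deployment tool: it works on the text of a repo file and makes sure the [epel] section carries an exclude=slurm* entry. The function name add_exclude and the sample text are made up for illustration, and writing the result back to /etc/yum.repos.d/epel.repo is left out on purpose.

```python
def add_exclude(repo_text: str, section: str = "epel",
                pattern: str = "slurm*") -> str:
    """Return repo_text with `pattern` listed in the exclude= line of [section].

    Works line by line so that comments such as '#baseurl=...' are preserved.
    If the section is not present, the text is returned unchanged.
    """
    out = []
    in_section = False
    handled = False
    for line in repo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving the target section without an exclude line: add one.
            if in_section and not handled:
                out.append(f"exclude={pattern}")
                handled = True
            in_section = stripped == f"[{section}]"
        elif in_section and not stripped.startswith("#"):
            key, sep, value = stripped.partition("=")
            if sep and key.strip() == "exclude":
                # Extend an existing space-separated exclude list if needed.
                patterns = value.split()
                if pattern not in patterns:
                    patterns.append(pattern)
                line = "exclude=" + " ".join(patterns)
                handled = True
        out.append(line)
    # Target section was the last one in the file and had no exclude line.
    if in_section and not handled:
        out.append(f"exclude={pattern}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "[epel]\nname=Extra Packages for Enterprise Linux 7\nenabled=1\ngpgcheck=1\n"
    print(add_exclude(sample), end="")
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to call from a configuration-management run that repeats on every node.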
>


