[slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64
Brian Andrus
toomuchit at gmail.com
Fri Aug 16 15:02:33 UTC 2019
Ah. I suspect your issue may be CUDA 10.1, whose packages do not
create/register all the appropriate symlinks and RPM "provides".
I ran into that trying to install tensorflow.
If you can, downgrade to 10.0, which does a better job of installing itself.
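
One way to check this is to compare what the dynamic linker sees with what the RPM database sees. The following is only a minimal sketch, not something taken from the thread: the helper names `missing_caps` and `check_cap` are made up for illustration, and it assumes an RPM-based system like the CentOS 7 box below. `rpm -q --whatprovides` asks the RPM database which installed package claims a capability, while `ldconfig -p` only lists the linker cache.

```shell
#!/bin/sh
# Sketch only: compare the dynamic linker's view of a library with the
# RPM database's view. Helper names are hypothetical, for illustration.

# Pull unversioned capability names out of rpm "Failed dependencies"
# text read from stdin.
missing_caps() {
    sed -n 's/^[[:space:]]*\([^[:space:]][^[:space:]]*\) is needed by.*/\1/p' | sort -u
}

# Report, for one capability, what rpm and ldconfig each know about it.
check_cap() {
    cap=$1
    # "libnvidia-ml.so.1()(64bit)" -> "libnvidia-ml.so.1"
    soname=${cap%%"("*}

    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm: command not available on this machine"
    elif rpm -q --whatprovides "$cap" >/dev/null 2>&1; then
        echo "rpm: $cap is provided by $(rpm -q --whatprovides "$cap")"
    else
        echo "rpm: no installed package provides $cap"
    fi

    if ! command -v ldconfig >/dev/null 2>&1; then
        echo "ldconfig: command not available on this machine"
    elif ldconfig -p | grep -q -F "$soname"; then
        echo "ldconfig: $soname is in the linker cache"
    else
        echo "ldconfig: $soname is not in the linker cache"
    fi
}

# Example: feed the error text from this thread through both helpers.
printf '%s\n' 'error: Failed dependencies:' \
    '        libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.1-2.el7.centos.x86_64' |
    missing_caps |
    while read -r cap; do check_cap "$cap"; done
```

If the library shows up in the linker cache but rpm reports no provider, that would be consistent with the library having been installed without the matching "provides" entry.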
Brian
On 8/16/2019 5:47 AM, Lou Nicotra wrote:
> Brian, the package is being built and installed on the master server.
> I am testing by removing all instances of V18 and installing the newly
> created V19 slurm rpms. I get the error message on the slurm rpm
> install; all others (ctl, db, ...) install fine.
>
> After I get the error message, I remove all rpms from V19 and
> reinstall V18 using the same procedure with no issues... And the
> system sees all nodes as it did before trying to install V19
>
> The nvidia libraries are installed via the official Nvidia rpm,
> cuda-repo-rhel7-10-1-local-10.1.105-418.39-1.0-1.x86_64.rpm,
> supporting cuda10. The multi-GPU server is currently used by multiple
> users (DNN training) with no errors of any type while utilizing the
> nvidia libs/code.
>
> nvidia-smi command shows: NVIDIA-SMI 418.39 Driver Version:
> 418.39 CUDA Version: 10.1
>
> So, it is definitely something new to the V19 release... I have
> installed 18.08.0, .3, .4 and .8 on the same server and nodes since
> Sep of 2018 using the same procedures and never had any issues...
> Currently running 18.08.8
>
> Thanks.
> Lou
>
> On Thu, Aug 15, 2019 at 3:07 PM Brian Andrus <toomuchit at gmail.com> wrote:
>
> Lou,
>
> Are you installing on the same machine you built?
>
> Are the nvidia libraries installed by RPM or a 'make install' on
> the box you compiled it on?
>
> Brian Andrus
>
> On 8/15/2019 7:53 AM, Lou Nicotra wrote:
>> I have tried running ldconfig manually as suggested with
>> slurm-19.05.1-2 and it fails the same way...
>> error: Failed dependencies:
>> libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.1-2.el7.centos.x86_64
>>
>> ldconfig -p shows:
>> root at panther02 slurm# ldconfig -p|grep libnvidia-ml.
>> libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib64/libnvidia-ml.so.1
>> libnvidia-ml.so.1 (libc6) => /lib/libnvidia-ml.so.1
>> libnvidia-ml.so (libc6,x86-64) => /usr/lib64/libnvidia-ml.so
>> libnvidia-ml.so (libc6) => /lib/libnvidia-ml.so
>>
>> Just tried the latest release slurm-19.05.2 and it fails in the
>> same way...
>> root at panther02 x86_64# rpm -Uvh slurm-19.05.2-1.el7.centos.x86_64.rpm
>> error: Failed dependencies:
>> libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.2-1.el7.centos.x86_64
>>
>> Reinstalled slurm-18.08.8 and it installs with no issues... Just
>> like slurm-18.08.03 and slurm-18.08.4 did... All built on the
>> same machine with rpmbuild -ta command...
>> root at panther02 slurm-18.08.8# rpm -Uvh slurm-18.08.8-1.el7.centos.x86_64.rpm
>> Preparing... ################################# [100%]
>> Updating / installing...
>> 1:slurm-18.08.8-1.el7.centos ################################# [100%]
>>
>> Oh, well...
>>
>> Lou
>>
>>
>>
>> On Mon, Aug 12, 2019 at 1:32 AM Barbara Krašovec <barbara.krasovec at ijs.si> wrote:
>>
>> What if you try to run ldconfig manually before building the rpm?
>>
>> Cheers,
>>
>> Barbara
>>
>> On 8/8/19 5:57 PM, Lou Nicotra wrote:
>>> I am running into an error while trying to install
>>> slurm-19.05.1-2.el7.centos.x86_64. The error is as follows:
>>> root at panther02 x86_64# rpm -Uvh slurm-19.05.1-2.el7.centos.x86_64.rpm
>>> error: Failed dependencies:
>>> libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.1-2.el7.centos.x86_64
>>>
>>> Packages are built using rpmbuild... And complete with no
>>> errors...
>>> + cd /root/rpmbuild/BUILD
>>> + cd slurm-19.05.1-2
>>> + rm -rf /root/rpmbuild/BUILDROOT/slurm-19.05.1-2.el7.centos.x86_64
>>> + exit 0
>>>
>>> Investigation of the output while building the rpm package
>>> shows that nvidia-ml is found:
>>> checking for nvmlInit in -lnvidia-ml... yes
>>> .
>>> .
>>> libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../..
>>> -I../../../../slurm -I../../../.. -I../../../../src/common
>>> -I/usr/local/cuda/include -I/usr/cuda/include
>>> -DNUMA_VERSION1_COMPATIBILITY -O2 -g -pipe -Wall
>>> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>>> -fstack-protector-strong --param=ssp-buffer-size=4
>>> -grecord-gcc-switches -m64 -mtune=generic -pthread -ggdb3
>>> -Wall -g -O1 -fno-strict-aliasing -c gpu_nvml.c -fPIC -DPIC
>>> -o .libs/gpu_nvml.o
>>> libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../..
>>> -I../../../../slurm -I../../../.. -I../../../../src/common
>>> -I/usr/local/cuda/include -I/usr/cuda/include
>>> -DNUMA_VERSION1_COMPATIBILITY -O2 -g -pipe -Wall
>>> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>>> -fstack-protector-strong --param=ssp-buffer-size=4
>>> -grecord-gcc-switches -m64 -mtune=generic -pthread -ggdb3
>>> -Wall -g -O1 -fno-strict-aliasing -c gpu_nvml.c -o
>>> gpu_nvml.o >/dev/null 2>&1
>>> /bin/sh ../../../../libtool --tag=CC --mode=link gcc
>>> -DNUMA_VERSION1_COMPATIBILITY -O2 -g -pipe -Wall
>>> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>>> -fstack-protector-strong --param=ssp-buffer-size=4
>>> -grecord-gcc-switches -m64 -mtune=generic -pthread -ggdb3
>>> -Wall -g -O1 -fno-strict-aliasing -module -avoid-version
>>> --export-dynamic -Wl,-z,relro -o gpu_nvml.la
>>> -rpath /usr/lib64/slurm gpu_nvml.lo
>>> -lnvidia-ml
>>> libtool: link: gcc -shared -fPIC -DPIC .libs/gpu_nvml.o
>>> -lnvidia-ml -O2 -g -fstack-protector-strong
>>> -grecord-gcc-switches -m64 -mtune=generic -pthread -ggdb3 -g
>>> -O1 -Wl,-z -Wl,relro -pthread -Wl,-soname -Wl,gpu_nvml.so
>>> -o .libs/gpu_nvml.so
>>>
>>> The Makefile in /root/rpmbuild/BUILD/slurm-19.05.1-2/src
>>> includes: NVML_LIBS = -lnvidia-ml
>>> but previous releases (slurm-18.08.8) did not, and I was able
>>> to compile and install that release with no issues after
>>> building it with rpmbuild...
>>>
>>> My LD_LIBRARY_PATH is
>>> /usr/lib64:/usr/lib:/usr/local/lib64:/usr/local/lib:/var/local/miniconda2/lib/:
>>>
>>> Can anyone provide suggestions on working out this issue?
>>>
>>> Thanks.
>>> --
>>>
>>> LOU NICOTRA
>>>
>>> IT Systems Engineer - SLT
>>>
>>> Interactions LLC
>>>
>>> o: 908-673-1833
>>>
>>> m: 908-451-6983
>>>
>>> lnicotra at interactions.com
>>>
>>> www.interactions.com
>>>
>>> *******************************************************************************
>>>
>>> This e-mail and any of its attachments may contain
>>> Interactions LLC proprietary information, which is
>>> privileged, confidential, or subject to copyright belonging
>>> to the Interactions LLC. This e-mail is intended solely for
>>> the use of the individual or entity to which it is
>>> addressed. If you are not the intended recipient of this
>>> e-mail, you are hereby notified that any dissemination,
>>> distribution, copying, or action taken in relation to the
>>> contents of and attachments to this e-mail is strictly
>>> prohibited and may be unlawful. If you have received this
>>> e-mail in error, please notify the sender immediately and
>>> permanently delete the original and any copy of this e-mail
>>> and any printout. Thank You.
>>>
>>> *******************************************************************************
>>>
>