[slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

Lou Nicotra lnicotra at interactions.com
Fri Aug 16 12:47:21 UTC 2019


Brian, the package is being built and installed on the master server. I am
testing by removing all instances of V18 and installing the newly created
V19 slurm rpms. I get the error message on the slurm rpm install; all
others (ctl, db, ...) install fine.
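
rpm resolves a package's requirements against the Provides recorded in its own package database, so one way to narrow this down is to ask the database about each unmet capability. The following is only a sketch, not something run in this thread: `missing_caps` and `check_providers` are hypothetical helper names, and the parsing assumes the "<capability> is needed by <package>" wording that rpm prints in the error quoted below.

```shell
#!/bin/sh
# Sketch: read rpm's "Failed dependencies" text on stdin and print each
# unmet capability once. Assumes unversioned capabilities, where the first
# whitespace-separated field is the whole capability name.
missing_caps() {
    awk '/ is needed by/ { print $1 }' | sort -u
}

# Hypothetical helper: for each capability on stdin, ask the rpm database
# which installed package (if any) provides it.
check_providers() {
    while IFS= read -r cap; do
        printf '%s: ' "$cap"
        # rpm exits non-zero when nothing provides the capability;
        # keep going so every capability is reported.
        rpm -q --whatprovides "$cap" || true
    done
}
```

A possible invocation, using rpm's `--test` mode so nothing is actually installed: `rpm -Uvh --test slurm-19.05.1-2.el7.centos.x86_64.rpm 2>&1 | missing_caps | check_providers`.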

After I get the error message, I remove all V19 rpms and reinstall V18
using the same procedure with no issues, and the system sees all nodes as
it did before I tried to install V19.

The nvidia libraries are installed via the official Nvidia rpm,
cuda-repo-rhel7-10-1-local-10.1.105-418.39-1.0-1.x86_64.rpm, which
supports CUDA 10. This is a multi-GPU server currently used by multiple
users (DNN training) with no errors of any type while utilizing the
nvidia libs/code.

The nvidia-smi command shows:
NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1
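
A library being present on disk and a library being declared by an installed package are separate questions. As a rough sketch (the helper name `lib64_path` is hypothetical, and it assumes `ldconfig -p` output in the format quoted further down), the 64-bit path can be picked out and then passed to `rpm -qf` to see which package, if any, owns the file:

```shell
#!/bin/sh
# Sketch: read `ldconfig -p` output on stdin and print the path of the
# first x86-64 entry whose soname equals the first argument.
# Expected line shape: "<soname> (libc6,x86-64) => <path>"
lib64_path() {
    awk -v name="$1" '$1 == name && $2 ~ /x86-64/ { print $NF; exit }'
}
```

For example, `rpm -qf "$(ldconfig -p | lib64_path libnvidia-ml.so.1)"` reports the owning package, or says the file is not owned by any package.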

So it is definitely something new in the V19 release. I have installed
18.08.0, .3, .4 and .8 on the same server and nodes since September 2018
using the same procedures and never had any issues. I am currently running
18.08.8.
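
To see what changed between the two releases, the requirement lists of the built packages can be compared. rpmbuild normally derives library requirements automatically from the shared objects it packages, so a plugin linked with -lnvidia-ml would plausibly add one. This is a sketch under that assumption, with a hypothetical helper name `new_requires`; it has not been run against these packages.

```shell
#!/bin/sh
# Sketch: print the lines that appear in the second file but not in the
# first. Each file is expected to hold one requirement per line, e.g. the
# saved output of `rpm -qp --requires <package file>`.
new_requires() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || { rm -f "$old_sorted"; return 1; }
    sort -u "$1" > "$old_sorted"
    sort -u "$2" > "$new_sorted"
    # comm -13 suppresses lines unique to file 1 and lines common to both,
    # leaving only the lines unique to file 2.
    comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}
```

Possible usage: save `rpm -qp --requires slurm-18.08.8-1.el7.centos.x86_64.rpm` to req-18.txt and `rpm -qp --requires slurm-19.05.1-2.el7.centos.x86_64.rpm` to req-19.txt, then run `new_requires req-18.txt req-19.txt`. Given the error quoted below, libnvidia-ml.so.1()(64bit) would be expected among the new entries.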

Thanks.
Lou

On Thu, Aug 15, 2019 at 3:07 PM Brian Andrus <toomuchit at gmail.com> wrote:

> Lou,
>
> Are you installing on the same machine you built?
>
> Are the nvidia libraries installed by RPM or a 'make install' on the box
> you compiled it on?
>
> Brian Andrus
> On 8/15/2019 7:53 AM, Lou Nicotra wrote:
>
> I have tried running ldconfig manually as suggested with
> slurm-19.05.1-2 and it fails the same way...
> error: Failed dependencies:
>         libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.1-2.el7.centos.x86_64
>
> ldconfig -p shows:
> root at panther02 slurm# ldconfig -p|grep libnvidia-ml.
>         libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib64/libnvidia-ml.so.1
>         libnvidia-ml.so.1 (libc6) => /lib/libnvidia-ml.so.1
>         libnvidia-ml.so (libc6,x86-64) => /usr/lib64/libnvidia-ml.so
>         libnvidia-ml.so (libc6) => /lib/libnvidia-ml.so
>
> Just tried the latest release slurm-19.05.2 and it fails in the same
> way...
> root at panther02 x86_64# rpm -Uvh slurm-19.05.2-1.el7.centos.x86_64.rpm
> error: Failed dependencies:
>         libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.2-1.el7.centos.x86_64
>
> Reinstalled slurm-18.08.8 and it installs with no issues... Just
> like slurm-18.08.03 and slurm-18.08.4 did...  All built on the same machine
> with rpmbuild -ta command...
> root at panther02 slurm-18.08.8# rpm -Uvh slurm-18.08.8-1.el7.centos.x86_64.rpm
> Preparing...                          ################################# [100%]
> Updating / installing...
>    1:slurm-18.08.8-1.el7.centos       ################################# [100%]
>
> Oh, well...
>
> Lou
>
>
>
> On Mon, Aug 12, 2019 at 1:32 AM Barbara Krašovec <barbara.krasovec at ijs.si>
> wrote:
>
>> What if you try to run ldconfig manually before building the rpm?
>>
>> Cheers,
>>
>> Barbara
>> On 8/8/19 5:57 PM, Lou Nicotra wrote:
>>
>> I am running into an error while trying to
>> install slurm-19.05.1-2.el7.centos.x86_64... Error is as follows:
>> root at panther02 x86_64# rpm -Uvh slurm-19.05.1-2.el7.centos.x86_64.rpm
>> error: Failed dependencies:
>>         libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.1-2.el7.centos.x86_64
>>
>> Packages are built using rpmbuild... And complete with no errors...
>> + cd /root/rpmbuild/BUILD
>> + cd slurm-19.05.1-2
>> + rm -rf /root/rpmbuild/BUILDROOT/slurm-19.05.1-2.el7.centos.x86_64
>> + exit 0
>>
>> Investigation of the output while building the rpm package shows that
>> nvidia-ml is found:
>> checking for nvmlInit in -lnvidia-ml... yes
>> .
>> .
>> libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../..
>> -I../../../../slurm -I../../../.. -I../../../../src/common
>> -I/usr/local/cuda/include -I/usr/cuda/include -DNUMA_VERSION1_COMPATIBILITY
>> -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>> -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
>> -m64 -mtune=generic -pthread -ggdb3 -Wall -g -O1 -fno-strict-aliasing -c
>> gpu_nvml.c  -fPIC -DPIC -o .libs/gpu_nvml.o
>> libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../..
>> -I../../../../slurm -I../../../.. -I../../../../src/common
>> -I/usr/local/cuda/include -I/usr/cuda/include -DNUMA_VERSION1_COMPATIBILITY
>> -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>> -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
>> -m64 -mtune=generic -pthread -ggdb3 -Wall -g -O1 -fno-strict-aliasing -c
>> gpu_nvml.c -o gpu_nvml.o >/dev/null 2>&1
>> /bin/sh ../../../../libtool  --tag=CC   --mode=link gcc
>>  -DNUMA_VERSION1_COMPATIBILITY -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
>> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
>> -grecord-gcc-switches   -m64 -mtune=generic -pthread -ggdb3 -Wall -g -O1
>> -fno-strict-aliasing -module -avoid-version --export-dynamic -Wl,-z,relro
>> -o gpu_nvml.la -rpath /usr/lib64/slurm gpu_nvml.lo -lnvidia-ml
>> libtool: link: gcc -shared  -fPIC -DPIC  .libs/gpu_nvml.o   -lnvidia-ml  -O2
>> -g -fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic
>> -pthread -ggdb3 -g -O1 -Wl,-z -Wl,relro   -pthread -Wl,-soname
>> -Wl,gpu_nvml.so -o .libs/gpu_nvml.so
>>
>> The Makefile in /root/rpmbuild/BUILD/slurm-19.05.1-2/src includes
>> NVML_LIBS = -lnvidia-ml, but previous releases (slurm-18.08.8) did not,
>> and I was able to compile and install that release with no issues after
>> building it with rpmbuild...
>>
>> My LD_LIBRARY_PATH is
>> /usr/lib64:/usr/lib:/usr/local/lib64:/usr/local/lib:/var/local/miniconda2/lib/:
>>
>> Can anyone provide suggestions on working out this issue?
>>
>> Thanks.
>>  --
>>
>> LOU NICOTRA
>>
>> IT Systems Engineer - SLT
>>
>> Interactions LLC
>>
>> o:  908-673-1833 <781-405-5114>
>>
>> m: 908-451-6983 <781-405-5114>
>>
>> *lnicotra at interactions.com <lnicotra at interactions.com>*
>> www.interactions.com
>>
>>
>> *******************************************************************************
>>
>> This e-mail and any of its attachments may contain Interactions LLC
>> proprietary information, which is privileged, confidential, or subject to
>> copyright belonging to the Interactions LLC. This e-mail is intended solely
>> for the use of the individual or entity to which it is addressed. If you
>> are not the intended recipient of this e-mail, you are hereby notified that
>> any dissemination, distribution, copying, or action taken in relation to
>> the contents of and attachments to this e-mail is strictly prohibited and
>> may be unlawful. If you have received this e-mail in error, please notify
>> the sender immediately and permanently delete the original and any copy of
>> this e-mail and any printout. Thank You.
>>
>>
>> *******************************************************************************
>>
>>
>>
>
