[slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

Brian Andrus toomuchit at gmail.com
Fri Aug 16 15:02:33 UTC 2019


Ah. I suspect your issue may be CUDA 10.1, which does not 
create/register all the appropriate symlinks and RPM "provides".
I ran into that trying to install tensorflow.
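
A rough way to check this, as a sketch only: compare what the linker cache sees with what the RPM database says it provides. The helper names (rpm_capability, check_lib) are made up for illustration; the capability string is the one from your error message, and the only real commands used are "ldconfig -p" and "rpm -q --whatprovides".

```shell
#!/bin/sh
# Sketch: check whether the RPM database knows about a library that
# ldconfig can already see. Function names are illustrative only.

# Build the capability string rpm uses for a 64-bit shared library,
# e.g. libnvidia-ml.so.1 -> libnvidia-ml.so.1()(64bit)
rpm_capability() {
    printf '%s()(64bit)\n' "$1"
}

# Print what the linker cache and the RPM database each know about a soname.
check_lib() {
    soname=$1
    cap=$(rpm_capability "$soname")

    if command -v ldconfig >/dev/null 2>&1; then
        echo "ldconfig cache entries for $soname:"
        ldconfig -p | grep -F "$soname" || echo "  (none)"
    fi

    if command -v rpm >/dev/null 2>&1; then
        echo "installed packages providing $cap:"
        rpm -q --whatprovides "$cap" || echo "  (nothing registered in the RPM database)"
    fi
}

check_lib libnvidia-ml.so.1
```

If ldconfig lists the library but rpm reports that no package provides it, the file is on disk and only the RPM metadata is missing, which would match the "Failed dependencies" error.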

If you can, downgrade to 10.0, which does a better job of installing itself.

Brian

On 8/16/2019 5:47 AM, Lou Nicotra wrote:
> Brian, the package is being built and installed on the master server.  
> I am testing by removing all instances of V18 and installing the newly 
> created V19 slurm rpms. I get the error message on the slurm rpm 
> install; all others (ctl, db, ...) install fine.
>
> After I get the error message, I remove all rpms from V19 and 
> reinstall V18 using the same procedure with no issues... And the 
> system sees all nodes as it did before trying to install V19
>
> The nvidia libraries are installed via the official Nvidia 
> rpm (cuda-repo-rhel7-10-1-local-10.1.105-418.39-1.0-1.x86_64.rpm), 
> which supports CUDA 10. This is a multi-GPU server currently used by 
> multiple users (DNN training) with no errors of any type while 
> utilizing the nvidia libs/code.
>
> nvidia-smi command shows:  NVIDIA-SMI 418.39       Driver Version: 
> 418.39       CUDA Version: 10.1
>
> So, it is definitely something new to the V19 release... I have 
> installed 18.08.0, .3, .4 and .8 on the same server and nodes since 
> Sep of 2018 using the same procedures and never had any issues... 
> Currently running 18.08.8
>
> Thanks.
> Lou
>
> On Thu, Aug 15, 2019 at 3:07 PM Brian Andrus <toomuchit at gmail.com> wrote:
>
>     Lou,
>
>     Are you installing on the same machine you built?
>
>     Are the nvidia libraries installed by RPM or a 'make install' on
>     the box you compiled it on?
>
>     Brian Andrus
>
>     On 8/15/2019 7:53 AM, Lou Nicotra wrote:
>>     I have tried running ldconfig manually as suggested with 
>>     slurm-19.05.1-2 and it fails the same way...
>>     error: Failed dependencies:
>>             libnvidia-ml.so.1()(64bit) is needed by
>>     slurm-19.05.1-2.el7.centos.x86_64
>>
>>     ldconfig -p shows:
>>     root@panther02 slurm# ldconfig -p|grep libnvidia-ml.
>>             libnvidia-ml.so.1 (libc6,x86-64) =>
>>     /usr/lib64/libnvidia-ml.so.1
>>             libnvidia-ml.so.1 (libc6) => /lib/libnvidia-ml.so.1
>>             libnvidia-ml.so (libc6,x86-64) => /usr/lib64/libnvidia-ml.so
>>             libnvidia-ml.so (libc6) => /lib/libnvidia-ml.so
>>
>>     Just tried the latest release slurm-19.05.2 and it fails in the
>>     same way...
>>     root@panther02 x86_64# rpm -Uvh slurm-19.05.2-1.el7.centos.x86_64.rpm
>>     error: Failed dependencies:
>>             libnvidia-ml.so.1()(64bit) is needed by
>>     slurm-19.05.2-1.el7.centos.x86_64
>>
>>     Reinstalled slurm-18.08.8 and it installs with no issues... Just
>>     like slurm-18.08.03 and slurm-18.08.4 did...  All built on the
>>     same machine with rpmbuild -ta command...
>>     root@panther02 slurm-18.08.8# rpm -Uvh
>>     slurm-18.08.8-1.el7.centos.x86_64.rpm
>>     Preparing...  ################################# [100%]
>>     Updating / installing...
>>        1:slurm-18.08.8-1.el7.centos #################################
>>     [100%]
>>
>>     Oh, well...
>>
>>     Lou
>>
>>
>>
>>     On Mon, Aug 12, 2019 at 1:32 AM Barbara Krašovec
>>     <barbara.krasovec at ijs.si> wrote:
>>
>>         What if you try to run ldconfig manually before building the rpm?
>>
>>         Cheers,
>>
>>         Barbara
>>
>>         On 8/8/19 5:57 PM, Lou Nicotra wrote:
>>>         I am running into an error while trying to
>>>         install slurm-19.05.1-2.el7.centos.x86_64... Error is as
>>>         follows:
>>>         root@panther02 x86_64# rpm -Uvh
>>>         slurm-19.05.1-2.el7.centos.x86_64.rpm
>>>         error: Failed dependencies:
>>>                 libnvidia-ml.so.1()(64bit) is needed by
>>>         slurm-19.05.1-2.el7.centos.x86_64
>>>
>>>         Packages are built using rpmbuild... And complete with no
>>>         errors...
>>>         + cd /root/rpmbuild/BUILD
>>>         + cd slurm-19.05.1-2
>>>         + rm -rf
>>>         /root/rpmbuild/BUILDROOT/slurm-19.05.1-2.el7.centos.x86_64
>>>         + exit 0
>>>
>>>         Investigation of the output while building the rpm package
>>>         shows that nvidia-ml is found:
>>>         checking for nvmlInit in -lnvidia-ml... yes
>>>         .
>>>         .
>>>         libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../..
>>>         -I../../../../slurm -I../../../.. -I../../../../src/common
>>>         -I/usr/local/cuda/include -I/usr/cuda/include
>>>         -DNUMA_VERSION1_COMPATIBILITY -O2 -g -pipe -Wall
>>>         -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>>>         -fstack-protector-strong --param=ssp-buffer-size=4
>>>         -grecord-gcc-switches -m64 -mtune=generic -pthread -ggdb3
>>>         -Wall -g -O1 -fno-strict-aliasing -c gpu_nvml.c  -fPIC -DPIC
>>>         -o .libs/gpu_nvml.o
>>>         libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../..
>>>         -I../../../../slurm -I../../../.. -I../../../../src/common
>>>         -I/usr/local/cuda/include -I/usr/cuda/include
>>>         -DNUMA_VERSION1_COMPATIBILITY -O2 -g -pipe -Wall
>>>         -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>>>         -fstack-protector-strong --param=ssp-buffer-size=4
>>>         -grecord-gcc-switches -m64 -mtune=generic -pthread -ggdb3
>>>         -Wall -g -O1 -fno-strict-aliasing -c gpu_nvml.c -o
>>>         gpu_nvml.o >/dev/null 2>&1
>>>         /bin/sh ../../../../libtool  --tag=CC --mode=link gcc
>>>          -DNUMA_VERSION1_COMPATIBILITY -O2 -g -pipe -Wall
>>>         -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>>>         -fstack-protector-strong --param=ssp-buffer-size=4
>>>         -grecord-gcc-switches   -m64 -mtune=generic -pthread -ggdb3
>>>         -Wall -g -O1 -fno-strict-aliasing -module -avoid-version
>>>         --export-dynamic -Wl,-z,relro   -o gpu_nvml.la
>>>         -rpath /usr/lib64/slurm gpu_nvml.lo
>>>         -lnvidia-ml
>>>         libtool: link: gcc -shared  -fPIC -DPIC  .libs/gpu_nvml.o
>>>         -lnvidia-ml -O2 -g -fstack-protector-strong
>>>         -grecord-gcc-switches -m64 -mtune=generic -pthread -ggdb3 -g
>>>         -O1 -Wl,-z -Wl,relro   -pthread -Wl,-soname -Wl,gpu_nvml.so
>>>         -o .libs/gpu_nvml.so
>>>
>>>         The Makefile in /root/rpmbuild/BUILD/slurm-19.05.1-2/src
>>>         includes: NVML_LIBS = -lnvidia-ml
>>>         but previous releases (e.g. slurm-18.08.8) did not, and I was
>>>         able to compile and install that release with no issues after
>>>         building it with rpmbuild...
>>>
>>>         My LD_LIBRARY_PATH is
>>>         /usr/lib64:/usr/lib:/usr/local/lib64:/usr/local/lib:/var/local/miniconda2/lib/:
>>>
>>>         Can anyone provide suggestions on working out this issue?
>>>
>>>         Thanks.
>>>          --
>>>
>>>         LOU NICOTRA
>>>
>>>         IT Systems Engineer - SLT
>>>
>>>         Interactions LLC
>>>
>>>         o: 908-673-1833
>>>
>>>         m: 908-451-6983
>>>
>>>         lnicotra at interactions.com
>>>
>>>         www.interactions.com
>>>
>>>         *******************************************************************************
>>>
>>>         This e-mail and any of its attachments may contain
>>>         Interactions LLC proprietary information, which is
>>>         privileged, confidential, or subject to copyright belonging
>>>         to the Interactions LLC. This e-mail is intended solely for
>>>         the use of the individual or entity to which it is
>>>         addressed. If you are not the intended recipient of this
>>>         e-mail, you are hereby notified that any dissemination,
>>>         distribution, copying, or action taken in relation to the
>>>         contents of and attachments to this e-mail is strictly
>>>         prohibited and may be unlawful. If you have received this
>>>         e-mail in error, please notify the sender immediately and
>>>         permanently delete the original and any copy of this e-mail
>>>         and any printout. Thank You.
>>>
>>>         *******************************************************************************
>>>
>>
>>
>>
>
>
>