[slurm-users] follow-up: [Still broken]CentOS 7 CUDA 8.0 can't find plugin cons_tres
Lisa Kay Weihl
lweihl at bgsu.edu
Fri Apr 17 17:55:21 UTC 2020
I went back and built the slurm-19.05.6 rpms using:
rpmbuild -ta slurm-19.05.6.tar.bz2
It still failed with:
Error: Package: slurm-19.05.6-1.el7.x86_64
Requires: libnvidia-ml.so.1()(64bit)
Now I remember why I went back to 18.08. It was because this post https://lists.schedmd.com/pipermail/slurm-users/2019-August/003910.html reported the same errors. The poster said he had no issues with 18.08 and was looking to use GPUs. I guess that's why I thought 18.08 supported cons_tres.
I followed the rest of that thread and it matches my issues pretty closely, although here AdvancedHPC installed CUDA 8. I assume it came from the NVIDIA rpm because /etc/yum.repos.d contains a cuda file and looks similar to other machines where I've installed CUDA via that method.
It was suggested that he go back to CUDA 10.0 because the newer CUDA versions don't build links properly, but we are back even further than 10 so I figured that must be okay.
libnvidia-ml is there, as evidenced by ldconfig -p:
libnvidia-ml.so.1 (libc6,x86-64) => /lib64/libnvidia-ml.so.1
libnvidia-ml.so.1 (libc6) => /lib/libnvidia-ml.so.1
libnvidia-ml.so (libc6,x86-64) => /lib64/libnvidia-ml.so
libnvidia-ml.so (libc6) => /lib/libnvidia-ml.so
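For anyone who wants to script this check, here is a minimal Python sketch that parses text in the usual `ldconfig -p` layout and reports whether a 64-bit entry exists for a given soname. The helper names (`parse_ldconfig`, `has_64bit`) and the `SAMPLE` text are made up for illustration; on a real machine the input would be the captured output of `ldconfig -p`.

```python
import re

# Sample text in the usual `ldconfig -p` layout (illustrative, not captured output).
SAMPLE = """\
libnvidia-ml.so.1 (libc6,x86-64) => /lib64/libnvidia-ml.so.1
libnvidia-ml.so.1 (libc6) => /lib/libnvidia-ml.so.1
"""

# One cache entry per line: soname, flags in parentheses, then the resolved path.
LINE_RE = re.compile(r"^\s*(\S+) \(([^)]*)\) => (\S+)$")


def parse_ldconfig(text):
    """Return a list of {'soname', 'flags', 'path'} dicts, skipping other lines."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            soname, flags, path = match.groups()
            entries.append({"soname": soname, "flags": flags, "path": path})
    return entries


def has_64bit(entries, soname):
    """True if the cache lists an x86-64 build of the given soname."""
    return any(e["soname"] == soname and "x86-64" in e["flags"] for e in entries)


entries = parse_ldconfig(SAMPLE)
print(has_64bit(entries, "libnvidia-ml.so.1"))
```

Note that this only shows what the dynamic linker cache knows. RPM dependencies such as `libnvidia-ml.so.1()(64bit)` are resolved against what installed or available packages declare that they provide, so a library can be present on disk and still leave the Requires unsatisfied. That may be what is happening here.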
The gentleman in the thread was going to report back if rolling back to CUDA 10.0 helped him but I never saw another post.
I also found a post about adding some linker switches to slurm.spec before building the rpms but that was for CentOS 8. Even if I add those and rebuild the rpms I get the same error message.
I'm at a loss for what combination I need to make this work.
--
Lisa Weihl
Systems Administrator, Computer Science
Bowling Green State University
Tel: (419) 372-0116 | Fax: (419) 372-8061
lweihl at bgsu.edu
www.bgsu.edu
-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of slurm-users-request at lists.schedmd.com
Sent: Friday, April 17, 2020 10:00 AM
To: slurm-users at lists.schedmd.com
Subject: [EXTERNAL] slurm-users Digest, Vol 30, Issue 35
Send slurm-users mailing list submissions to
slurm-users at lists.schedmd.com
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.schedmd.com/cgi-bin/mailman/listinfo/slurm-users
or, via email, send a message with subject or body 'help' to
slurm-users-request at lists.schedmd.com
You can reach the person managing the list at
slurm-users-owner at lists.schedmd.com
When replying, please edit your Subject line so it is more specific than "Re: Contents of slurm-users digest..."
Today's Topics:
1. Re: slurm-20.02.1-1 failed rpmbuild with error File not found
(Ole Holm Nielsen)
2. Re: [EXTERNAL] Follow-up-slurm-users Digest, Vol 30, Issue 32
(Lisa Kay Weihl)
3. Re: [EXTERNAL] Follow-up-slurm-users Digest, Vol 30, Issue 32
(Renfro, Michael)
----------------------------------------------------------------------
Message: 1
Date: Fri, 17 Apr 2020 14:11:03 +0200
From: Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>
To: <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error
File not found
Message-ID: <2a452504-183a-6208-f367-f5ae2d03d486 at fysik.dtu.dk>
Content-Type: text/plain; charset="utf-8"; format=flowed
On 17-04-2020 11:47, Ole Holm Nielsen wrote:
> On 17-04-2020 10:38, Christian Anthon wrote:
>> It would be neat to have these build requirements / install
>> requirements built into the spec file.
>
> I agree with you, and it seems that the SchedMD pages no longer list
> the build prerequisites (I think there was some information in the past).
> Try googling for "slurm build prerequisites" and see which pages this
> gives you :-)
If you read the page https://slurm.schedmd.com/quickstart_admin.html
carefully, please note the section starting with:
> Optional Slurm plugins will be built automatically when the configure script detects that the required build requirements are present. Build dependencies for various plugins and commands are denoted below:
A list of optional software is given, but not in a format that is immediately applicable to any particular Linux distribution. For CentOS you should consult
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#build-slurm-rpms
> The Slurm build system searches for installed software and omits Slurm
> components where it didn't find the prerequisites installed on the system.
>
> To submit a bug report against the slurm.spec file, you would need to
> have a support contract with SchedMD. We get a lot of benefit from
> having such a support contract ;-)
A bug report for slurm.spec has been submitted at
https://bugs.schedmd.com/show_bug.cgi?id=8882
/Ole
------------------------------
Message: 2
Date: Fri, 17 Apr 2020 13:46:41 +0000
From: Lisa Kay Weihl <lweihl at bgsu.edu>
To: "slurm-users at lists.schedmd.com" <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] [EXTERNAL] Follow-up-slurm-users Digest,
Vol 30, Issue 32
Message-ID:
<DM5PR05MB290502EC9DFB8BCF71EECECBB0D90 at DM5PR05MB2905.namprd05.prod.outlook.com>
Content-Type: text/plain; charset="iso-8859-1"
Wow. I did not catch that version issue. I saw that there were issues with the newest Slurm and how CUDA 10+ installs so I avoided that even though we have CUDA 8. I did have Slurm 19 downloaded so I'm thinking I ran into an issue with that and went back to 18 but now that I have more experience setting it up I'll wipe the 18 install and start over. Fingers crossed for success!
Thanks for your help!
--
Lisa Weihl
Systems Administrator, Computer Science
Bowling Green State University
Tel: (419) 372-0116 | Fax: (419) 372-8061
lweihl at bgsu.edu
http://www.bgsu.edu/
-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of slurm-users-request at lists.schedmd.com
Sent: Thursday, April 16, 2020 6:39 PM
To: slurm-users at lists.schedmd.com
Subject: [EXTERNAL] slurm-users Digest, Vol 30, Issue 32
Today's Topics:
1. CentOS 7 CUDA 8.0 can't find plugin cons_tres (Lisa Kay Weihl)
2. Re: [EXTERNAL] CentOS 7 CUDA 8.0 can't find plugin cons_tres
(Sean Crosby)
----------------------------------------------------------------------
Message: 1
Date: Thu, 16 Apr 2020 19:00:03 +0000
From: Lisa Kay Weihl <lweihl at bgsu.edu>
To: "slurm-users at lists.schedmd.com" <slurm-users at lists.schedmd.com>
Subject: [slurm-users] CentOS 7 CUDA 8.0 can't find plugin cons_tres
Message-ID:
<DM5PR05MB29056BE0862DB04AA8960355B0D80 at DM5PR05MB2905.namprd05.prod.outlook.com>
Content-Type: text/plain; charset="utf-8"
I have a standalone server with 4 GeForce RTX 2080 Ti GPUs. The purpose is to serve as a computer server for data science jobs. My department chair wants a job scheduler on it. I have installed SLURM (18.08.9). That works just fine in a basic configuration. When I attempt to add GresTypes=gpu and then add Gres=gpu:4 to the end of the node description:
NodeName=cs-datasci CPUs=24 RealMemory=385405 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:4
and then try to restart slurmd, I get an error that it cannot find the plugin:
slurmd: error: Couldn't find the specified plugin name for select/cons_tres looking at all files
slurmd: error: cannot find select plugin for select/cons_tres
slurmd: fatal: Can't find plugin for select/cons_tres
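As a side check on the node definition itself (this does not address the plugin error), the following Python sketch splits the NodeName line into key/value pairs and confirms that the CPUs value agrees with Sockets x CoresPerSocket x ThreadsPerCore. The helper names are hypothetical, and the parsing assumes a simple space-separated `Key=Value` line like the one above.

```python
def parse_node_line(line):
    """Split a 'Key=Value Key=Value ...' line into a dict of strings."""
    return dict(field.split("=", 1) for field in line.split())


def expected_cpus(node):
    """CPU count implied by the socket/core/thread topology."""
    return (int(node["Sockets"])
            * int(node["CoresPerSocket"])
            * int(node["ThreadsPerCore"]))


NODE_LINE = (
    "NodeName=cs-datasci CPUs=24 RealMemory=385405 Sockets=2 "
    "CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:4"
)

node = parse_node_line(NODE_LINE)
print("Gres:", node["Gres"])
print("CPUs consistent with topology:", expected_cpus(node) == int(node["CPUs"]))
```

For this line the topology gives 2 x 6 x 2 = 24, which matches CPUs=24, so the node description looks internally consistent.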
The system was prebuilt by AdvancedHPC with CentOS 7 and CUDA 8.0
I usually keep notes when I'm installing things but in this case I wasn't jotting things down as I went. I think I started with the instructions on this page: https://slurm.schedmd.com/quickstart_admin.html and went with the usual ./configure, make, make install.
I have a feeling maybe something did not work and I switched to the rpm packages based on some other web pages I saw, because if I do a yum list installed | grep slurm I see a lot of packages. The problem is I was interrupted with other tasks and my memory was somewhat rusty when I came back to this.
When I went looking for this error I saw there were some issues with the newest SLURM and CUDA 10.2 but I didn't think that should be an issue because I was at CUDA 8.0. Just in case I backed down to SLURM 18.
I'm willing to start all over if anyone thinks cleaning up and rebuilding will help that. I do see libraries in /etc/lib64/slurm but I also see 2 files in /usr/local/lib/slurm/src so I'm not sure if that's left over from trying to install from source. All the daemons are in /usr/sbin and user commands in /usr/bin.
I'm a newbie at this and very frustrated. Can anyone help?
***************************************************************
Lisa Weihl Systems Administrator
Computer Science, Bowling Green State University
Tel: (419) 372-0116 | Fax: (419) 372-8061
lweihl at bgsu.edu
http://www.bgsu.edu/
------------------------------
Message: 2
Date: Fri, 17 Apr 2020 08:38:27 +1000
From: Sean Crosby <scrosby at unimelb.edu.au>
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] [EXTERNAL] CentOS 7 CUDA 8.0 can't find
plugin cons_tres
Message-ID:
<CAFstPEBO5+MthqskkP8dbo6Vvy8=F8YrcZBxaNwZmz1Qdx3NJQ at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi Lisa,
cons_tres is part of Slurm 19.05 and higher. As you are using Slurm 18.08, it won't be there. The select plugin for 18.08 is cons_res.
Is there a reason why you're using an old Slurm?
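The version rule above can be written down as a small Python sketch. It encodes only what is stated in this thread (cons_tres from 19.05 onward, cons_res for 18.08); the function names are made up for illustration and nothing here queries a real Slurm installation.

```python
def parse_version(version):
    """Turn a Slurm version string such as '18.08.9' into (major, minor) ints."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def has_cons_tres(version):
    """True if this Slurm version should ship select/cons_tres (19.05 and higher)."""
    return parse_version(version) >= (19, 5)


def suggested_select_type(version):
    """Select plugin to configure, following the rule stated in this thread."""
    return "select/cons_tres" if has_cons_tres(version) else "select/cons_res"


for version in ("18.08.9", "19.05.6"):
    print(version, suggested_select_type(version))
```

With 18.08.9 this suggests select/cons_res, which is why slurmd cannot find select/cons_tres on that install.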
Sean
--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia
End of slurm-users Digest, Vol 30, Issue 32
*******************************************
------------------------------
Message: 3
Date: Fri, 17 Apr 2020 13:59:38 +0000
From: "Renfro, Michael" <Renfro at tntech.edu>
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] [EXTERNAL] Follow-up-slurm-users Digest,
Vol 30, Issue 32
Message-ID: <023DFF6B-15F8-406C-943C-B0B764A59E98 at tntech.edu>
Content-Type: text/plain; charset="utf-8"
Can't speak for everyone, but I went to Slurm 19.05 some months back, and haven't had any problems with CUDA 10.0 or 10.1 (or 8.0, 9.0, or 9.1).
End of slurm-users Digest, Vol 30, Issue 35
*******************************************