[slurm-users] [EXTERNAL] Follow-up-slurm-users Digest, Vol 30, Issue 32
Lisa Kay Weihl
lweihl at bgsu.edu
Fri Apr 17 13:46:41 UTC 2020
Wow. I did not catch that version issue. I saw that there were issues between the newest Slurm and how CUDA 10+ installs, so I avoided it even though we have CUDA 8. I did have Slurm 19 downloaded, so I think I ran into a problem with it and went back to 18. Now that I have more experience setting it up, I'll wipe the 18 install and start over. Fingers crossed for success!
Thanks for your help!
--
Lisa Weihl
Systems Administrator, Computer Science
Bowling Green State University
Tel: (419) 372-0116 | Fax: (419) 372-8061
lweihl at bgsu.edu
www.bgsu.edu
-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of slurm-users-request at lists.schedmd.com
Sent: Thursday, April 16, 2020 6:39 PM
To: slurm-users at lists.schedmd.com
Subject: [EXTERNAL] slurm-users Digest, Vol 30, Issue 32
Send slurm-users mailing list submissions to
slurm-users at lists.schedmd.com
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.schedmd.com/cgi-bin/mailman/listinfo/slurm-users
or, via email, send a message with subject or body 'help' to
slurm-users-request at lists.schedmd.com
You can reach the person managing the list at
slurm-users-owner at lists.schedmd.com
When replying, please edit your Subject line so it is more specific than "Re: Contents of slurm-users digest..."
Today's Topics:
1. CentOS 7 CUDA 8.0 can't find plugin cons_tres (Lisa Kay Weihl)
2. Re: [EXTERNAL] CentOS 7 CUDA 8.0 can't find plugin cons_tres
(Sean Crosby)
----------------------------------------------------------------------
Message: 1
Date: Thu, 16 Apr 2020 19:00:03 +0000
From: Lisa Kay Weihl <lweihl at bgsu.edu>
To: "slurm-users at lists.schedmd.com" <slurm-users at lists.schedmd.com>
Subject: [slurm-users] CentOS 7 CUDA 8.0 can't find plugin cons_tres
Message-ID:
<DM5PR05MB29056BE0862DB04AA8960355B0D80 at DM5PR05MB2905.namprd05.prod.outlook.com>
Content-Type: text/plain; charset="utf-8"
I have a standalone server with 4 GeForce RTX 2080 Ti. The purpose is to serve as a compute server for data science jobs. My department chair wants a job scheduler on it. I have installed SLURM (18.08.9). That works just fine in a basic configuration. When I attempt to add GresTypes=gpu and then add Gres=gpu:4 to the end of the node description:
NodeName=cs-datasci CPUs=24 RealMemory=385405 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:4
and then try to restart slurmd, I get an error that it cannot find the plugin:
slurmd: error: Couldn't find the specified plugin name for select/cons_tres looking at all files
slurmd: error: cannot find select plugin for select/cons_tres
slurmd: fatal: Can't find plugin for select/cons_tres
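As an aside, the NodeName line quoted above can be sanity-checked before restarting slurmd. The following is only a minimal sketch in Python: the helper names `parse_node_line` and `expected_cpus` are hypothetical, and it assumes that CPUs is meant to equal Sockets x CoresPerSocket x ThreadsPerCore, which holds for the values in this message.

```python
# Node definition copied from the message above.
NODE_LINE = (
    "NodeName=cs-datasci CPUs=24 RealMemory=385405 Sockets=2 "
    "CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:4"
)


def parse_node_line(line):
    """Split a 'Key=Value Key=Value ...' line into a dict (hypothetical helper)."""
    return dict(field.split("=", 1) for field in line.split())


def expected_cpus(node):
    """Logical CPU count implied by the socket/core/thread layout."""
    return (
        int(node["Sockets"])
        * int(node["CoresPerSocket"])
        * int(node["ThreadsPerCore"])
    )


node = parse_node_line(NODE_LINE)
print("CPUs declared:", node["CPUs"])
print("CPUs implied by topology:", expected_cpus(node))
print("Gres:", node["Gres"])
```

For this node the declared and implied CPU counts agree, so the node line itself does not look like the source of the error.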
The system was prebuilt by AdvancedHPC with CentOS 7 and CUDA 8.0
I usually keep notes when I'm installing things but in this case I wasn't jotting things down as I went. I think I started with the instructions on this page: https://slurm.schedmd.com/quickstart_admin.html and went with the usual ./configure, make, make install.
I have a feeling maybe something did not work and I switched to the rpm packages based on some other web pages I saw, because if I do a yum list installed | grep slurm I see a lot of packages. The problem is I was interrupted with other tasks and my memory was somewhat rusty when I came back to this.
When I went looking for this error I saw there were some issues with the newest SLURM and CUDA 10.2 but I didn't think that should be an issue because I was at CUDA 8.0. Just in case I backed down to SLURM 18.
I'm willing to start all over if anyone thinks cleaning up and rebuilding will help that. I do see libraries in /etc/lib64/slurm but I also see 2 files in /usr/local/lib/slurm/src so I'm not sure if that's left over from trying to install from source. All the daemons are in /usr/sbin and user commands in /usr/bin
I'm a newbie at this and very frustrated. Can anyone help?
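One way to see which select plugins each install actually provides is to look for the plugin shared objects on disk. This is only a rough sketch in Python: the two directories in `CANDIDATE_DIRS` are assumptions (a common RPM location and the source-install location mentioned above) and should be adjusted to the real paths, and `installed_select_plugins` is a hypothetical helper name. It relies on Slurm's plugin files being named like select_cons_res.so.

```python
from pathlib import Path

# Assumed locations; change these to match the system being checked.
CANDIDATE_DIRS = ["/usr/lib64/slurm", "/usr/local/lib/slurm"]


def installed_select_plugins(dirs):
    """Map each existing directory to the select plugins found in it."""
    found = {}
    for d in dirs:
        path = Path(d)
        if not path.is_dir():
            continue
        # select_cons_res.so -> select/cons_res
        names = sorted(
            p.stem.replace("select_", "select/", 1)
            for p in path.glob("select_*.so")
        )
        if names:
            found[d] = names
    return found


if __name__ == "__main__":
    for directory, plugins in installed_select_plugins(CANDIDATE_DIRS).items():
        print(directory, plugins)
```

If more than one directory reports plugins, there are probably two installs side by side, which would support cleaning up and rebuilding from a single source.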
***************************************************************
Lisa Weihl Systems Administrator
Computer Science, Bowling Green State University
Tel: (419) 372-0116 | Fax: (419) 372-8061
lweihl at bgsu.edu
http://www.bgsu.edu/?
------------------------------
Message: 2
Date: Fri, 17 Apr 2020 08:38:27 +1000
From: Sean Crosby <scrosby at unimelb.edu.au>
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] [EXTERNAL] CentOS 7 CUDA 8.0 can't find
plugin cons_tres
Message-ID:
<CAFstPEBO5+MthqskkP8dbo6Vvy8=F8YrcZBxaNwZmz1Qdx3NJQ at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi Lisa,
cons_tres is part of Slurm 19.05 and higher. As you are using Slurm 18.08, it won't be there. The select plugin for 18.08 is cons_res.
Is there a reason why you're using an old Slurm?
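The version rule above can be written down as a small check. This is a minimal sketch in Python, not anything shipped with Slurm: `cons_tres_available` and `suggested_select_type` are hypothetical helper names, and the only fact encoded is the one stated in this reply, that cons_tres first appears in 19.05.

```python
def cons_tres_available(version):
    """True if the given Slurm version (e.g. '18.08.9') is 19.05 or newer."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (19, 5)


def suggested_select_type(version):
    """SelectType value to try for a version, following the rule above."""
    if cons_tres_available(version):
        return "select/cons_tres"
    return "select/cons_res"


for v in ("18.08.9", "19.05.5"):
    print(v, "->", suggested_select_type(v))
```

For the 18.08.9 install described in the original message this gives select/cons_res, so a SelectType=select/cons_tres line in slurm.conf would explain the error.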
Sean
--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia
End of slurm-users Digest, Vol 30, Issue 32
*******************************************