[slurm-users] [EXTERNAL] Follow-up-slurm-users Digest, Vol 30, Issue 32

Lisa Kay Weihl lweihl at bgsu.edu
Fri Apr 17 13:46:41 UTC 2020


Wow, I did not catch that version issue. I had seen that there were issues between the newest Slurm and how CUDA 10+ installs, so I avoided the newest release even though we have CUDA 8. I did have Slurm 19 downloaded, so I think I ran into a problem with it and went back to 18. Now that I have more experience setting it up, I'll wipe the 18 install and start over. Fingers crossed for success!

Thanks for your help!

--
Lisa Weihl 
Systems Administrator, Computer Science 
Bowling Green State University
Tel: (419) 372-0116   |    Fax: (419) 372-8061
lweihl at bgsu.edu
www.bgsu.edu

-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of slurm-users-request at lists.schedmd.com
Sent: Thursday, April 16, 2020 6:39 PM
To: slurm-users at lists.schedmd.com
Subject: [EXTERNAL] slurm-users Digest, Vol 30, Issue 32


Today's Topics:

   1. CentOS 7 CUDA 8.0 can't find plugin cons_tres (Lisa Kay Weihl)
   2. Re: [EXTERNAL] CentOS 7 CUDA 8.0 can't find plugin cons_tres
      (Sean Crosby)


----------------------------------------------------------------------

Message: 1
Date: Thu, 16 Apr 2020 19:00:03 +0000
From: Lisa Kay Weihl <lweihl at bgsu.edu>
To: "slurm-users at lists.schedmd.com" <slurm-users at lists.schedmd.com>
Subject: [slurm-users] CentOS 7 CUDA 8.0 can't find plugin cons_tres
Message-ID:
	<DM5PR05MB29056BE0862DB04AA8960355B0D80 at DM5PR05MB2905.namprd05.prod.outlook.com>
	
Content-Type: text/plain; charset="utf-8"

I have a standalone server with 4 GeForce RTX 2080 Ti GPUs. Its purpose is to serve as a compute server for data science jobs, and my department chair wants a job scheduler on it. I have installed SLURM (18.08.9), which works just fine in a basic configuration. The trouble starts when I set GresTypes=gpu and add Gres=gpu:4 to the end of the node description:

NodeName=cs-datasci CPUs=24 RealMemory=385405 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:4

When I then try to restart slurmd, I get an error that it cannot find the plugin:

slurmd: error: Couldn't find the specified plugin name for select/cons_tres looking at all files

slurmd: error: cannot find select plugin for select/cons_tres

slurmd: fatal: Can't find plugin for select/cons_tres

The system was prebuilt by AdvancedHPC with CentOS 7 and CUDA 8.0

I usually keep notes when I'm installing things, but in this case I wasn't jotting things down as I went. I think I started with the instructions on this page: https://slurm.schedmd.com/quickstart_admin.html and went with the usual ./configure, make, make install.

I have a feeling something did not work and I switched to the rpm packages based on some other web pages I saw, because if I do a yum list installed | grep slurm I see a lot of packages. The problem is that I was interrupted by other tasks and my memory was somewhat rusty when I came back to this.

When I went looking for this error, I saw there were some issues with the newest SLURM and CUDA 10.2, but I didn't think that should be a problem because I was at CUDA 8.0. Just in case, I backed down to SLURM 18.

I'm willing to start all over if anyone thinks cleaning up and rebuilding will help. I do see libraries in /etc/lib64/slurm, but I also see 2 files in /usr/local/lib/slurm/src, so I'm not sure if those are left over from trying to install from source. All the daemons are in /usr/sbin and the user commands are in /usr/bin.
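
With two possible install locations in play, one way to see which select plugins a given install actually provides is to list the select_*.so files in its plugin directory. The sketch below is only an illustration: list_select_plugins is a made-up helper name, and the directory to pass in is an assumption that depends on how Slurm was installed (an RPM install and a source build usually use different plugin directories).

```shell
# List the select plugins found in one Slurm plugin directory.
# list_select_plugins is a hypothetical helper, not a Slurm command.
# The caller supplies the directory, since its location is an assumption
# that varies between RPM installs and source builds.
list_select_plugins() {
  plugin_dir="$1"
  for f in "$plugin_dir"/select_*.so; do
    # Skip the unexpanded pattern when nothing matches.
    [ -e "$f" ] || continue
    name=$(basename "$f" .so)        # e.g. select_cons_res
    echo "select/${name#select_}"    # e.g. select/cons_res
  done
}
```

Running it once per candidate directory (for example the source-build and RPM library paths) shows whether the plugin named in the slurmd error is present in either place. If select/cons_tres is missing from the output, slurmd cannot load it from that directory.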

I'm a newbie at this and very frustrated. Can anyone help?

***************************************************************

Lisa Weihl Systems Administrator

Computer Science, Bowling Green State University
Tel: (419) 372-0116   |    Fax: (419) 372-8061
lweihl at bgsu.edu
http://www.bgsu.edu/

------------------------------

Message: 2
Date: Fri, 17 Apr 2020 08:38:27 +1000
From: Sean Crosby <scrosby at unimelb.edu.au>
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] [EXTERNAL] CentOS 7 CUDA 8.0 can't find
	plugin cons_tres
Message-ID:
	<CAFstPEBO5+MthqskkP8dbo6Vvy8=F8YrcZBxaNwZmz1Qdx3NJQ at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Lisa,

cons_tres is part of Slurm 19.05 and higher. As you are using Slurm 18.08, it won't be there. The select plugin for 18.08 is cons_res.
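
As a rough illustration of that version rule, the shell sketch below maps a Slurm version string to the newest consumable-resource select plugin it ships. The helper name select_plugin_for is hypothetical, and the only fact it encodes is the 19.05 cutoff mentioned above.

```shell
# Print the newest consumable-resource select plugin for a Slurm version,
# using the rule that cons_tres first shipped in 19.05.
# select_plugin_for is a hypothetical helper name, not a Slurm command.
select_plugin_for() {
  version="$1"            # e.g. 18.08.9
  major=${version%%.*}    # 18
  rest=${version#*.}      # 08.9
  minor=${rest%%.*}       # 08
  minor=${minor#0}        # 8 (drop a single leading zero)
  if [ "$major" -gt 19 ] || { [ "$major" -eq 19 ] && [ "$minor" -ge 5 ]; }; then
    echo "select/cons_tres"
  else
    echo "select/cons_res"
  fi
}
```

For 18.08.9 this prints select/cons_res, so on that version slurm.conf would need SelectType=select/cons_res rather than select/cons_tres. The installed version can be read from slurmd -V.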

Is there a reason why you're using an old Slurm?

Sean
--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia




End of slurm-users Digest, Vol 30, Issue 32
*******************************************


