[slurm-users] [EXTERNAL] Follow-up-slurm-users Digest, Vol 30, Issue 32

Renfro, Michael Renfro at tntech.edu
Fri Apr 17 13:59:38 UTC 2020


Can’t speak for everyone, but I went to Slurm 19.05 some months back, and haven't had any problems with CUDA 10.0 or 10.1 (or 8.0, 9.0, or 9.1).
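For reference, here is a minimal sketch of the GPU-related settings on Slurm 19.05 or later, reusing the node definition from the original post. It assumes the four GPUs appear as /dev/nvidia0 through /dev/nvidia3 (verify this on the actual host). It also uses CR_Core as a simple consumable-resource setting. Both are illustrative choices, not requirements. On 18.08, select/cons_tres does not exist, so the SelectType line would be select/cons_res instead.

```
# slurm.conf (Slurm 19.05 or later; select/cons_tres is not available in 18.08)
SelectType=select/cons_tres
# On 18.08, use instead: SelectType=select/cons_res
SelectTypeParameters=CR_Core
GresTypes=gpu
NodeName=cs-datasci CPUs=24 RealMemory=385405 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:4

# gres.conf on cs-datasci (device paths are an assumption; check with: ls /dev/nvidia*)
Name=gpu File=/dev/nvidia[0-3]
```

After restarting slurmctld and slurmd, a job can request a GPU with, for example, srun --gres=gpu:1 nvidia-smi.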

> On Apr 17, 2020, at 8:46 AM, Lisa Kay Weihl <lweihl at bgsu.edu> wrote:
> 
> Wow. I did not catch that version issue. I saw that there were issues with the newest Slurm and the way CUDA 10+ installs, so I avoided it even though we have CUDA 8. I did have Slurm 19 downloaded, so I suspect I ran into an issue with it and went back to 18. Now that I have more experience setting it up, I'll wipe the 18 install and start over. Fingers crossed for success!
> 
> Thanks for your help!
> 
> --
> Lisa Weihl
> Systems Administrator, Computer Science
> Bowling Green State University
> Tel: (419) 372-0116   |    Fax: (419) 372-8061
> lweihl at bgsu.edu
> www.bgsu.edu
> 
> -----Original Message-----
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of slurm-users-request at lists.schedmd.com
> Sent: Thursday, April 16, 2020 6:39 PM
> To: slurm-users at lists.schedmd.com
> Subject: [EXTERNAL] slurm-users Digest, Vol 30, Issue 32
> 
> 
> Today's Topics:
> 
>   1. CentOS 7 CUDA 8.0 can't find plugin cons_tres (Lisa Kay Weihl)
>   2. Re: [EXTERNAL] CentOS 7 CUDA 8.0 can't find plugin cons_tres
>      (Sean Crosby)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Thu, 16 Apr 2020 19:00:03 +0000
> From: Lisa Kay Weihl <lweihl at bgsu.edu>
> To: "slurm-users at lists.schedmd.com" <slurm-users at lists.schedmd.com>
> Subject: [slurm-users] CentOS 7 CUDA 8.0 can't find plugin cons_tres
> Message-ID:
>        <DM5PR05MB29056BE0862DB04AA8960355B0D80 at DM5PR05MB2905.namprd05.prod.outlook.com>
> 
> Content-Type: text/plain; charset="utf-8"
> 
> I have a standalone server with four GeForce RTX 2080 Ti GPUs. Its purpose is to serve as a compute server for data science jobs, and my department chair wants a job scheduler on it. I have installed Slurm (18.08.9). It works fine in a basic configuration, but when I add GresTypes=gpu and then add Gres=gpu:4 to the end of the node description:
> 
> 
> NodeName=cs-datasci CPUs=24 RealMemory=385405 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:4
> 
> and then try to restart slurmd, I get an error that it cannot find the plugin:
> 
> slurmd: error: Couldn't find the specified plugin name for select/cons_tres looking at all files
> 
> slurmd: error: cannot find select plugin for select/cons_tres
> 
> slurmd: fatal: Can't find plugin for select/cons_tres
> 
> The system was prebuilt by AdvancedHPC with CentOS 7 and CUDA 8.0.
> 
> I usually keep notes when I'm installing things, but in this case I wasn't jotting things down as I went. I think I started with the instructions on this page: https://slurm.schedmd.com/quickstart_admin.html and went with the usual ./configure, make, make install.
> 
> I have a feeling something did not work and I switched to the RPM packages based on some other web pages I saw, because if I run yum list installed | grep slurm I see a lot of packages. The problem is that I was interrupted with other tasks, and my memory was somewhat rusty when I came back to this.
> 
> When I went looking for this error, I saw there were some issues with the newest Slurm and CUDA 10.2, but I didn't think that should be an issue because I was at CUDA 8.0. Just in case, I backed down to Slurm 18.
> 
> I'm willing to start all over if anyone thinks cleaning up and rebuilding will help. I do see libraries in /etc/lib64/slurm, but I also see 2 files in /usr/local/lib/slurm/src, so I'm not sure if those are left over from trying to install from source. All the daemons are in /usr/sbin and the user commands are in /usr/bin.
> 
> I'm a newbie at this and very frustrated. Can anyone help?
> 
> ***************************************************************
> 
> Lisa Weihl Systems Administrator
> 
> Computer Science, Bowling Green State University
> Tel: (419) 372-0116   |    Fax: (419) 372-8061
> lweihl at bgsu.edu
> http://www.bgsu.edu/
> 
> ------------------------------
> 
> Message: 2
> Date: Fri, 17 Apr 2020 08:38:27 +1000
> From: Sean Crosby <scrosby at unimelb.edu.au>
> To: Slurm User Community List <slurm-users at lists.schedmd.com>
> Subject: Re: [slurm-users] [EXTERNAL] CentOS 7 CUDA 8.0 can't find
>        plugin cons_tres
> Message-ID:
>        <CAFstPEBO5+MthqskkP8dbo6Vvy8=F8YrcZBxaNwZmz1Qdx3NJQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Hi Lisa,
> 
> cons_tres is part of Slurm 19.05 and higher. As you are using Slurm 18.08, it won't be there. The select plugin for 18.08 is cons_res.
> 
> Is there a reason why you're using an old Slurm?
> 
> Sean
> --
> Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
> Research Computing Services | Business Services
> The University of Melbourne, Victoria 3010 Australia
> 
> End of slurm-users Digest, Vol 30, Issue 32
> *******************************************
> 


