[slurm-users] [EXTERNAL] CentOS 7 CUDA 8.0 can't find plugin cons_tres

Sean Crosby scrosby at unimelb.edu.au
Thu Apr 16 22:38:27 UTC 2020


Hi Lisa,

cons_tres is part of Slurm 19.05 and higher. As you are using Slurm 18.08,
it won't be there. The select plugin for 18.08 is cons_res.
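
The error suggests slurm.conf currently sets SelectType=select/cons_tres.
A minimal sketch of the relevant lines for 18.08 is below. The node line is
copied from your message; the CR_Core setting and the /dev/nvidia[0-3]
device paths are assumptions that you should check against your system.

```conf
# slurm.conf (fragment)
# 18.08 ships select/cons_res; select/cons_tres only exists from 19.05 on.
SelectType=select/cons_res
# Assumption: allocate by core; CR_Core_Memory is another common choice.
SelectTypeParameters=CR_Core
GresTypes=gpu
NodeName=cs-datasci CPUs=24 RealMemory=385405 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:4

# gres.conf (fragment)
# Assumption: the four GPUs appear as /dev/nvidia0 through /dev/nvidia3.
Name=gpu File=/dev/nvidia[0-3]
```

After changing these, restart slurmctld and slurmd so that both pick up
the new select plugin.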

Is there a reason why you're using an old Slurm?

Sean
--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia



On Fri, 17 Apr 2020 at 05:00, Lisa Kay Weihl <lweihl at bgsu.edu> wrote:

> I have a standalone server with 4 GeForce RTX 2080 Ti GPUs. The purpose is
> to serve as a compute server for data science jobs. My department chair
> wants a job scheduler on it. I have installed SLURM (18.08.9). That works
> just fine in a basic configuration. When I attempt to add GresTypes=gpu and
> then add Gres=gpu:4 to the end of the node description:
>
> NodeName=cs-datasci CPUs=24 RealMemory=385405 Sockets=2 CoresPerSocket=6
> ThreadsPerCore=2 State=UNKNOWN Gres=gpu:4
>
> and then try to restart slurmd, I get an error that it cannot find the
> plugin:
>
> slurmd: error: Couldn't find the specified plugin name for
> select/cons_tres looking at all files
>
> slurmd: error: cannot find select plugin for select/cons_tres
>
> slurmd: fatal: Can't find plugin for select/cons_tres
>
> The system was prebuilt by AdvancedHPC with CentOS 7 and CUDA 8.0
>
> I usually keep notes when I'm installing things but in this case I wasn't
> jotting things down as I went. I think I started with the instructions on
> this page: https://slurm.schedmd.com/quickstart_admin.html and went with
> the usual ./configure, make, make install.
>
> I have a feeling maybe something did not work and I switched to the rpm
> packages based on some other web pages I saw because if I do a yum list
> installed | grep slurm I see a lot of packages. The problem is I was
> interrupted with other tasks and my memory was somewhat rusty when I came
> back to this.
>
> When I went looking for this error I saw there were some issues with the
> newest SLURM and CUDA 10.2 but I didn't think that should be an issue
> because I was at CUDA 8.0.  Just in case I backed down to SLURM 18.
>
> I'm willing to start all over if anyone thinks cleaning up and rebuilding
> will help that. I do see libraries in /etc/lib64/slurm but I also see 2
> files in /usr/local/lib/slurm/src so I'm not sure if that's left over from
> trying to install from source.  All the daemons are in /usr/sbin and user
> commands in /usr/bin
>
> I'm a newbie at this and very frustrated. Can anyone help?
>
> ***************************************************************
>
> Lisa Weihl *Systems Administrator*
>
>
> *Computer Science, Bowling Green State University *Tel: (419) 372-0116
> |    Fax: (419) 372-8061
> lweihl at bgsu.edu
> www.bgsu.edu
>