[slurm-users] Help with Con_Res Plugin Error

Jeffrey Frey frey at udel.edu
Thu Dec 13 08:16:35 MST 2018


When in doubt, check the source:


extern int select_g_select_nodeinfo_unpack(dynamic_plugin_data_t **nodeinfo,
                                           Buf buffer,
                                           uint16_t protocol_version)
{
        dynamic_plugin_data_t *nodeinfo_ptr = NULL;
        if (slurm_select_init(0) < 0)
                return SLURM_ERROR;
        nodeinfo_ptr = xmalloc(sizeof(dynamic_plugin_data_t));
        *nodeinfo = nodeinfo_ptr;
        if (protocol_version >= SLURM_MIN_PROTOCOL_VERSION) {
                int i;
                uint32_t plugin_id;
                safe_unpack32(&plugin_id, buffer);
                for (i=0; i<select_context_cnt; i++)
                        if (*(ops[i].plugin_id) == plugin_id) {
                                nodeinfo_ptr->plugin_id = i;
                                break;
                        }
                if (i >= select_context_cnt) {
                        error("we don't have select plugin type %u",plugin_id);
                        goto unpack_error;
                }
        }



Your slurmd daemons probably haven't been reconfigured yet and are still expecting the linear plugin when they connect to the newly restarted slurmctld.  They could probably do with a restart, assuming you've pushed the slurm.conf changes out to them.
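
To make the failure mode concrete, here is a minimal standalone sketch of the lookup in the excerpt above. It is not Slurm code: find_select_plugin and the PLUGIN_ID_* names are hypothetical, and the values 101 (cons_res) and 102 (linear) are what I recall from slurm.h, so treat them as an assumption and check your own source tree. The point is only that a daemon which has loaded cons_res alone has no entry to match an incoming ID of 102.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed plugin IDs (recalled from slurm.h; verify against your tree). */
enum {
    PLUGIN_ID_CONS_RES = 101,
    PLUGIN_ID_LINEAR   = 102
};

/*
 * Hypothetical stand-in for the loop in select_g_select_nodeinfo_unpack():
 * return the index of plugin_id in the table of loaded plugin IDs,
 * or -1 when no loaded plugin matches.
 */
static int find_select_plugin(const uint32_t *loaded, size_t count,
                              uint32_t plugin_id)
{
    for (size_t i = 0; i < count; i++) {
        if (loaded[i] == plugin_id)
            return (int)i;
    }
    /* The real code logs "we don't have select plugin type %u" here. */
    return -1;
}
```

With a loaded table containing only PLUGIN_ID_CONS_RES, looking up PLUGIN_ID_LINEAR returns -1, which corresponds to the unpack_error path; looking up PLUGIN_ID_CONS_RES returns index 0.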





> On Dec 13, 2018, at 10:10 AM, Julius, Chad <Chad.Julius at sdstate.edu> wrote:
> 
> As an addendum,
>  
> I did try the suggestion mentioned here as well:
>  
> http://kb.brightcomputing.com/faq/index.php?action=artikel&cat=14&id=410&artlang=en&highlight=slurm
>  
> Chad
>  
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Julius, Chad
> Sent: Thursday, December 13, 2018 8:54 AM
> To: slurm-users at lists.schedmd.com
> Subject: [slurm-users] Help with Con_Res Plugin Error
>  
> Slurm Users, 
>  
> I am hoping that you all can help me with the problem below.
>  
> We just spun up a new cluster using Bright and have been trying to change the default behavior of Slurm from linear to cons_res.  It should be simple enough, but I am plagued by the following error:
>  
> error: we don't have select plugin type 102
>  
> Both the select_linear.so and select_cons_res.so are located in /cm/shared_tmp/apps/slurm/17.11.8/lib64/slurm/
>  
> I have been testing with just the compute nodes and not the GPU nodes etc...  I added the following to my slurm.conf file:
>  
> # Scheduler
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
>  
> # Nodes
> # NodeName=big-mem[001-005],node[001-056]   # Entry from default install
> # NodeName=gpu[001-004]  Gres=gpu:2   # Entry from default install
> NodeName=node[001-056] CPUs=2 RealMemory=196000 Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 State=UNKNOWN
>  
>  
> # Partitions
> PartitionName=defq Default=YES MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=YES GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpu[001-004],big-mem[001-005],node[001-056]
> PartitionName=test Default=NO MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=YES GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-056]
>  
> When I issue scontrol reconfigure, I get the following:
>  
> [root@thunder ~]# scontrol reconfigure
> slurm_reconfigure error: Unable to contact slurm controller (connect failure)
> [root@thunder ~]# systemctl status slurmctld.service
> ● slurmctld.service - Slurm controller daemon
>    Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; disabled; vendor preset: disabled)
>    Active: failed (Result: exit-code) since Thu 2018-12-13 08:46:18 CST; 5s ago
>   Process: 31416 ExecStart=/cm/shared/apps/slurm/17.11.8/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
> Main PID: 31418 (code=exited, status=1/FAILURE)
>  
> When I revert the changes, it goes back to an active working state.
>  
> The /var/log/slurmctld log shows this error message:
>  
> error: we don't have select plugin type 102
>  
> Has anyone else run into this problem?  If so, can you recommend a fix?
>  
> Thanks, 
>  
> Chad


::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE  19716
Office: (302) 831-6034  Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::



