[slurm-users] fail when trying to set up selection=con_res

Ethan Van Matre vanmatre at ohsu.edu
Wed Nov 29 13:32:33 MST 2017


Here is some more data:

Changed slurm.conf to have:

SelectType=select/cons_res
SelectTypeParameters=CR_CPU

Then restarted

 sudo systemctl restart slurmctld.service

The log on the host said:


[2017-11-29T12:23:56.384] error: we don't have select plugin type 101

[2017-11-29T12:23:56.384] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:23:56.384] error: Malformed RPC of type REQUEST_ABORT_JOB(6013) received

[2017-11-29T12:23:56.384] error: slurm_receive_msg_and_forward: Header lengths are longer than data received


Then did a sudo scontrol reconfigure and the log said:


[2017-11-29T12:23:56.394] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:24:34.889] Message aggregation disabled

[2017-11-29T12:24:34.890] Resource spec: Reserved system memory limit not configured for this node

sview had the running jobs cleared out of its context (they are still running), but I kind of expected that.

I then submitted 6 jobs to the partition that do nothing but sleep and the log says:


[2017-11-29T12:25:39.424] error: we don't have select plugin type 101

[2017-11-29T12:25:39.424] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.424] error: Malformed RPC of type REQUEST_BATCH_JOB_LAUNCH(4005) received

[2017-11-29T12:25:39.424] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.424] error: we don't have select plugin type 101

[2017-11-29T12:25:39.424] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.424] error: Malformed RPC of type REQUEST_BATCH_JOB_LAUNCH(4005) received

[2017-11-29T12:25:39.424] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.424] error: we don't have select plugin type 101

[2017-11-29T12:25:39.424] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.424] error: Malformed RPC of type REQUEST_BATCH_JOB_LAUNCH(4005) received

[2017-11-29T12:25:39.424] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.424] error: we don't have select plugin type 101

[2017-11-29T12:25:39.424] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.424] error: Malformed RPC of type REQUEST_BATCH_JOB_LAUNCH(4005) received

[2017-11-29T12:25:39.424] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.425] error: we don't have select plugin type 101

[2017-11-29T12:25:39.425] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.425] error: Malformed RPC of type REQUEST_BATCH_JOB_LAUNCH(4005) received

[2017-11-29T12:25:39.425] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.425] error: we don't have select plugin type 101

[2017-11-29T12:25:39.425] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.425] error: Malformed RPC of type REQUEST_BATCH_JOB_LAUNCH(4005) received

[2017-11-29T12:25:39.425] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.434] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.434] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.434] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.434] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.435] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.435] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.436] error: we don't have select plugin type 101

[2017-11-29T12:25:39.436] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.436] error: Malformed RPC of type REQUEST_TERMINATE_JOB(6011) received

[2017-11-29T12:25:39.436] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.436] error: we don't have select plugin type 101

[2017-11-29T12:25:39.436] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.436] error: Malformed RPC of type REQUEST_TERMINATE_JOB(6011) received

[2017-11-29T12:25:39.436] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.436] error: we don't have select plugin type 101

[2017-11-29T12:25:39.436] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.436] error: Malformed RPC of type REQUEST_TERMINATE_JOB(6011) received

[2017-11-29T12:25:39.436] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.436] error: we don't have select plugin type 101

[2017-11-29T12:25:39.436] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.436] error: Malformed RPC of type REQUEST_TERMINATE_JOB(6011) received

[2017-11-29T12:25:39.436] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.436] error: we don't have select plugin type 101

[2017-11-29T12:25:39.436] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.436] error: Malformed RPC of type REQUEST_TERMINATE_JOB(6011) received

[2017-11-29T12:25:39.436] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.437] error: we don't have select plugin type 101

[2017-11-29T12:25:39.437] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.437] error: Malformed RPC of type REQUEST_TERMINATE_JOB(6011) received

[2017-11-29T12:25:39.437] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.446] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.446] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.446] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.446] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.447] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.447] error: service_connection: slurm_receive_msg: Header lengths are longer than data received
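
Since the excerpt above is long and repetitive, here is a minimal sketch for tallying the "Malformed RPC" errors by RPC name. The function name count_malformed_rpcs and the regular expression are my own and only assume the line format shown in this log; this is not part of Slurm. For the excerpt above it gives six REQUEST_BATCH_JOB_LAUNCH and six REQUEST_TERMINATE_JOB failures, one of each per submitted job.

```python
import re
from collections import Counter

# Matches lines such as:
# [2017-11-29T12:25:39.424] error: Malformed RPC of type REQUEST_BATCH_JOB_LAUNCH(4005) received
# The pattern is an assumption based only on the log lines quoted in this message.
MALFORMED_RPC = re.compile(r"error: Malformed RPC of type (\w+)\((\d+)\) received")


def count_malformed_rpcs(lines):
    """Return a Counter mapping RPC name to the number of 'Malformed RPC' errors."""
    counts = Counter()
    for line in lines:
        match = MALFORMED_RPC.search(line)
        if match:
            # group(1) is the RPC name; group(2) is its numeric message type.
            counts[match.group(1)] += 1
    return counts
```

It accepts any iterable of lines, so an open log file can be passed in directly; lines that do not match (including the blank ones) are ignored.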


Lastly, I changed the config back to linear, restarted and reconfigured, and the node log says:


[2017-11-29T12:26:19.617] [6684.0] job_manager exiting with aborted job

[2017-11-29T12:26:19.621] [6684.0] done with job

[2017-11-29T12:26:24.591] Message aggregation disabled

[2017-11-29T12:26:24.592] Resource spec: Reserved system memory limit not configured for this node



Ethan VanMatre
Informatics Research Analyst
Institute on Development and Disability
Oregon Health & Science University
CSLU - GH40
3181 SW Sam Jackson Park Rd
Portland, OR 97239
(503) 346-3764
vanmatre at ohsu.edu
