[slurm-users] job_res_rm_job: plugin still initializing

Loris Bennett loris.bennett at fu-berlin.de
Tue Feb 7 14:37:33 UTC 2023


Hi,

The other day we updated to Slurm 22.05.8.  We are interested in using
sharding with our GPUs, so after the update had finished, we changed
the following in slurm.conf:

  SelectType=select/cons_res

to

  SelectType=select/cons_tres
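
For context, a sharding setup of the kind this change is meant to
enable might look roughly like the sketch below.  The node name and
the counts are hypothetical placeholders, not our actual
configuration, and the syntax is as I understand it from the Slurm
GRES documentation:

  # slurm.conf (hypothetical node name and counts)
  GresTypes=gpu,shard
  NodeName=gpunode01 Gres=gpu:2,shard:8

  # gres.conf on that node (hypothetical)
  AutoDetect=nvml
  Name=shard Count=8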

Changing SelectType seemed to cause the slurmctld to lose contact with
the slurmstepds, so that a large number of jobs were requeued, although
they were in fact still running.

The slurmstepds reported

  slurmd: error: Malformed RPC of type REQUEST_TERMINATE_JOB(6011) received
  slurmd: error: select_g_select_jobinfo_unpack: select plugin cons_tres not found
  slurmd: error: select_g_select_jobinfo_unpack: unpack error

In the slurmctld log, multiple lines of the form

  select/cons_res: job_res_rm_job: plugin still initializing
  
occurred.  This line also occurs in the following bug report

  https://bugs.schedmd.com/show_bug.cgi?id=10980

which is about a different problem, although regarding this message the
SchedMD employee writes

  I don't think this should ever happen.

Has anyone else seen this issue?

Cheers,

Loris

-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin


