[slurm-users] job_res_rm_job: plugin still initializing
Loris Bennett
loris.bennett at fu-berlin.de
Tue Feb 7 14:37:33 UTC 2023
Hi,
The other day we updated to 22.05.8. We are interested in using
sharding with our GPUs, so after the update had finished, we
changed
SelectType=select/cons_res
to
SelectType=select/cons_tres
This seemed to cause the slurmctld to lose contact with the
slurmstepds, so that a large number of jobs were requeued, although they
were in fact still running.
The slurmstepds reported
slurmd: error: Malformed RPC of type REQUEST_TERMINATE_JOB(6011) received
slurmd: error: select_g_select_jobinfo_unpack: select plugin cons_tres not found
slurmd: error: select_g_select_jobinfo_unpack: unpack error
The slurmctld log contained multiple occurrences of the line
select/cons_res: job_res_rm_job: plugin still initializing
This line also appears in the following bug report:
https://bugs.schedmd.com/show_bug.cgi?id=10980
which concerns a different problem. However, about this line the
SchedMD employee writes:
I don't think this should ever happen.
Has anyone else seen this issue?
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin