[slurm-users] "Plugin is corrupted" message when using drmaa / debugging libslurm
Jean-Christophe HAESSIG
haessigj at igbmc.fr
Wed Jun 29 13:01:54 UTC 2022
On 28/06/2022 23:14, Chris Samuel wrote:
> On 28/6/22 12:19 pm, Jean-Christophe HAESSIG wrote:
Hi,
> I suspect this is where your error is happening:
>
> https://github.com/SchedMD/slurm/blob/1ce55318222f89fbc862ce559edfd17e911fee38/src/common/plugin.c#L284
>
Yes, I also found it, and that's where I saw the detailed debug3 & debug4
calls.
> it's when it's checking it can load the plugin and not hit any
> unresolved library symbols. The fact you are hitting this sounds like
> you're missing libraries from the compute nodes that are present on the
> login node (or there's some reason they're not getting found if present).
Reading the code, it's not 100% clear where these libraries are loaded
from. I think it's everything in
/usr/lib/x86_64-linux-gnu/slurm-wlm/, but everything seems to be there.
These libraries in turn have their own dependencies, but I don't see how
libraries could still have undefined symbols once all the dependency
loading/resolution is over.
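
One way to test the "undefined symbols" theory outside of Slurm is to
dlopen() the files by hand with eager symbol resolution and see what the
dynamic linker reports. The following is only a rough sketch, not what
plugin.c itself does: try_load is a hypothetical helper name, and the
choice of RTLD_NOW | RTLD_GLOBAL is my assumption about what makes a
useful check.

```python
import ctypes
import os
import sys


def try_load(path):
    """Attempt an eager dlopen() of `path`; return (ok, message)."""
    try:
        # RTLD_NOW resolves every symbol at load time, so a missing
        # dependency or an undefined symbol is reported immediately.
        # RTLD_GLOBAL makes the symbols of this library visible to
        # libraries loaded afterwards in the same process.
        ctypes.CDLL(path, mode=os.RTLD_NOW | os.RTLD_GLOBAL)
    except OSError as exc:
        # The exception text is the dynamic linker's own error message.
        return False, str(exc)
    return True, "loaded"


if __name__ == "__main__":
    # Paths are tried in the order given on the command line.
    for path in sys.argv[1:]:
        ok, message = try_load(path)
        print(f"{'OK  ' if ok else 'FAIL'} {path}: {message}")
```

The order of the arguments matters: a plugin that expects symbols from
libslurm would probably fail if it is tried before libslurm itself has
been loaded, so the libslurm shared object should come first on the
command line, followed by the files from the slurm-wlm directory.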
> This depends on what part of Slurm is generating these errors, is this
> something like sbatch or srun? If so using multiple -v's will increase
> the debug level so you can pick those up. If it's from slurmd then
> you'll want to set SlurmdDebug to "debug3" in your slurm.conf.
No, the job is submitted through the DRMAA API, which lets programs
submit jobs in a cluster-agnostic way. The program doesn't know it is
talking to Slurm. The DRMAA library does the translation and loads
libslurm36, where the messages come from. That's why I don't know how to
tell libslurm to log more, since its use is hidden behind DRMAA.
I have both a test using the Python binding for DRMAA and a test written
in pure C, and they behave the same.
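
Since libslurm's own log level is out of reach here, one thing that
might still help is asking the glibc dynamic linker to trace library
loading through the LD_DEBUG environment variable. This works below the
level of DRMAA and libslurm. A minimal sketch, assuming a glibc-based
system; ld_debug_env and run_with_ld_debug are hypothetical helper names,
and the script name in the last comment is a placeholder:

```python
import os
import subprocess


def ld_debug_env(categories="libs", output_prefix=None):
    """Return a copy of the current environment with LD_DEBUG set."""
    env = dict(os.environ)
    # "libs" traces the library search paths; "symbols" and "bindings"
    # are further glibc categories that can be added, comma-separated.
    env["LD_DEBUG"] = categories
    if output_prefix is not None:
        # glibc then writes the trace to <output_prefix>.<pid>
        # instead of standard error.
        env["LD_DEBUG_OUTPUT"] = output_prefix
    return env


def run_with_ld_debug(command, categories="libs", output_prefix=None):
    """Run `command` (a list of arguments) with linker tracing enabled."""
    env = ld_debug_env(categories, output_prefix)
    return subprocess.run(command, env=env, check=False).returncode


# Example call (placeholder script name):
# run_with_ld_debug(["python3", "my_drmaa_test.py"], "libs,symbols")
```

The trace is verbose, but it should show which directories are searched
for each library and, with the "symbols" category, where symbol lookups
are attempted.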
Thank you,
J.C. Haessig