[slurm-users] "Plugin is corrupted" message when using drmaa / debugging libslurm

Jean-Christophe HAESSIG haessigj at igbmc.fr
Wed Jun 29 13:01:54 UTC 2022


On 28/06/2022 23:14, Chris Samuel wrote:
> On 28/6/22 12:19 pm, Jean-Christophe HAESSIG wrote:

Hi,

> I suspect this is where your error is happening:
> 
> https://github.com/SchedMD/slurm/blob/1ce55318222f89fbc862ce559edfd17e911fee38/src/common/plugin.c#L284 
> 

Yes, I found it too, and that's where I saw the detailed debug3 and debug4 
calls.

> it's when it's checking it can load the plugin and not hit any 
> unresolved library symbols. The fact you are hitting this sounds like 
> you're missing libraries from the compute nodes that are present on the 
> login node (or there's some reason they're not getting found if present).

Reading the code, it's not 100% clear where these libraries are loaded 
from. I think they are the plugins in 
/usr/lib/x86_64-linux-gnu/slurm-wlm/, and all of them seem to be present. 
These libraries in turn have dependencies, but I don't understand how a 
library could still have undefined symbols once all the dependency 
loading and resolution is over.
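One way to narrow this down outside of Slurm and DRMAA is to load each plugin by hand on the compute node. The sketch below is a hypothetical standalone helper (`try_load_plugin` is not a Slurm function, and this is not how plugin.c is written). It calls dlopen() with RTLD_NOW, which forces every undefined symbol to be resolved at load time, so dlerror() names the first symbol or dependency the dynamic loader could not find.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Slurm: try to load a shared object with
 * RTLD_NOW so that all undefined symbols must be resolved immediately.
 * Returns 0 on success. On failure, returns -1 and copies dlerror()'s
 * message (e.g. an "undefined symbol: ..." line) into errbuf.
 */
static int try_load_plugin(const char *path, char *errbuf, size_t errlen)
{
    assert(path != NULL && errbuf != NULL && errlen > 0);
    errbuf[0] = '\0';

    dlerror(); /* clear any stale error state before the call */
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *msg = dlerror();
        snprintf(errbuf, errlen, "%s", msg ? msg : "unknown dlopen error");
        return -1;
    }

    dlclose(handle); /* we only wanted to know whether it loads */
    return 0;
}
```

You could call it on each .so file under /usr/lib/x86_64-linux-gnu/slurm-wlm/ and print errbuf for any that fail (link with -ldl on glibc older than 2.34). One caveat: Slurm plugins may rely on symbols that libslurm itself exports once it is loaded in the process, so an "undefined symbol" reported by this standalone test can be a false alarm. Compare the result against the login node, where the same plugins apparently load fine.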

> This depends on what part of Slurm is generating these errors, is this 
> something like sbatch or srun? If so using multiple -v's will increase 
> the debug level so you can pick those up. If it's from slurmd then 
> you'll want to set SlurmdDebug to "debug3" in your slurm.conf.

No, the job is submitted through the DRMAA API, which lets programs submit 
jobs in a cluster-agnostic way. The program doesn't know it is talking to 
Slurm. The DRMAA library does the translation and loads libslurm36, 
which is where the messages come from. That's why I don't know how to make 
libslurm log more, since its use is hidden behind DRMAA.

I have both a test using the Python binding for DRMAA and a test in 
pure C, and they behave the same.

Thank you,
J.C. Haessig

