[slurm-users] "Plugin is corrupted" message when using drmaa / debugging libslurm

Chris Samuel chris at csamuel.org
Tue Jun 28 21:14:50 UTC 2022


On 28/6/22 12:19 pm, Jean-Christophe HAESSIG wrote:

> Hi,
> 
> I'm facing a weird issue where launching a job through drmaa
> (https://github.com/natefoo/slurm-drmaa) aborts with the message "Plugin
> is corrupted", but only when that job is placed from one of my compute
> nodes. Running the command from the login node seems to work.

I suspect this is where your error is happening:

https://github.com/SchedMD/slurm/blob/1ce55318222f89fbc862ce559edfd17e911fee38/src/common/plugin.c#L284

That is the point where Slurm checks that it can load the plugin 
without hitting any unresolved library symbols. The fact that you are 
hitting it suggests that libraries present on the login node are missing 
from the compute nodes (or, if they are present, that something is 
preventing them from being found).
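
One way to compare the two kinds of node is to run ldd against the 
plugin file on each and look for "not found" entries. Below is a minimal 
Python sketch of that idea. It assumes ldd is installed and that its 
output uses the usual glibc "name => not found" form; the function names 
are made up for illustration and are not part of Slurm or slurm-drmaa.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # glibc ldd prints unresolved dependencies as "libfoo.so.1 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_plugin(path):
    """Run ldd on a shared object and list its unresolved dependencies.

    `path` is whatever plugin file you want to inspect; nothing here
    knows where your Slurm plugin directory actually is.
    """
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libraries(result.stdout)
```

Calling check_plugin() on the same plugin file from the login node and 
from a compute node, then comparing the two lists, should show whether a 
dependency is absent on one side only.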

[...]
> Anyway, the message seems to originate from libslurm36 and I would like
> to activate the debug messages (debug3, debug4). Is there a way to do
> this with an environment variable or any other convenient method?

This depends on which part of Slurm is generating these errors. Is it a 
command like sbatch or srun? If so, passing multiple -v options will 
increase the debug level so you can pick those messages up. If it's from 
slurmd, then you'll want to set SlurmdDebug to "debug3" in your 
slurm.conf.

Once that's done you should see which symbols are not being found, and 
that should give you some insight into what's going on.
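
As a rough cross-check outside of Slurm, you can also ask the dynamic 
loader directly why a shared object fails to load. The sketch below uses 
Python's ctypes with RTLD_NOW so that all symbols are resolved at load 
time; try_load is a hypothetical helper name. Note that a plugin may 
expect some symbols to be provided by the process that loads it, so an 
undefined-symbol error here is a hint rather than proof of the cause.

```python
import ctypes
import os


def try_load(path):
    """Try to dlopen a shared object with immediate symbol binding.

    Returns (True, "") on success, or (False, message), where message is
    the error text reported by the dynamic loader.
    """
    try:
        # RTLD_NOW asks the loader to resolve every symbol at load time,
        # so an undefined symbol fails here instead of at first use.
        ctypes.CDLL(path, mode=os.RTLD_NOW)
    except OSError as exc:
        return False, str(exc)
    return True, ""
```

Running try_load() on the plugin file from a compute node and printing 
the returned message should name the library or symbol the loader could 
not find, if there is one.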

Best of luck,
Chris
-- 
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA


