[slurm-users] "Plugin is corrupted" message when using drmaa / debugging libslurm
Chris Samuel
chris at csamuel.org
Tue Jun 28 21:14:50 UTC 2022
On 28/6/22 12:19 pm, Jean-Christophe HAESSIG wrote:
> Hi,
>
> I'm facing a weird issue where launching a job through drmaa
> (https://github.com/natefoo/slurm-drmaa) aborts with the message "Plugin
> is corrupted", but only when that job is placed from one of my compute
> nodes. Running the command from the login node seems to work.
I suspect this is where your error is happening:
https://github.com/SchedMD/slurm/blob/1ce55318222f89fbc862ce559edfd17e911fee38/src/common/plugin.c#L284
That's the point where Slurm checks that it can load the plugin without
hitting any unresolved library symbols. The fact that you are hitting this
suggests that libraries present on the login node are missing from the
compute nodes (or, if they are present, that something is stopping them
from being found).
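As a rough way to reproduce that kind of check by hand, something like the
following Python sketch asks the dynamic loader to open a shared object and
resolve its symbols up front. The helper name try_load is made up for
illustration, and I have not checked which dlopen flags Slurm itself uses,
so treat this as an approximation rather than what plugin.c does:

```python
import ctypes
import os
import sys
from typing import Optional


def try_load(path: str) -> Optional[str]:
    """Return None if the shared object loads, else the loader's error text.

    try_load is a hypothetical helper, not part of Slurm or slurm-drmaa.
    """
    # RTLD_NOW asks the loader to resolve every symbol immediately, so
    # unresolved symbols are reported at load time rather than at first use.
    mode = getattr(os, "RTLD_NOW", ctypes.DEFAULT_MODE)
    try:
        ctypes.CDLL(path, mode=mode)
    except OSError as exc:
        # The exception text carries the loader's own error message.
        return str(exc)
    return None


if __name__ == "__main__":
    # Usage: python3 check_plugin.py /path/to/plugin.so [more.so ...]
    for plugin in sys.argv[1:]:
        err = try_load(plugin)
        print(f"{plugin}: {'ok' if err is None else err}")
```

Running it against the same plugin file on the login node and on a compute
node should show whether the two behave differently. One caveat: a plugin
may legitimately rely on symbols supplied by the program that loads it, so
an "undefined symbol" report here is a hint to follow up on, not proof of a
problem.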
[...]
> Anyway, the message seems to originate from libslurm36 and I would like
> to activate the debug messages (debug3, debug4). Is there a way to do
> this with an environment variable or any other convenient method ?
This depends on what part of Slurm is generating these errors. Is this
something like sbatch or srun? If so, passing multiple -v options will
increase the debug level so you can pick those messages up. If it's from
slurmd then you'll want to set SlurmdDebug to "debug3" in your slurm.conf.
Once that's done you should get the information on what symbols are not
being found and that should give you some insight into what's going on.
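If the debug output points at a particular shared object, comparing what
ldd reports for it on the login node and on a compute node may help narrow
things down. A minimal Python sketch, assuming ldd is available on both
nodes; the function names missing_libs and check are made up for
illustration:

```python
import subprocess
from typing import List


def missing_libs(ldd_output: str) -> List[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved dependencies appear as: "libfoo.so.1 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check(path: str) -> List[str]:
    """Run ldd on path and list its unresolved dependencies."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libs(result.stdout)
```

Calling check() on the same file on each node and comparing the two lists
should show which libraries, if any, are only resolvable on the login node.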
Best of luck,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA