[slurm-users] /usr/lib64/slurm/prep_script.so: undefined symbol: run_script
Braulio Solano Rojas
braulio at solsoft.biz
Tue Jul 13 02:45:47 UTC 2021
Greetings,
I would like to install SLURM on Clear Linux because of its good
benchmarks. I have followed the tutorial at
https://docs.01.org/clearlinux/latest/tutorials/hpc.html. When I got
to the section "Create slurm.conf configuration file", I noticed that
the slurmctld service didn't start. The error was related to the
slurm.conf file. This was in the log:
jul 11 19:20:00 slurm-controller slurmctld[615]: error: Ignoring obsolete FastSchedule=1 option. Please remove from your configuration.
jul 11 19:20:00 slurm-controller slurmctld[615]: fatal: SallocDefaultCommand has been removed. Please consider setting LaunchParameters=use_interactive_step instead.
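The two options named in these messages can be located in the
configuration before the service is restarted. This is only a minimal
sketch: the function name find_removed_options is hypothetical, and the
path /etc/slurm/slurm.conf is an assumption based on the /etc/slurm
directory that appears in the log further down.

```shell
# Print, with line numbers, every uncommented line that sets one of the
# two options slurmctld 20.11.8 complained about above.
# The match is case-insensitive.
find_removed_options() {
    grep -inE '^[[:space:]]*(FastSchedule|SallocDefaultCommand)[[:space:]]*=' "$1"
}

# Example call (path is an assumption):
# find_removed_options /etc/slurm/slurm.conf
```

Commented-out lines are skipped, so only options that are actually in
effect are reported.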
I deleted FastSchedule and SallocDefaultCommand. After that I added
these lines:
LaunchParameters=use_interactive_step
InteractiveStepOptions="srun -n1 -N1 --pty --preserve-env --mpi=pmix_v3 $SHELL"
After correcting that, I still could not continue, because there is an
undefined symbol in a shared object.
This is the log:
[2021-07-11T19:35:14.260] slurmctld version 20.11.8 started on cluster linux
[2021-07-11T19:35:14.261] cred/munge: init: Munge credential signature plugin loaded
[2021-07-11T19:35:14.262] debug: auth/munge: init: Munge authentication plugin loaded
[2021-07-11T19:35:14.262] select/cons_res: common_init: select/cons_res loaded
[2021-07-11T19:35:14.263] select/linear: init: Linear node selection plugin loaded with argument 1
[2021-07-11T19:35:14.263] select/cons_tres: common_init: select/cons_tres loaded
[2021-07-11T19:35:14.263] preempt/none: init: preempt/none loaded
[2021-07-11T19:35:14.264] debug: acct_gather_energy/none: init: AcctGatherEnergy NONE plugin loaded
[2021-07-11T19:35:14.264] debug: acct_gather_Profile/none: init: AcctGatherProfile NONE plugin loaded
[2021-07-11T19:35:14.264] debug: acct_gather_interconnect/none: init: AcctGatherInterconnect NONE plugin loaded
[2021-07-11T19:35:14.264] debug: acct_gather_filesystem/none: init: AcctGatherFilesystem NONE plugin loaded
[2021-07-11T19:35:14.265] debug2: No acct_gather.conf file (/etc/slurm/acct_gather.conf)
[2021-07-11T19:35:14.265] debug: jobacct_gather/none: init: Job accounting gather NOT_INVOKED plugin loaded
[2021-07-11T19:35:14.265] error: plugin_load_from_file: dlopen(/usr/lib64/slurm/prep_script.so): /usr/lib64/slurm/prep_script.so: undefined symbol: run_script
[2021-07-11T19:35:14.265] error: Couldn't load specified plugin name for prep/script: Dlopen of plugin file failed
[2021-07-11T19:35:14.266] error: prep_plugin_init: cannot create prep context for prep/script
[2021-07-11T19:35:14.266] fatal: failed to initialize prep plugin
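The relevant line is easy to miss among the debug output. As a rough
sketch, a filter like the following pulls the plugin path and the
missing symbol out of a slurmctld log. The function name
undefined_symbols is hypothetical, and the pattern assumes the message
format shown above, with each log entry on a single line as it is in
the log file itself.

```shell
# Print "PLUGIN_PATH SYMBOL" for every dlopen failure in the given log
# that was caused by an undefined symbol.
undefined_symbols() {
    sed -n 's/.*dlopen(\([^)]*\)).*undefined symbol: \([A-Za-z0-9_]*\).*/\1 \2/p' "$1"
}

# Example call, using the SlurmctldLogFile path from the attached slurm.conf:
# undefined_symbols /var/log/slurm/slurmctld.log
```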
Since the slurm.conf file shipped with the Clear Linux bundle (package)
is outdated, I thought that the error might disappear with a better
configuration file. My hypothesis was that I might need to load another
plugin that provides the run_script symbol. So I created a new
configuration file with
https://slurm.schedmd.com/configurator.easy.html, but I got the same
error.
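Whether some installed file provides run_script at all can be checked
directly, independent of slurm.conf. The following is a sketch only,
assuming GNU nm (binutils) is installed. The function name
exports_symbol is hypothetical, and which binaries or libraries are
worth checking (for example the slurmctld executable found with
"command -v slurmctld") is an assumption, not something taken from the
Slurm documentation.

```shell
# exports_symbol FILE SYMBOL
# Exit status 0: FILE defines SYMBOL in its dynamic symbol table.
# Exit status 1: it does not.
# Exit status 2: FILE cannot be read.
exports_symbol() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    # The symbol name is the last field of each nm line; strip any
    # "@VERSION" suffix before comparing.
    nm -D --defined-only "$1" 2>/dev/null |
        awk -v s="$2" '{ n = $NF; sub(/@.*/, "", n); if (n == s) found = 1 }
                       END { exit !found }'
}

# Example calls (paths are assumptions):
# exports_symbol "$(command -v slurmctld)" run_script && echo "slurmctld exports it"
# for f in /usr/lib64/slurm/*.so; do
#     exports_symbol "$f" run_script && echo "$f exports it"
# done
```

If nothing exports the symbol, changing the configuration is unlikely
to help, which would point toward how the package was built.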
Do you think this is a bug in SLURM, something missing in the
configuration, or an error in the compilation of the bundle (package) I
installed? I have noticed similar issues with precompiled packages in
other Linux distributions, although they involve other shared objects
and other symbols.
If the problem is Clear Linux, what is the best Linux distribution for SLURM?
I am attaching my latest test configuration file.
I would appreciate any help you may give me. Thank you very much in advance.
Best regards,
Braulio J. Solano-Rojas
-------------- next part --------------
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
SlurmctldHost=slurm-controller
#
#MailProg=/bin/mail
MpiDefault=pmix_v3
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/run/slurm/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/run/slurm/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurm/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurm/slurmctld
SwitchType=switch/none
TaskPlugin=task/affinity
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=citic-cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
#SlurmdDebug=info
SlurmdLogFile=/var/log/slurm/slurmd.log
#
#
# COMPUTE NODES
NodeName=slurm-worker CPUs=2 Boards=1 SocketsPerBoard=2 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=1968
PartitionName=workers Nodes=slurm-worker Default=YES MaxTime=INFINITE State=UP
PartitionName=debug Nodes=slurm-worker MaxTime=INFINITE State=UP