Hi all,
I am having an issue with the new version of Slurm, 23.11.0-1.
I had already installed and configured Slurm 23.02.3-1 on my cluster, and all the services were active and running properly.
Following the instructions on the official Slurm webpage, I have so far upgraded only the slurmdbd service. In principle, the cluster should work properly when slurmdbd is at a newer version than slurmctld and slurmd.
Unfortunately, the slurmdbd service fails to start with the following status:
slurmdbd.service - Slurm DBD accounting daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; enabled; vendor preset: disabled)
   Active: failed (Result: core-dump) since Wed 2024-02-28 10:05:53 CET; 9min ago
  Process: 534938 ExecStart=/usr/sbin/slurmdbd -D -s $SLURMDBD_OPTIONS (code=dumped, signal=SEGV)
 Main PID: 534938 (code=dumped, signal=SEGV)

Feb 28 10:05:53 slurm-db systemd[1]: Started Slurm DBD accounting daemon.
Feb 28 10:05:53 slurm-db slurmdbd[534938]: /usr/sbin/slurmdbd: Symbol `slurm_conf' has different size in shared object, consider re-linking
Feb 28 10:05:53 slurm-db systemd[1]: slurmdbd.service: Main process exited, code=dumped, status=11/SEGV
Feb 28 10:05:53 slurm-db systemd[1]: slurmdbd.service: Failed with result 'core-dump'.
Can anyone help me?
Thanks in advance, Miriam
I see this question is unanswered so far, so I'll give you my 2 cents:
A quick check shows that the symbol mentioned is in libslurmfull.so:
[root@slurmserver2 ~]# nm -gD /usr/lib64/slurm/libslurmfull.so | grep "slurm_conf$"
00000000000d2c06 T free_slurm_conf
00000000000d3345 T init_slurm_conf
000000000041d000 B slurm_conf
[root@slurmserver2 ~]#
Could it be that this dynamic library is still the old one?
Depending on whether you installed Slurm from RPMs, a manual in-place build, or something else, the reason an old library is still in place may vary.
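A minimal sketch of that check, assuming an RPM-based install and that the daemon package is named slurm-slurmdbd (adjust to your packaging). It asks rpm which package owns the library and compares that package's version with the version of the slurmdbd package. The helper names versions_match and owner_version are made up for illustration.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, package name is an assumption.

# Return 0 if both version strings are non-empty and identical.
versions_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Print the version of the installed package that owns a file.
# Returns non-zero (and prints nothing) if rpm cannot resolve the owner.
owner_version() {
    out=$(rpm -qf --qf '%{VERSION}\n' "$1" 2>/dev/null) || return 1
    printf '%s\n' "$out" | head -n 1
}

# Only run the real query when rpm is available on this host.
if command -v rpm >/dev/null 2>&1; then
    lib_ver=$(owner_version /usr/lib64/slurm/libslurmfull.so) || lib_ver=""
    dbd_ver=$(rpm -q --qf '%{VERSION}\n' slurm-slurmdbd 2>/dev/null) || dbd_ver=""
    if versions_match "$lib_ver" "$dbd_ver"; then
        echo "OK: libslurmfull.so and slurmdbd are both at $dbd_ver"
    else
        echo "MISMATCH: libslurmfull.so from '$lib_ver', slurmdbd from '$dbd_ver'"
    fi
fi
```

A MISMATCH line here would be consistent with the "different size in shared object" message, since the daemon and the library it loads would come from different builds.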
cheers
josef
On 28. 02. 24 11:16, Miriam Olmi via slurm-users wrote:
`slurm_conf' has different size in shared object, consider re-linking
Hi Josef,
thanks a lot for your reply!
I just checked and you are right!!!
My library comes from the old version of slurm:
$ rpm -q --whatprovides /usr/lib64/slurm/libslurmfull.so
slurm-23.02.3-1.el8.x86_64
I installed the new version of slurm 23.11.0-1 by rpm. How can I fix this?
Many thanks in advance again, Miriam
-- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-leave@lists.schedmd.com
I think installing/upgrading the "slurm" rpm will replace this shared lib.
As always, test it first on a not-so-critical system and use VM snapshots so you can travel back in time: once you upgrade the DB schema (if that is part of the upgrade), AFAIK you cannot go back.
josef
On 28. 02. 24 15:51, Miriam Olmi via slurm-users wrote:
I installed the new version of slurm 23.11.0-1 by rpm. How can I fix this?
Dear Josef,
thanks a lot again for your help. Unfortunately I cannot solve this problem.
According to the Slurm documentation (https://slurm.schedmd.com/quickstart_admin.html#upgrade), slurmdbd has to be upgraded first, and the cluster should still work with slurmdbd-23.11.0-1 alongside slurmctld-23.02.3-1 and slurmd-23.02.3-1.
The slurmdbd-23.11.0-1 package should provide the following files:
$ rpm -ql slurm-slurmdbd-23.11.0-1-no-frontend.el8.x86_64.rpm
/usr/lib/.build-id
/usr/lib/.build-id/01
/usr/lib/.build-id/01/da333fd28f1765164e46d00569ca55e55eb066
/usr/lib/.build-id/e7/4ab5829ee8f5b959cd71d47077cb09fb40fb54
/usr/lib/systemd/system/slurmdbd.service
/usr/lib64/slurm/accounting_storage_mysql.so
/usr/sbin/slurmdbd
I checked on my cluster, and all these files are present and come from slurmdbd-23.11.0-1, as you can see, for example, for these two files:
$ rpm -q --whatprovides /usr/lib/systemd/system/slurmdbd.service
slurm-slurmdbd-23.11.0-1.el8.x86_64
$ rpm -q --whatprovides /usr/lib64/slurm/accounting_storage_mysql.so
slurm-slurmdbd-23.11.0-1.el8.x86_64
All the other libraries that reference the symbol 'slurm_conf' come from the other packages: slurm-23.02.3-1.el8.x86_64.rpm, slurm-slurmctld-23.02.3-1.el8.x86_64.rpm, slurm-slurmd-23.02.3-1.el8.x86_64.rpm.
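To see at a glance which Slurm packages on a host are at which version, something like the following may help. It is only a sketch: the function name distinct_versions is hypothetical, and it assumes every relevant package has a name starting with "slurm".

```shell
#!/bin/sh
# Sketch only: distinct_versions is a hypothetical helper name.

# Read "NAME VERSION" lines on stdin and print each distinct version once.
distinct_versions() {
    awk 'NF >= 2 { print $2 }' | sort -u
}

# Only query the RPM database when rpm is available on this host.
if command -v rpm >/dev/null 2>&1; then
    # rpm -qa accepts a glob; --qf prints one "NAME VERSION" line per package.
    pkgs=$(rpm -qa --qf '%{NAME} %{VERSION}\n' 'slurm*' 2>/dev/null) || pkgs=""
    printf '%s\n' "$pkgs"
    count=$(printf '%s\n' "$pkgs" | distinct_versions | wc -l | tr -d ' ')
    if [ "$count" -gt 1 ]; then
        echo "Mixed Slurm versions are installed on this host"
    fi
fi
```

On a host where the base slurm package is at 23.02.3 and slurm-slurmdbd is at 23.11.0, the function would report two distinct versions, which matches the situation described here.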
How can I solve this problem now?
Many thanks in advance, Miriam