[slurm-users] [EXT] Re: slurmdbd does not work

Paul Edmon pedmon at cfa.harvard.edu
Sat Dec 4 01:03:25 UTC 2021


I would check that you have MariaDB-shared installed too on the host you 
build on prior to your build.  They changed the way the packaging is done 
in MariaDB and Slurm needs to detect the files in MariaDB-shared to 
actually trigger the configure to build the mysql libs.
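
As a rough sketch of that check, something like the following could be run
on the build host before and after rebuilding. The function names are made
up for illustration. It assumes configure locates the client libraries
through a mysql_config or mariadb_config script on PATH, and that plugins
are installed under /usr/lib/slurm as in the slurmdbd.conf quoted below.
Adjust both if your setup differs.

```shell
#!/bin/sh
# Hypothetical helpers; these names are illustrative and not part of Slurm.

# Print the path of the first client-library config script found on PATH.
# Assumption: configure relies on one of these scripts to find the libs.
find_db_config() {
    for name in mysql_config mariadb_config; do
        if command -v "$name" >/dev/null 2>&1; then
            command -v "$name"
            return 0
        fi
    done
    return 1
}

# Succeed if the MySQL accounting plugin exists in the given plugin directory.
plugin_present() {
    [ -f "$1/accounting_storage_mysql.so" ]
}

# Report both checks. The directory defaults to the PluginDir from this thread.
report() {
    plugin_dir="${1:-/usr/lib/slurm}"
    if cfg=$(find_db_config); then
        echo "config script: $cfg"
    else
        echo "config script: not found (check the MariaDB/MySQL devel and shared packages)"
    fi
    if plugin_present "$plugin_dir"; then
        echo "plugin: present in $plugin_dir"
    else
        echo "plugin: missing from $plugin_dir (rerun configure, make, make install)"
    fi
}
```

Calling `report` with no argument checks /usr/lib/slurm; pass another
directory if your PluginDir is different. If the config script is not found,
the plugin will most likely be skipped again on the next build.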

-Paul Edmon-

On 12/3/2021 7:40 PM, Giuseppe G. A. Celano wrote:
> 10.4.22
>
>
> On Sat, Dec 4, 2021 at 1:35 AM Brian Andrus <toomuchit at gmail.com> wrote:
>
>     Which version of Mariadb are you using?
>
>     Brian Andrus
>
>     On 12/3/2021 4:20 PM, Giuseppe G. A. Celano wrote:
>>     After installation of libmariadb-dev, I have reinstalled the
>>     entire slurm with ./configure + options, make, and make install.
>>     Still, accounting_storage_mysql.so is missing.
>>
>>
>>
>>     On Sat, Dec 4, 2021 at 12:24 AM Sean Crosby
>>     <scrosby at unimelb.edu.au> wrote:
>>
>>         Did you run
>>
>>         ./configure (with any other options you normally use)
>>         make
>>         make install
>>
>>         on your DBD server after you installed the mariadb-devel package?
>>
>>         ------------------------------------------------------------------------
>>         *From:* slurm-users <slurm-users-bounces at lists.schedmd.com>
>>         on behalf of Giuseppe G. A. Celano <giuseppegacelano at gmail.com>
>>         *Sent:* Saturday, 4 December 2021 10:07
>>         *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
>>         *Subject:* [EXT] Re: [slurm-users] slurmdbd does not work
>>         ------------------------------------------------------------------------
>>         The problem is the lack of
>>         /usr/lib/slurm/accounting_storage_mysql.so
>>
>>         I have installed many mariadb-related packages, but that file
>>         is not created by slurm after installation: is there a point
>>         in the documentation where the installation procedure for the
>>         database is made explicit?
>>
>>
>>
>>         On Fri, Dec 3, 2021 at 5:15 PM Brian Andrus
>>         <toomuchit at gmail.com> wrote:
>>
>>             You will need to also reinstall/restart slurmdbd with the
>>             updated binary.
>>
>>             Look in the slurmdbd logs to see what is happening there.
>>             I suspect it had errors updating/creating the database
>>             and tables. If you have no data in it yet, you can just
>>             DROP the database and restart slurmdbd.
>>
>>             Brian Andrus
>>
>>             On 12/3/2021 6:42 AM, Giuseppe G. A. Celano wrote:
>>>             Thanks for the answer, Brian. I now added
>>>             --with-mysql_config=/etc/mysql/my.cnf, but the problem
>>>             is still there and now also slurmctld does not work,
>>>             with the error:
>>>
>>>             [2021-12-03T15:36:41.018] accounting_storage/slurmdbd:
>>>             clusteracct_storage_p_register_ctld: Registering
>>>             slurmctld at port 6817 with slurmdbd
>>>             [2021-12-03T15:36:41.019] error: _conn_readable:
>>>             persistent connection for fd 9 experienced error[104]:
>>>             Connection reset by peer
>>>             [2021-12-03T15:36:41.019] error:
>>>             _slurm_persist_recv_msg: only read 150 of 2613 bytes
>>>             [2021-12-03T15:36:41.019] error: Sending PersistInit
>>>             msg: No error
>>>             [2021-12-03T15:36:41.020] error: _conn_readable:
>>>             persistent connection for fd 9 experienced error[104]:
>>>             Connection reset by peer
>>>             [2021-12-03T15:36:41.020] error:
>>>             _slurm_persist_recv_msg: only read 150 of 2613 bytes
>>>             [2021-12-03T15:36:41.020] error: Sending PersistInit
>>>             msg: No error
>>>             [2021-12-03T15:36:41.020] error: _conn_readable:
>>>             persistent connection for fd 9 experienced error[104]:
>>>             Connection reset by peer
>>>             [2021-12-03T15:36:41.020] error:
>>>             _slurm_persist_recv_msg: only read 150 of 2613 bytes
>>>             [2021-12-03T15:36:41.020] error: Sending PersistInit
>>>             msg: No error
>>>             [2021-12-03T15:36:41.020] error: DBD_GET_TRES failure:
>>>             No error
>>>             [2021-12-03T15:36:41.021] error: _conn_readable:
>>>             persistent connection for fd 9 experienced error[104]:
>>>             Connection reset by peer
>>>             [2021-12-03T15:36:41.021] error:
>>>             _slurm_persist_recv_msg: only read 0 of 2613 bytes
>>>             [2021-12-03T15:36:41.021] error: Sending PersistInit
>>>             msg: No error
>>>             [2021-12-03T15:36:41.021] error: DBD_GET_QOS failure: No
>>>             error
>>>             [2021-12-03T15:36:41.021] error: _conn_readable:
>>>             persistent connection for fd 9 experienced error[104]:
>>>             Connection reset by peer
>>>             [2021-12-03T15:36:41.021] error:
>>>             _slurm_persist_recv_msg: only read 150 of 2613 bytes
>>>             [2021-12-03T15:36:41.021] error: Sending PersistInit
>>>             msg: No error
>>>             [2021-12-03T15:36:41.021] error: DBD_GET_USERS failure:
>>>             No error
>>>             [2021-12-03T15:36:41.022] error: _conn_readable:
>>>             persistent connection for fd 9 experienced error[104]:
>>>             Connection reset by peer
>>>             [2021-12-03T15:36:41.022] error:
>>>             _slurm_persist_recv_msg: only read 0 of 2613 bytes
>>>             [2021-12-03T15:36:41.022] error: Sending PersistInit
>>>             msg: No error
>>>             [2021-12-03T15:36:41.022] error: DBD_GET_ASSOCS failure:
>>>             No error
>>>             [2021-12-03T15:36:41.022] error: _conn_readable:
>>>             persistent connection for fd 9 experienced error[104]:
>>>             Connection reset by peer
>>>             [2021-12-03T15:36:41.022] error:
>>>             _slurm_persist_recv_msg: only read 0 of 2613 bytes
>>>             [2021-12-03T15:36:41.022] error: Sending PersistInit
>>>             msg: No error
>>>             [2021-12-03T15:36:41.022] error: DBD_GET_RES failure: No
>>>             error
>>>             [2021-12-03T15:36:41.022] fatal: You are running with a
>>>             database but for some reason we have no TRES from it. 
>>>             This should only happen if the database is down and you
>>>             don't have any state files.
>>>
>>>
>>>
>>>             On Thu, Dec 2, 2021 at 10:36 PM Brian Andrus
>>>             <toomuchit at gmail.com> wrote:
>>>
>>>
>>>                 Your Slurm needs to be built with MySQL support. If you have
>>>                 mysql-devel installed it should pick it up,
>>>                 otherwise you can specify the location with
>>>                 --with-mysql when you configure/build slurm
>>>
>>>                 Brian Andrus
>>>
>>>                 On 12/2/2021 12:40 PM, Giuseppe G. A. Celano wrote:
>>>>                 Hi everyone,
>>>>
>>>>                 I am having trouble getting /slurmdbd/ to work.
>>>>                 This is the error I get:
>>>>
>>>>                 /error: Couldn't find the specified plugin name for
>>>>                 accounting_storage/mysql looking at all files
>>>>                 error: cannot find accounting_storage plugin for
>>>>                 accounting_storage/mysql
>>>>                 error: cannot create accounting_storage context for
>>>>                 accounting_storage/mysql
>>>>                 fatal: Unable to initialize
>>>>                 accounting_storage/mysql accounting storage plugin/
>>>>
>>>>                 I have installed /mysql/ (/apt install mysql/) on
>>>>                 Ubuntu 20.04.03 and followed the instructions on
>>>>                 the slurm website
>>>>                 <https://slurm.schedmd.com/accounting.html>;
>>>>                 /mysql/ is running (/port 3306/) and these are the
>>>>                 relevant parts in my /.conf/ files:
>>>>
>>>>                 /slurm.conf/
>>>>
>>>>                 # LOGGING AND ACCOUNTING
>>>>                 AccountingStorageHost=localhost
>>>>                 AccountingStoragePort=3306
>>>>                 AccountingStorageType=accounting_storage/slurmdbd
>>>>                 AccountingStorageUser=slurm
>>>>                 JobCompType=jobcomp/none
>>>>                 JobAcctGatherFrequency=30
>>>>                 JobAcctGatherType=jobacct_gather/linux
>>>>                 SlurmctldDebug=info
>>>>                 SlurmctldLogFile=/var/log/slurmctld.log
>>>>                 SlurmdDebug=info
>>>>                 SlurmdLogFile=/var/log/slurmd.log
>>>>
>>>>                 /slurmdbd.conf/
>>>>
>>>>                 AuthType=auth/munge
>>>>                 DbdAddr=localhost
>>>>                 DbdHost=localhost
>>>>                 DbdPort=3306
>>>>                 LogFile=/var/log/slurmdbd.log
>>>>                 PidFile=/var/run/slurmdbd.pid
>>>>                 PluginDir=/usr/lib/slurm
>>>>                 SlurmUser=slurm
>>>>                 StoragePass=password
>>>>                 StorageType=accounting_storage/mysql
>>>>                 StorageUser=slurm
>>>>                 StorageLoc=slurm_acct_db
>>>>
>>>>                 I changed the port to 3306 because otherwise
>>>>                 /slurmdbd /could not communicate with /mysql/. If I
>>>>                 run /sacct/, for example, I get:
>>>>
>>>>                 /sacct: error: _slurm_persist_recv_msg: read of fd
>>>>                 3 failed: No error
>>>>                 sacct: error: _slurm_persist_recv_msg: only read
>>>>                 126 of 2616 bytes
>>>>                 sacct: error: slurm_persist_conn_open: No response
>>>>                 to persist_init
>>>>                 sacct: error: Sending PersistInit msg: No error
>>>>                 JobID JobName  Partition  Account  AllocCPUS  State ExitCode
>>>>                 ------------ ---------- ---------- ---------- ---------- ---------- --------
>>>>                 sacct: error: _slurm_persist_recv_msg: read of fd 3
>>>>                 failed: No error
>>>>                 sacct: error: _slurm_persist_recv_msg: only read
>>>>                 126 of 2616 bytes
>>>>                 sacct: error: Sending PersistInit msg: No error
>>>>                 sacct: error: DBD_GET_JOBS_COND failure:
>>>>                 Unspecified error/
>>>>                 Does anyone have a suggestion to solve this
>>>>                 problem? Thank you very much.
>>>>
>>>>                 Best,
>>>>                 Giuseppe
>>>