[slurm-users] [EXT] Re: slurmdbd does not work
Brian Andrus
toomuchit at gmail.com
Sat Dec 4 00:33:11 UTC 2021
Which version of MariaDB are you using?
Brian Andrus
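
A rough sketch of two checks that may help narrow this down, written in Python and not part of Slurm. It assumes that Slurm's configure step relies on a mysql_config or mariadb_config helper being on the PATH, and that a source build installs plugins under /usr/local/lib/slurm unless a different prefix was given (the thread only confirms /usr/lib/slurm as the PluginDir). The function names are made up for illustration.

```python
import shutil
from pathlib import Path

PLUGIN_NAME = "accounting_storage_mysql.so"


def find_config_helpers(names=("mysql_config", "mariadb_config")):
    """Return {helper_name: path} for each helper found on PATH."""
    found = {}
    for name in names:
        path = shutil.which(name)
        if path:
            found[name] = path
    return found


def find_plugin(plugin_dirs, plugin_name=PLUGIN_NAME):
    """Return the full paths where the plugin file actually exists."""
    hits = []
    for directory in plugin_dirs:
        candidate = Path(directory) / plugin_name
        if candidate.is_file():
            hits.append(str(candidate))
    return hits


if __name__ == "__main__":
    helpers = find_config_helpers()
    print("config helpers on PATH:", helpers or "none")
    # /usr/lib/slurm is the PluginDir from slurmdbd.conf in this thread;
    # /usr/local/lib/slurm is an assumption (default autoconf prefix).
    plugins = find_plugin(["/usr/lib/slurm", "/usr/local/lib/slurm"])
    print("plugin found at:", plugins or "none of the searched directories")
```

If no helper is found, the MySQL plugin was most likely skipped at build time. If the plugin turns up under a different directory than PluginDir, the build worked but slurmdbd is looking in the wrong place.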
On 12/3/2021 4:20 PM, Giuseppe G. A. Celano wrote:
> After installing libmariadb-dev, I rebuilt and reinstalled Slurm
> from scratch with ./configure + options, make, and make install. Still,
> accounting_storage_mysql.so is missing.
>
>
>
> On Sat, Dec 4, 2021 at 12:24 AM Sean Crosby <scrosby at unimelb.edu.au>
> wrote:
>
> Did you run
>
> ./configure (with any other options you normally use)
> make
> make install
>
> on your DBD server after you installed the mariadb-devel package?
>
> ------------------------------------------------------------------------
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on
> behalf of Giuseppe G. A. Celano <giuseppegacelano at gmail.com>
> *Sent:* Saturday, 4 December 2021 10:07
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* [EXT] Re: [slurm-users] slurmdbd does not work
> ------------------------------------------------------------------------
> The problem is the lack of /usr/lib/slurm/accounting_storage_mysql.so
>
> I have installed many mariadb-related packages, but that file is
> not created by slurm after installation: is there a point in the
> documentation where the installation procedure for the database is
> made explicit?
>
>
>
> On Fri, Dec 3, 2021 at 5:15 PM Brian Andrus <toomuchit at gmail.com>
> wrote:
>
> You will need to also reinstall/restart slurmdbd with the
> updated binary.
>
> Look in the slurmdbd logs to see what is happening there. I
> suspect it had errors updating/creating the database and
> tables. If you have no data in it yet, you can just DROP the
> database and restart slurmdbd.
>
> Brian Andrus
>
> On 12/3/2021 6:42 AM, Giuseppe G. A. Celano wrote:
>> Thanks for the answer, Brian. I now added
>> --with-mysql_config=/etc/mysql/my.cnf, but the problem is
>> still there and now also slurmctld does not work, with the error:
>>
>> [2021-12-03T15:36:41.018] accounting_storage/slurmdbd:
>> clusteracct_storage_p_register_ctld: Registering slurmctld at
>> port 6817 with slurmdbd
>> [2021-12-03T15:36:41.019] error: _conn_readable: persistent
>> connection for fd 9 experienced error[104]: Connection reset
>> by peer
>> [2021-12-03T15:36:41.019] error: _slurm_persist_recv_msg:
>> only read 150 of 2613 bytes
>> [2021-12-03T15:36:41.019] error: Sending PersistInit msg: No
>> error
>> [2021-12-03T15:36:41.020] error: _conn_readable: persistent
>> connection for fd 9 experienced error[104]: Connection reset
>> by peer
>> [2021-12-03T15:36:41.020] error: _slurm_persist_recv_msg:
>> only read 150 of 2613 bytes
>> [2021-12-03T15:36:41.020] error: Sending PersistInit msg: No
>> error
>> [2021-12-03T15:36:41.020] error: _conn_readable: persistent
>> connection for fd 9 experienced error[104]: Connection reset
>> by peer
>> [2021-12-03T15:36:41.020] error: _slurm_persist_recv_msg:
>> only read 150 of 2613 bytes
>> [2021-12-03T15:36:41.020] error: Sending PersistInit msg: No
>> error
>> [2021-12-03T15:36:41.020] error: DBD_GET_TRES failure: No error
>> [2021-12-03T15:36:41.021] error: _conn_readable: persistent
>> connection for fd 9 experienced error[104]: Connection reset
>> by peer
>> [2021-12-03T15:36:41.021] error: _slurm_persist_recv_msg:
>> only read 0 of 2613 bytes
>> [2021-12-03T15:36:41.021] error: Sending PersistInit msg: No
>> error
>> [2021-12-03T15:36:41.021] error: DBD_GET_QOS failure: No error
>> [2021-12-03T15:36:41.021] error: _conn_readable: persistent
>> connection for fd 9 experienced error[104]: Connection reset
>> by peer
>> [2021-12-03T15:36:41.021] error: _slurm_persist_recv_msg:
>> only read 150 of 2613 bytes
>> [2021-12-03T15:36:41.021] error: Sending PersistInit msg: No
>> error
>> [2021-12-03T15:36:41.021] error: DBD_GET_USERS failure: No error
>> [2021-12-03T15:36:41.022] error: _conn_readable: persistent
>> connection for fd 9 experienced error[104]: Connection reset
>> by peer
>> [2021-12-03T15:36:41.022] error: _slurm_persist_recv_msg:
>> only read 0 of 2613 bytes
>> [2021-12-03T15:36:41.022] error: Sending PersistInit msg: No
>> error
>> [2021-12-03T15:36:41.022] error: DBD_GET_ASSOCS failure: No error
>> [2021-12-03T15:36:41.022] error: _conn_readable: persistent
>> connection for fd 9 experienced error[104]: Connection reset
>> by peer
>> [2021-12-03T15:36:41.022] error: _slurm_persist_recv_msg:
>> only read 0 of 2613 bytes
>> [2021-12-03T15:36:41.022] error: Sending PersistInit msg: No
>> error
>> [2021-12-03T15:36:41.022] error: DBD_GET_RES failure: No error
>> [2021-12-03T15:36:41.022] fatal: You are running with a
>> database but for some reason we have no TRES from it. This
>> should only happen if the database is down and you don't have
>> any state files.
>>
>>
>>
>> On Thu, Dec 2, 2021 at 10:36 PM Brian Andrus
>> <toomuchit at gmail.com> wrote:
>>
>>
>> Your Slurm needs to be built with MySQL support. If you have
>> mysql-devel installed, configure should pick it up; otherwise you
>> can point it at the directory containing mysql_config with
>> --with-mysql_config=PATH when you configure/build Slurm.
>>
>> Brian Andrus
>>
>> On 12/2/2021 12:40 PM, Giuseppe G. A. Celano wrote:
>>> Hi everyone,
>>>
>>> I am having trouble getting /slurmdbd/ to work. This is
>>> the error I get:
>>>
>>> error: Couldn't find the specified plugin name for
>>> accounting_storage/mysql looking at all files
>>> error: cannot find accounting_storage plugin for
>>> accounting_storage/mysql
>>> error: cannot create accounting_storage context for
>>> accounting_storage/mysql
>>> fatal: Unable to initialize accounting_storage/mysql
>>> accounting storage plugin
>>>
>>> I have installed /mysql/ (/apt install mysql/) on Ubuntu
>>> 20.04.03 and followed the instructions on the slurm
>>> website <https://slurm.schedmd.com/accounting.html>;
>>> /mysql/ is running (/port 3306/) and these are the
>>> relevant parts in my /.conf/ files:
>>>
>>> /slurm.conf/
>>>
>>> # LOGGING AND ACCOUNTING
>>> AccountingStorageHost=localhost
>>> AccountingStoragePort=3306
>>> AccountingStorageType=accounting_storage/slurmdbd
>>> AccountingStorageUser=slurm
>>> JobCompType=jobcomp/none
>>> JobAcctGatherFrequency=30
>>> JobAcctGatherType=jobacct_gather/linux
>>> SlurmctldDebug=info
>>> SlurmctldLogFile=/var/log/slurmctld.log
>>> SlurmdDebug=info
>>> SlurmdLogFile=/var/log/slurmd.log
>>>
>>> /slurmdbd.conf/
>>>
>>> AuthType=auth/munge
>>> DbdAddr=localhost
>>> DbdHost=localhost
>>> DbdPort=3306
>>> LogFile=/var/log/slurmdbd.log
>>> PidFile=/var/run/slurmdbd.pid
>>> PluginDir=/usr/lib/slurm
>>> SlurmUser=slurm
>>> StoragePass=password
>>> StorageType=accounting_storage/mysql
>>> StorageUser=slurm
>>> StorageLoc=slurm_acct_db
>>>
>>> I changed the port to 3306 because otherwise /slurmdbd/
>>> could not communicate with /mysql/. If I run /sacct/,
>>> for example, I get:
>>>
>>> sacct: error: _slurm_persist_recv_msg: read of fd 3
>>> failed: No error
>>> sacct: error: _slurm_persist_recv_msg: only read 126 of
>>> 2616 bytes
>>> sacct: error: slurm_persist_conn_open: No response to
>>> persist_init
>>> sacct: error: Sending PersistInit msg: No error
>>> JobID JobName Partition Account AllocCPUS State ExitCode
>>> ------------ ---------- ---------- ---------- ---------- ---------- --------
>>> sacct: error: _slurm_persist_recv_msg: read of fd 3
>>> failed: No error
>>> sacct: error: _slurm_persist_recv_msg: only read 126 of
>>> 2616 bytes
>>> sacct: error: Sending PersistInit msg: No error
>>> sacct: error: DBD_GET_JOBS_COND failure: Unspecified error
>>>
>>> Does anyone have a suggestion to solve this problem?
>>> Thank you very much.
>>>
>>> Best,
>>> Giuseppe
>>
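
A side note on the slurmdbd.conf quoted above: it sets DbdPort=3306. Per the slurmdbd.conf man page, DbdPort is the port slurmdbd itself listens on (default 6819), while StoragePort is the port of the MySQL/MariaDB server (default 3306). AccountingStoragePort in slurm.conf should likewise point at slurmdbd, not at the database. The Python sketch below is only an illustration of that check; the helper names are hypothetical and the parser handles just simple Key=Value lines.

```python
def parse_conf(text):
    """Parse simple 'Key=Value' lines, ignoring blanks and '#' comments.

    Keys are lower-cased because Slurm treats them case-insensitively.
    """
    conf = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            conf[key.strip().lower()] = value.strip()
    return conf


def port_warnings(dbd_conf, default_db_port="3306"):
    """Return warnings if slurmdbd is set to listen on the database port."""
    warnings = []
    dbd_port = dbd_conf.get("dbdport")
    storage_port = dbd_conf.get("storageport", default_db_port)
    if dbd_port is not None and dbd_port == storage_port:
        warnings.append(
            f"DbdPort={dbd_port} is the same as the database port; "
            "slurmdbd and the database server cannot both listen there"
        )
    return warnings


if __name__ == "__main__":
    # Excerpt of the slurmdbd.conf posted in this thread.
    sample = """
    DbdHost=localhost
    DbdPort=3306
    StorageType=accounting_storage/mysql
    """
    for warning in port_warnings(parse_conf(sample)):
        print("warning:", warning)
```

With the settings from the thread this prints one warning. Leaving DbdPort unset (or setting it to 6819) and keeping 3306 only for StoragePort makes the warning go away.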