[slurm-users] slurmdbd does not work

Brian Andrus toomuchit at gmail.com
Fri Dec 3 16:13:01 UTC 2021


You will also need to reinstall and restart slurmdbd with the rebuilt binary.
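
One way to confirm that the rebuilt plugin was actually installed is to look for it in the plugin directory. Below is a minimal Python sketch, assuming the PluginDir from your slurmdbd.conf (/usr/lib/slurm) and that the plugin file follows the usual accounting_storage_mysql.so naming. The helper name has_mysql_plugin is only for illustration.

```python
from pathlib import Path


def has_mysql_plugin(plugin_dir="/usr/lib/slurm"):
    """Return True if the MySQL accounting plugin file exists in plugin_dir.

    Assumptions: the plugin is installed as accounting_storage_mysql.so,
    and plugin_dir matches PluginDir in slurmdbd.conf.
    """
    return (Path(plugin_dir) / "accounting_storage_mysql.so").is_file()


if __name__ == "__main__":
    if has_mysql_plugin():
        print("accounting_storage/mysql plugin found")
    else:
        print("plugin missing: rebuild Slurm with MySQL support and reinstall")
```

If the file is missing after the rebuild, the build most likely did not detect the MySQL development files, and restarting slurmdbd will give the same "cannot find accounting_storage plugin" error.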

Look in the slurmdbd logs to see what is happening there. I suspect it 
hit errors while creating or updating the database and tables. If you 
have no data in it yet, you can simply DROP the database and restart 
slurmdbd.
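
To pick the relevant lines out of the log quickly, something like the following Python sketch could help. It assumes the log lives at /var/log/slurmdbd.log (the LogFile value in your slurmdbd.conf) and that lines follow the "[timestamp] level: message" layout seen in your slurmctld output. The function name find_problems is hypothetical.

```python
import re

# Slurm log lines look like: [2021-12-03T15:36:41.019] error: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>error|fatal):\s+(?P<msg>.*)$"
)


def find_problems(log_text):
    """Return (timestamp, level, message) for each error or fatal line."""
    problems = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            problems.append((match["ts"], match["level"], match["msg"]))
    return problems


if __name__ == "__main__":
    # Assumption: LogFile=/var/log/slurmdbd.log, as in the quoted slurmdbd.conf.
    try:
        with open("/var/log/slurmdbd.log", errors="replace") as handle:
            text = handle.read()
    except OSError as exc:
        print(f"cannot read log: {exc}")
    else:
        # Show only the 20 most recent problems.
        for ts, level, msg in find_problems(text)[-20:]:
            print(ts, level, msg)
```

If the log does show failed table creation and the database is still empty, dropping it amounts to running DROP DATABASE slurm_acct_db; in the mysql client (the name comes from StorageLoc in your config). slurmdbd should then recreate it on the next start, provided the slurm database user has the privileges to do so.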

Brian Andrus

On 12/3/2021 6:42 AM, Giuseppe G. A. Celano wrote:
> Thanks for the answer, Brian. I now added 
> --with-mysql_config=/etc/mysql/my.cnf, but the problem is still there 
> and now also slurmctld does not work, with the error:
>
> [2021-12-03T15:36:41.018] accounting_storage/slurmdbd: 
> clusteracct_storage_p_register_ctld: Registering slurmctld at port 
> 6817 with slurmdbd
> [2021-12-03T15:36:41.019] error: _conn_readable: persistent connection 
> for fd 9 experienced error[104]: Connection reset by peer
> [2021-12-03T15:36:41.019] error: _slurm_persist_recv_msg: only read 
> 150 of 2613 bytes
> [2021-12-03T15:36:41.019] error: Sending PersistInit msg: No error
> [2021-12-03T15:36:41.020] error: _conn_readable: persistent connection 
> for fd 9 experienced error[104]: Connection reset by peer
> [2021-12-03T15:36:41.020] error: _slurm_persist_recv_msg: only read 
> 150 of 2613 bytes
> [2021-12-03T15:36:41.020] error: Sending PersistInit msg: No error
> [2021-12-03T15:36:41.020] error: _conn_readable: persistent connection 
> for fd 9 experienced error[104]: Connection reset by peer
> [2021-12-03T15:36:41.020] error: _slurm_persist_recv_msg: only read 
> 150 of 2613 bytes
> [2021-12-03T15:36:41.020] error: Sending PersistInit msg: No error
> [2021-12-03T15:36:41.020] error: DBD_GET_TRES failure: No error
> [2021-12-03T15:36:41.021] error: _conn_readable: persistent connection 
> for fd 9 experienced error[104]: Connection reset by peer
> [2021-12-03T15:36:41.021] error: _slurm_persist_recv_msg: only read 0 
> of 2613 bytes
> [2021-12-03T15:36:41.021] error: Sending PersistInit msg: No error
> [2021-12-03T15:36:41.021] error: DBD_GET_QOS failure: No error
> [2021-12-03T15:36:41.021] error: _conn_readable: persistent connection 
> for fd 9 experienced error[104]: Connection reset by peer
> [2021-12-03T15:36:41.021] error: _slurm_persist_recv_msg: only read 
> 150 of 2613 bytes
> [2021-12-03T15:36:41.021] error: Sending PersistInit msg: No error
> [2021-12-03T15:36:41.021] error: DBD_GET_USERS failure: No error
> [2021-12-03T15:36:41.022] error: _conn_readable: persistent connection 
> for fd 9 experienced error[104]: Connection reset by peer
> [2021-12-03T15:36:41.022] error: _slurm_persist_recv_msg: only read 0 
> of 2613 bytes
> [2021-12-03T15:36:41.022] error: Sending PersistInit msg: No error
> [2021-12-03T15:36:41.022] error: DBD_GET_ASSOCS failure: No error
> [2021-12-03T15:36:41.022] error: _conn_readable: persistent connection 
> for fd 9 experienced error[104]: Connection reset by peer
> [2021-12-03T15:36:41.022] error: _slurm_persist_recv_msg: only read 0 
> of 2613 bytes
> [2021-12-03T15:36:41.022] error: Sending PersistInit msg: No error
> [2021-12-03T15:36:41.022] error: DBD_GET_RES failure: No error
> [2021-12-03T15:36:41.022] fatal: You are running with a database but 
> for some reason we have no TRES from it.  This should only happen if 
> the database is down and you don't have any state files.
>
>
>
> On Thu, Dec 2, 2021 at 10:36 PM Brian Andrus <toomuchit at gmail.com> wrote:
>
>
>     Your Slurm needs to be built with MySQL support. If you have mysql-devel
>     installed it should pick it up; otherwise you can specify the
>     location with --with-mysql_config when you configure/build Slurm.
>
>     Brian Andrus
>
>     On 12/2/2021 12:40 PM, Giuseppe G. A. Celano wrote:
>>     Hi everyone,
>>
>>     I am having trouble getting /slurmdbd/ to work. This is the error
>>     I get:
>>
>>     /error: Couldn't find the specified plugin name for
>>     accounting_storage/mysql looking at all files
>>     error: cannot find accounting_storage plugin for
>>     accounting_storage/mysql
>>     error: cannot create accounting_storage context for
>>     accounting_storage/mysql
>>     fatal: Unable to initialize accounting_storage/mysql accounting
>>     storage plugin/
>>
>>     I have installed /mysql/ (/apt install mysql/) on Ubuntu 20.04.03
>>     and followed the instructions on the slurm website
>>     <https://slurm.schedmd.com/accounting.html>; /mysql/ is running
>>     (/port 3306/) and these are the relevant parts in my /.conf/ files:
>>
>>     /slurm.conf/
>>
>>     # LOGGING AND ACCOUNTING
>>     AccountingStorageHost=localhost
>>     AccountingStoragePort=3306
>>     AccountingStorageType=accounting_storage/slurmdbd
>>     AccountingStorageUser=slurm
>>     JobCompType=jobcomp/none
>>     JobAcctGatherFrequency=30
>>     JobAcctGatherType=jobacct_gather/linux
>>     SlurmctldDebug=info
>>     SlurmctldLogFile=/var/log/slurmctld.log
>>     SlurmdDebug=info
>>     SlurmdLogFile=/var/log/slurmd.log
>>
>>     /slurmdbd.conf/
>>
>>     AuthType=auth/munge
>>     DbdAddr=localhost
>>     DbdHost=localhost
>>     DbdPort=3306
>>     LogFile=/var/log/slurmdbd.log
>>     PidFile=/var/run/slurmdbd.pid
>>     PluginDir=/usr/lib/slurm
>>     SlurmUser=slurm
>>     StoragePass=password
>>     StorageType=accounting_storage/mysql
>>     StorageUser=slurm
>>     StorageLoc=slurm_acct_db
>>
>>     I changed the port to 3306 because otherwise /slurmdbd /could not
>>     communicate with /mysql/. If I run /sacct/, for example, I get:
>>
>>     /sacct: error: _slurm_persist_recv_msg: read of fd 3 failed: No error
>>     sacct: error: _slurm_persist_recv_msg: only read 126 of 2616 bytes
>>     sacct: error: slurm_persist_conn_open: No response to persist_init
>>     sacct: error: Sending PersistInit msg: No error
>>     JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
>>     ------------ ---------- ---------- ---------- ---------- ---------- --------
>>     sacct: error: _slurm_persist_recv_msg: read of fd 3 failed: No error
>>     sacct: error: _slurm_persist_recv_msg: only read 126 of 2616 bytes
>>     sacct: error: Sending PersistInit msg: No error
>>     sacct: error: DBD_GET_JOBS_COND failure: Unspecified error/
>>     /
>>     /
>>     Does anyone have a suggestion to solve this problem? Thank you
>>     very much.
>>
>>     Best,
>>     Giuseppe
>