3rd time trying to get this to come through to the list - hopefully this time works.
I've been running SLURM for several years now, but in setting it up on a new cluster, I'm hitting a recurring issue. I'm using MariaDB and configured it just as I had in my setup from several years ago and as described in the docs. There's a "slurm" user (UID 59999) on the OS (Rocky 9) that exists on all the nodes, and I've added slurm@localhost as instructed (grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by 'PASSWORD'). But I keep getting errors like this:
```
Dec 22 14:22:07 kirby slurmdbd[14518]: slurmdbd: error: DBD_SEND_MULT_MSG message from invalid uid 59999
Dec 22 14:22:07 kirby slurmdbd[14518]: slurmdbd: error: Processing last message from connection 7(192.168.1.2) uid(59999)
Dec 22 14:22:07 kirby slurmdbd[14518]: slurmdbd: error: CONN:7 DBD_REGISTER_CTLD message from invalid uid 59999
Dec 22 14:22:07 kirby slurmdbd[14518]: slurmdbd: error: CONN:7 Security violation, DBD_REGISTER_CTLD
Dec 22 14:22:07 kirby slurmdbd[14518]: slurmdbd: error: Processing last message from connection 7(192.168.1.2) uid(59999)
```
I'm a total SQL noob, but can at least verify that the user is in there:

MariaDB [(none)]> SELECT User, Host, Password FROM mysql.user;
+-------------+-----------+-------------------------------------------+
| User        | Host      | Password                                  |
+-------------+-----------+-------------------------------------------+
| mariadb.sys | localhost |                                           |
| root        | localhost | invalid                                   |
| mysql       | localhost | invalid                                   |
| slurm       | localhost | *D6665ECF4F3CB12BCA836117F7727B6D0B78D644 |
+-------------+-----------+-------------------------------------------+
4 rows in set (0.002 sec)
Any thoughts as to where I might look to fix this?
Craig
This ticket with SchedMD implies it's a munged issue:
https://bugs.schedmd.com/show_bug.cgi?id=1293
Is the munge daemon running on all systems? If it is, are all servers running a network time daemon such as chronyd or ntpd, and is the time in sync on all hosts?
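Something along these lines could be run on each host to check both points. It is only a rough sketch: the helper names (`have`, `check_munge`, `check_time`) and the optional `REMOTE_HOST` variable are made up for illustration, and it assumes the standard `munge`/`unmunge` and `chronyc`/`ntpq` command-line tools are installed where applicable.

```shell
#!/bin/sh
# Rough sanity checks for munge and time sync (illustrative helper names).

# Return success if the named command is on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

check_munge() {
    if ! have munge || ! have unmunge; then
        echo "munge: not installed or not on PATH"
        return 1
    fi
    # Encode a credential with no payload and decode it locally.
    if munge -n 2>/dev/null | unmunge >/dev/null 2>&1; then
        echo "munge: local round trip ok"
    else
        echo "munge: local round trip FAILED (is munged running?)"
        return 1
    fi
    # Optional: decode on another host to confirm both share the same key.
    # REMOTE_HOST is an assumption of this sketch; set it to a real hostname.
    if [ -n "${REMOTE_HOST:-}" ]; then
        if munge -n | ssh "$REMOTE_HOST" unmunge >/dev/null 2>&1; then
            echo "munge: round trip to $REMOTE_HOST ok"
        else
            echo "munge: round trip to $REMOTE_HOST FAILED"
            return 1
        fi
    fi
}

check_time() {
    # Show sync status from whichever time daemon client is available.
    if have chronyc; then
        chronyc tracking
    elif have ntpq; then
        ntpq -p
    else
        echo "time: neither chronyc nor ntpq found"
    fi
}

check_munge
check_time
```

For example, running it on the slurmctld host with `REMOTE_HOST` set to the slurmdbd host tests whether a credential created on one side decodes on the other, which fails when the munge keys differ or the clocks are too far apart.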
Regards
--Mick

________________________________
From: slurm-users slurm-users-bounces@lists.schedmd.com on behalf of Craig Stark cestark@ad.uci.edu
Sent: Monday, January 8, 2024 1:51 PM
To: slurm-users@lists.schedmd.com slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] DBD_SEND_MULT_MSG - invalid uid error