[slurm-users] Issues upgrading db from 20.11.7 -> 21.08.4

Christopher Benjamin Coffey Chris.Coffey at nau.edu
Fri Feb 4 18:36:49 UTC 2022


Hello!

I'm trying to test an upgrade of our production slurm db on a test cluster, specifically an update from 20.11.7 to 21.08.4. I took a dump of the production db, imported it as normal, and then started slurmdbd to perform the conversion. I've verified everything I can think of, but I suspect I'm missing a timeout-related MariaDB tweak or something similar to keep the db from "going away" during the conversion. See the slurmdbd log below. I've tried the upgrade both ways: via the systemd start script and by starting slurmdbd manually. Has anyone run into this before?

Here are my innodb.conf settings:

[mysqld]
innodb_buffer_pool_size=10000M
innodb_log_file_size=64M
innodb_lock_wait_timeout=10000
max_allowed_packet=16M
net_read_timeout=10000
connect_timeout=10000
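
One detail in the log worth checking against these settings: slurmdbd reports innodb_buffer_pool_size as 10737418240 bytes (10240 MiB), not the 10485760000 bytes that 10000M works out to. A minimal sketch of why, assuming MariaDB rounds the buffer pool up to a multiple of innodb_buffer_pool_chunk_size × innodb_buffer_pool_instances, and assuming the defaults of 128M and 8 for those two variables (neither appears in the config above). parse_size and effective_buffer_pool_size are hypothetical helpers written for this illustration.

```python
# Assumptions (not in the config above): MariaDB 10.3 defaults of
# innodb_buffer_pool_chunk_size = 128M and innodb_buffer_pool_instances = 8,
# the latter applying because the pool is larger than 1G.
MIB = 1024 * 1024
CHUNK_SIZE = 128 * MIB
INSTANCES = 8
UNITS = {"K": 1024, "M": MIB, "G": 1024 * MIB}


def parse_size(value: str) -> int:
    """Convert a my.cnf size such as '10000M' or '64M' to bytes."""
    suffix = value[-1].upper()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)


def effective_buffer_pool_size(requested: int,
                               chunk_size: int = CHUNK_SIZE,
                               instances: int = INSTANCES) -> int:
    """Round the requested size up to a multiple of chunk_size * instances."""
    unit = chunk_size * instances
    return -(-requested // unit) * unit  # integer ceiling division


requested = parse_size("10000M")
print(requested)                              # 10485760000
print(effective_buffer_pool_size(requested))  # 10737418240
print(parse_size("64M"))                      # 67108864, as logged for innodb_log_file_size
```

If the server uses a different chunk size or instance count, those two constants would need adjusting. Under the assumed defaults, the logged value is consistent with the 10000M setting being picked up.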

===
[root@storm mariadb]# time slurmdbd -D -vvv
slurmdbd: WARNING: MessageTimeout is too high for effective fault-tolerance
slurmdbd: debug:  Log file re-opened
slurmdbd: pidfile not locked, assuming no running daemon
slurmdbd: debug:  auth/munge: init: Munge authentication plugin loaded
slurmdbd: debug2: accounting_storage/as_mysql: init: mysql_connect() called for db slurm_acct_db
slurmdbd: debug2: Attempting to connect to localhost:3306
slurmdbd: accounting_storage/as_mysql: _check_mysql_concat_is_sane: MySQL server version is: 5.5.5-10.3.28-MariaDB
slurmdbd: debug2: accounting_storage/as_mysql: _check_database_variables: innodb_buffer_pool_size: 10737418240
slurmdbd: debug2: accounting_storage/as_mysql: _check_database_variables: innodb_log_file_size: 67108864
slurmdbd: debug2: accounting_storage/as_mysql: _check_database_variables: innodb_lock_wait_timeout: 10000
slurmdbd: accounting_storage/as_mysql: as_mysql_convert_tables_pre_create: pre-converting usage table for monsoon
slurmdbd: error: mysql_query failed: 1054 Unknown column 'resv_secs' in 'monsoon_usage_day_table'
alter table "monsoon_usage_day_table" change resv_secs plan_secs bigint unsigned default 0 not null;
slurmdbd: accounting_storage/as_mysql: as_mysql_convert_alter_query: The database appears to have been altered by a previous upgrade attempt, continuing with upgrade.
slurmdbd: error: mysql_query failed: 1054 Unknown column 'resv_secs' in 'monsoon_usage_hour_table'
alter table "monsoon_usage_hour_table" change resv_secs plan_secs bigint unsigned default 0 not null;
slurmdbd: accounting_storage/as_mysql: as_mysql_convert_alter_query: The database appears to have been altered by a previous upgrade attempt, continuing with upgrade.
slurmdbd: error: mysql_query failed: 1054 Unknown column 'resv_secs' in 'monsoon_usage_month_table'
alter table "monsoon_usage_month_table" change resv_secs plan_secs bigint unsigned default 0 not null;
slurmdbd: accounting_storage/as_mysql: as_mysql_convert_alter_query: The database appears to have been altered by a previous upgrade attempt, continuing with upgrade.
slurmdbd: accounting_storage/as_mysql: as_mysql_convert_tables_pre_create: pre-converting job table for monsoon
slurmdbd: adding column container after consumed_energy in table "monsoon_step_table"
slurmdbd: adding column submit_line after req_cpufreq_gov in table "monsoon_step_table"
slurmdbd: debug:  Table "monsoon_step_table" has changed.  Updating...
slurmdbd: error: mysql_query failed: 2013 Lost connection to MySQL server during query
alter table "monsoon_step_table" modify `job_db_inx` bigint unsigned not null, modify `deleted` tinyint default 0 not null, modify `exit_code` int default 0 not null, modify `id_step` int not null, modify `step_het_comp` int unsigned default 0xfffffffe not null, modify `kill_requid` int default -1 not null, modify `nodelist` text not null, modify `nodes_alloc` int unsigned not null, modify `node_inx` text, modify `state` smallint unsigned not null, modify `step_name` text not null, modify `task_cnt` int unsigned not null, modify `task_dist` int default 0 not null, modify `time_start` bigint unsigned default 0 not null, modify `time_end` bigint unsigned default 0 not null, modify `time_suspended` bigint unsigned default 0 not null, modify `user_sec` bigint unsigned default 0 not null, modify `user_usec` int unsigned default 0 not null, modify `sys_sec` bigint unsigned default 0 not null, modify `sys_usec` int unsigned default 0 not null, modify `act_cpufreq` double unsigned default 0.0 not null, modify `consumed_energy` bigint unsigned default 0 not null, add `container` text after consumed_energy, modify `req_cpufreq_min` int unsigned default 0 not null, modify `req_cpufreq` int unsigned default 0 not null, modify `req_cpufreq_gov` int unsigned default 0 not null, add `submit_line` text after req_cpufreq_gov, modify `tres_alloc` text not null default '', modify `tres_usage_in_ave` text not null default '', modify `tres_usage_in_max` text not null default '', modify `tres_usage_in_max_taskid` text not null default '', modify `tres_usage_in_max_nodeid` text not null default '', modify `tres_usage_in_min` text not null default '', modify `tres_usage_in_min_taskid` text not null default '', modify `tres_usage_in_min_nodeid` text not null default '', modify `tres_usage_in_tot` text not null default '', modify `tres_usage_out_ave` text not null default '', modify `tres_usage_out_max` text not null default '', modify `tres_usage_out_max_taskid` text not null default '', 
modify `tres_usage_out_max_nodeid` text not null default '', modify `tres_usage_out_min` text not null default '', modify `tres_usage_out_min_taskid` text not null default '', modify `tres_usage_out_min_nodeid` text not null default '', modify `tres_usage_out_tot` text not null default '', drop primary key, add primary key (job_db_inx, id_step, step_het_comp), drop key no_step_comp, add key no_step_comp (job_db_inx, id_step);
slurmdbd: accounting_storage/as_mysql: init: Accounting storage MYSQL plugin failed
slurmdbd: error: mysql_commit failed: 2006 MySQL server has gone away
slurmdbd: error: rollback failed
slurmdbd: error: Couldn't load specified plugin name for accounting_storage/mysql: Plugin init() callback failed
slurmdbd: error: cannot create accounting_storage context for accounting_storage/mysql
slurmdbd: fatal: Unable to initialize accounting_storage/mysql accounting storage plugin

real	11m52.197s
user	0m0.009s
sys	0m0.001s
===
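
To pull the failures out of a log like the one above, a small parser can help. This is a minimal sketch, assuming the slurmdbd output and the `time` summary were saved together as plain text in the format shown; the function names and the SAMPLE excerpt are hypothetical. It separates the 1054 (unknown column) errors, which slurmdbd itself reports as leftovers from a previous upgrade attempt, from the 2013 and 2006 client errors that mark the lost connection, and converts the "real" time to seconds: 11m52.197s is about 712 seconds, well under the 10000-second timeouts set in innodb.conf.

```python
import re
from typing import List, Optional, Tuple

# Matches slurmdbd lines such as
# "slurmdbd: error: mysql_query failed: 2013 Lost connection to MySQL server during query"
ERROR_RE = re.compile(r"error: (mysql_\w+) failed: (\d+) (.*)")
# Matches the "real" line printed by the shell's `time` keyword, e.g. "real<TAB>11m52.197s"
REAL_RE = re.compile(r"^real\s+(\d+)m([\d.]+)s", re.MULTILINE)

# MySQL client error codes: 2006 = CR_SERVER_GONE_ERROR, 2013 = CR_SERVER_LOST
CONNECTION_ERRORS = {2006, 2013}


def mysql_errors(log_text: str) -> List[Tuple[str, int, str]]:
    """Return (call, error code, message) for each failed MySQL call in the log."""
    return [(m.group(1), int(m.group(2)), m.group(3).strip())
            for m in ERROR_RE.finditer(log_text)]


def connection_errors(log_text: str) -> List[int]:
    """Return only the codes that indicate the client lost its server connection."""
    return [code for _, code, _ in mysql_errors(log_text) if code in CONNECTION_ERRORS]


def elapsed_seconds(log_text: str) -> Optional[float]:
    """Convert the `time` output's "real XmY.Zs" line to seconds, if present."""
    m = REAL_RE.search(log_text)
    if m is None:
        return None
    return int(m.group(1)) * 60 + float(m.group(2))


# Hypothetical excerpt of the log above (tab after "real", as printed by `time`).
SAMPLE = """\
slurmdbd: error: mysql_query failed: 1054 Unknown column 'resv_secs' in 'monsoon_usage_day_table'
slurmdbd: error: mysql_query failed: 2013 Lost connection to MySQL server during query
slurmdbd: error: mysql_commit failed: 2006 MySQL server has gone away
real\t11m52.197s
"""

if __name__ == "__main__":
    print(connection_errors(SAMPLE))  # [2013, 2006]
    print(elapsed_seconds(SAMPLE))
```

This only summarizes what the log already says; it does not identify why the server dropped the connection.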

Thank you for any ideas!

Best,
Chris
 
-- 
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167