[slurm-users] Issues upgrading db from 20.11.7 -> 21.08.4

Christopher Benjamin Coffey Chris.Coffey at nau.edu
Fri Feb 4 21:07:44 UTC 2022


Hello!

I figured it out: it was a disk space issue. I thought I had already checked this. Please disregard! Thank you!
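For anyone who finds this thread later: the statement that died in the log below (the ALTER on "monsoon_step_table") drops and re-adds the primary key. That forces InnoDB to rebuild the table, writing a complete new copy before the old one is dropped, so the MariaDB data directory needs free space of at least roughly the table's current size. Below is a minimal pre-flight sketch, not a Slurm or MariaDB tool. It assumes the default datadir /var/lib/mysql and uses a hypothetical 1.2x safety margin. The helper name has_room_for_rebuild is made up for illustration. The table size can be read from DATA_LENGTH + INDEX_LENGTH in information_schema.TABLES.

```python
import shutil

# Assumption: MariaDB's default datadir on RHEL-like systems. Adjust it to
# match the "datadir" setting in your server config.
DATADIR = "/var/lib/mysql"


def has_room_for_rebuild(table_bytes: int, path: str = DATADIR,
                         headroom: float = 1.2) -> bool:
    """Return True if `path` has at least headroom * table_bytes free.

    A rebuilding ALTER TABLE writes a new copy of the table before the old
    one is dropped, so free space of about the table's size is needed.
    The 1.2 headroom factor is an arbitrary, hypothetical safety margin.
    """
    free = shutil.disk_usage(path).free  # bytes available on that filesystem
    return free >= table_bytes * headroom
```

For example, has_room_for_rebuild(50 * 1024**3) checks whether /var/lib/mysql could absorb a rebuild of a table of about 50 GiB. Secondary indexes and temporary sort files can push the real requirement higher, so treat the result as a lower bound.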

Best,
Chris
 
-- 
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
 
 

On 2/4/22, 11:41 AM, "slurm-users on behalf of Christopher Benjamin Coffey" <slurm-users-bounces at lists.schedmd.com on behalf of Chris.Coffey at nau.edu> wrote:

    Hello!

    I'm trying to test an upgrade of our production slurm db on a test cluster. Specifically, I'm trying to verify an update from 20.11.7 to 21.08.4. I have a dump of the production db, which I imported as normal, and then I started slurmdbd to perform the conversion. I've verified everything I can think of, but I suspect I'm missing a timeout-related MariaDB tweak or something similar to keep the db from "going away" during the conversion. See the slurmdbd log below. I've tried the upgrade both ways: via the systemd unit and by starting slurmdbd manually. Has anyone run into this before?

    Here are my innodb.conf settings:

    [mysqld]
    innodb_buffer_pool_size=10000M
    innodb_log_file_size=64M
    innodb_lock_wait_timeout=10000
    max_allowed_packet=16M
    net_read_timeout=10000
    connect_timeout=10000
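    A side note on the log that follows: slurmdbd reports innodb_buffer_pool_size as 10737418240 (10240 MiB) rather than the configured 10000M. This is expected and unrelated to the failure. InnoDB rounds the buffer pool size up to a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances. The sketch below assumes the MariaDB 10.3 defaults (128 MiB chunks, 8 instances for pools of 1 GiB or more) and reproduces the reported value. The function name is hypothetical.

```python
MiB = 1024 * 1024


def effective_buffer_pool_size(requested: int,
                               chunk_size: int = 128 * MiB,
                               instances: int = 8) -> int:
    """Round `requested` up to a multiple of chunk_size * instances.

    Defaults assume MariaDB 10.3 (128 MiB chunks, 8 buffer pool instances).
    """
    unit = chunk_size * instances  # 1 GiB with the defaults above
    return -(-requested // unit) * unit  # ceiling division, then scale back


# 10000M is about 9.77 GiB, which rounds up to 10 GiB = 10737418240 bytes.
reported = effective_buffer_pool_size(10000 * MiB)
```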

    ===
    [root at storm mariadb]# time slurmdbd -D -vvv
    slurmdbd: WARNING: MessageTimeout is too high for effective fault-tolerance
    slurmdbd: debug:  Log file re-opened
    slurmdbd: pidfile not locked, assuming no running daemon
    slurmdbd: debug:  auth/munge: init: Munge authentication plugin loaded
    slurmdbd: debug2: accounting_storage/as_mysql: init: mysql_connect() called for db slurm_acct_db
    slurmdbd: debug2: Attempting to connect to localhost:3306
    slurmdbd: accounting_storage/as_mysql: _check_mysql_concat_is_sane: MySQL server version is: 5.5.5-10.3.28-MariaDB
    slurmdbd: debug2: accounting_storage/as_mysql: _check_database_variables: innodb_buffer_pool_size: 10737418240
    slurmdbd: debug2: accounting_storage/as_mysql: _check_database_variables: innodb_log_file_size: 67108864
    slurmdbd: debug2: accounting_storage/as_mysql: _check_database_variables: innodb_lock_wait_timeout: 10000
    slurmdbd: accounting_storage/as_mysql: as_mysql_convert_tables_pre_create: pre-converting usage table for monsoon
    slurmdbd: error: mysql_query failed: 1054 Unknown column 'resv_secs' in 'monsoon_usage_day_table'
    alter table "monsoon_usage_day_table" change resv_secs plan_secs bigint unsigned default 0 not null;
    slurmdbd: accounting_storage/as_mysql: as_mysql_convert_alter_query: The database appears to have been altered by a previous upgrade attempt, continuing with upgrade.
    slurmdbd: error: mysql_query failed: 1054 Unknown column 'resv_secs' in 'monsoon_usage_hour_table'
    alter table "monsoon_usage_hour_table" change resv_secs plan_secs bigint unsigned default 0 not null;
    slurmdbd: accounting_storage/as_mysql: as_mysql_convert_alter_query: The database appears to have been altered by a previous upgrade attempt, continuing with upgrade.
    slurmdbd: error: mysql_query failed: 1054 Unknown column 'resv_secs' in 'monsoon_usage_month_table'
    alter table "monsoon_usage_month_table" change resv_secs plan_secs bigint unsigned default 0 not null;
    slurmdbd: accounting_storage/as_mysql: as_mysql_convert_alter_query: The database appears to have been altered by a previous upgrade attempt, continuing with upgrade.
    slurmdbd: accounting_storage/as_mysql: as_mysql_convert_tables_pre_create: pre-converting job table for monsoon
    slurmdbd: adding column container after consumed_energy in table "monsoon_step_table"
    slurmdbd: adding column submit_line after req_cpufreq_gov in table "monsoon_step_table"
    slurmdbd: debug:  Table "monsoon_step_table" has changed.  Updating...
    slurmdbd: error: mysql_query failed: 2013 Lost connection to MySQL server during query
    alter table "monsoon_step_table" modify `job_db_inx` bigint unsigned not null, modify `deleted` tinyint default 0 not null, modify `exit_code` int default 0 not null, modify `id_step` int not null, modify `step_het_comp` int unsigned default 0xfffffffe not null, modify `kill_requid` int default -1 not null, modify `nodelist` text not null, modify `nodes_alloc` int unsigned not null, modify `node_inx` text, modify `state` smallint unsigned not null, modify `step_name` text not null, modify `task_cnt` int unsigned not null, modify `task_dist` int default 0 not null, modify `time_start` bigint unsigned default 0 not null, modify `time_end` bigint unsigned default 0 not null, modify `time_suspended` bigint unsigned default 0 not null, modify `user_sec` bigint unsigned default 0 not null, modify `user_usec` int unsigned default 0 not null, modify `sys_sec` bigint unsigned default 0 not null, modify `sys_usec` int unsigned default 0 not null, modify `act_cpufreq` double unsigned default 0.0 not null, modify `consumed_energy` bigint unsigned default 0 not null, add `container` text after consumed_energy, modify `req_cpufreq_min` int unsigned default 0 not null, modify `req_cpufreq` int unsigned default 0 not null, modify `req_cpufreq_gov` int unsigned default 0 not null, add `submit_line` text after req_cpufreq_gov, modify `tres_alloc` text not null default '', modify `tres_usage_in_ave` text not null default '', modify `tres_usage_in_max` text not null default '', modify `tres_usage_in_max_taskid` text not null default '', modify `tres_usage_in_max_nodeid` text not null default '', modify `tres_usage_in_min` text not null default '', modify `tres_usage_in_min_taskid` text not null default '', modify `tres_usage_in_min_nodeid` text not null default '', modify `tres_usage_in_tot` text not null default '', modify `tres_usage_out_ave` text not null default '', modify `tres_usage_out_max` text not null default '', modify `tres_usage_out_max_taskid` text not null default '', modify `tres_usage_out_max_nodeid` text not null default '', modify `tres_usage_out_min` text not null default '', modify `tres_usage_out_min_taskid` text not null default '', modify `tres_usage_out_min_nodeid` text not null default '', modify `tres_usage_out_tot` text not null default '', drop primary key, add primary key (job_db_inx, id_step, step_het_comp), drop key no_step_comp, add key no_step_comp (job_db_inx, id_step);
    slurmdbd: accounting_storage/as_mysql: init: Accounting storage MYSQL plugin failed
    slurmdbd: error: mysql_commit failed: 2006 MySQL server has gone away
    slurmdbd: error: rollback failed
    slurmdbd: error: Couldn't load specified plugin name for accounting_storage/mysql: Plugin init() callback failed
    slurmdbd: error: cannot create accounting_storage context for accounting_storage/mysql
    slurmdbd: fatal: Unable to initialize accounting_storage/mysql accounting storage plugin

    real	11m52.197s
    user	0m0.009s
    sys	0m0.001s
    ===

    Thank you for any ideas!

    Best,
    Chris

    -- 
    Christopher Coffey
    High-Performance Computing
    Northern Arizona University
    928-523-1167





