[slurm-users] slurmdbd upgrade startup error

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Tue Aug 14 12:42:23 MDT 2018


Hi Tina,

Is the OS version the same for 17.02 and 17.11, or are you upgrading the
OS (and possibly MySQL/MariaDB) at the same time?  I assume you're
testing the Slurm upgrade on a test server and not on the production cluster?

Did you check the steps mentioned in the thread "slurmdbd: 
mysql/accounting errors on 17.11.5 upgrade" which you initiated on May 7?

/Ole


On 14-08-2018 17:53, Tina Fora wrote:
> I built the Slurm RPMs with a standard rpmbuild. Upgrading from 17.02 to
> 17.11.9-2 gives the error below. I'm not sure what the issue is with the
> accounting storage plugin, because it seems to load OK. I tried running the
> failed MySQL query manually, and it returns an SQL syntax error (full error
> below). The upgrade works fine on an empty database, but I need to upgrade
> our existing one. How can I debug this further to see what the issue is?
> 
> 
> # slurmdbd -V
> slurm 17.11.9-2
> #
> # slurmdbd -D -vvvv
> slurmdbd: debug:  Log file re-opened
> slurmdbd: debug3: Trying to load plugin /usr/lib64/slurm/auth_munge.so
> slurmdbd: debug:  Munge authentication plugin loaded
> slurmdbd: debug3: Success.
> slurmdbd: debug3: Trying to load plugin
> /usr/lib64/slurm/accounting_storage_mysql.so
> slurmdbd: debug2: mysql_connect() called for db slurmdb
> slurmdbd: pre-converting job table for cluster
> slurmdbd: adding column pack_job_id after id_group in table
> "cluster_job_table"
> slurmdbd: adding column pack_job_offset after pack_job_id in table
> "cluster_job_table"
> slurmdbd: adding column mcs_label after kill_requid in table
> "cluster_job_table"
> slurmdbd: adding column work_dir after wckey in table "cluster_job_table"
> slurmdbd: adding key old_tuple (id_job, id_assoc, time_submit) to table
> "cluster_job_table"
> slurmdbd: adding key pack_job (pack_job_id) to table "cluster_job_table"
> slurmdbd: debug:  Table "cluster_job_table" has changed.  Updating...
> slurmdbd: error: mysql_query failed: 1062 Duplicate entry
> '1042-1012321342' for key 'id_job'
> alter table "cluster_job_table" modify `job_db_inx` bigint unsigned not
> null auto_increment, modify `mod_time` bigint unsigned default 0 not null,
> modify `deleted` tinyint default 0 not null, modify `account` tinytext,
> modify `admin_comment` text, modify `array_task_str` text, modify
> `array_max_tasks` int unsigned default 0 not null, modify
> `array_task_pending` int unsigned default 0 not null, modify `cpus_req`
> int unsigned not null, modify `derived_ec` int unsigned default 0 not
> null, modify `derived_es` text, modify `exit_code` int unsigned default 0
> not null, modify `job_name` tinytext not null, modify `id_assoc` int
> unsigned not null, modify `id_array_job` int unsigned default 0 not null,
> modify `id_array_task` int unsigned default 0xfffffffe not null, modify
> `id_block` tinytext, modify `id_job` int unsigned not null, modify
> `id_qos` int unsigned default 0 not null, modify `id_resv` int unsigned
> not null, modify `id_wckey` int unsigned not null, modify `id_user` int
> unsigned not null, modify `id_group` int unsigned not null, add
> `pack_job_id` int unsigned not null after id_group, add `pack_job_offset`
> int unsigned not null after pack_job_id, modify `kill_requid` int default
> -1 not null, add `mcs_label` tinytext default '' after kill_requid, modify
> `mem_req` bigint unsigned default 0 not null, modify `nodelist` text,
> modify `nodes_alloc` int unsigned not null, modify `node_inx` text, modify
> `partition` tinytext not null, modify `priority` int unsigned not null,
> modify `state` int unsigned not null, modify `timelimit` int unsigned
> default 0 not null, modify `time_submit` bigint unsigned default 0 not
> null, modify `time_eligible` bigint unsigned default 0 not null, modify
> `time_start` bigint unsigned default 0 not null, modify `time_end` bigint
> unsigned default 0 not null, modify `time_suspended` bigint unsigned
> default 0 not null, modify `gres_req` text not null default '', modify
> `gres_alloc` text not null default '', modify `gres_used` text not null
> default '', modify `wckey` tinytext not null default '', add `work_dir`
> text not null default '' after wckey, modify `track_steps` tinyint not
> null, modify `tres_alloc` text not null default '', modify `tres_req` text
> not null default '', drop primary key, add primary key (job_db_inx), drop
> index id_job, add unique index (id_job, time_submit), add key old_tuple
> (id_job, id_assoc, time_submit), drop key rollup, add key rollup
> (time_eligible, time_end), drop key rollup2, add key rollup2 (time_end,
> time_eligible), drop key nodes_alloc, add key nodes_alloc (nodes_alloc),
> drop key wckey, add key wckey (id_wckey), drop key qos, add key qos
> (id_qos), drop key association, add key association (id_assoc), drop key
> array_job, add key array_job (id_array_job), add key pack_job
> (pack_job_id), drop key reserv, add key reserv (id_resv), drop key
> sacct_def, add key sacct_def (id_user, time_start, time_end), drop key
> sacct_def2, add key sacct_def2 (id_user, time_end, time_eligible);
> slurmdbd: Accounting storage MYSQL plugin failed
> slurmdbd: error: Couldn't load specified plugin name for
> accounting_storage/mysql: Plugin init() callback failed
> slurmdbd: error: cannot create accounting_storage context for
> accounting_storage/mysql
> slurmdbd: fatal: Unable to initialize accounting_storage/mysql accounting
> storage plugin
> 
> 
> mysql error:
> ERROR 1064 (42000): You have an error in your SQL syntax; check the manual
> that corresponds to your MariaDB server version for the right syntax to
> use near '"cluster_job_table" modify `job_db_inx` bigint unsigned not null
> auto_increment,' at line 1
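
One way to look into the 1062 error quoted above, offered as a sketch and not as a confirmed diagnosis: the failing statement adds a unique index on (id_job, time_submit), so the duplicate entry '1042-1012321342' is probably an id_job/time_submit pair that occurs in more than one row. The key name old_tuple (id_job, id_assoc, time_submit) hints that such rows could previously differ by id_assoc. The Python below parses the error text and runs a GROUP BY/HAVING query for such pairs. The function names and the in-memory sqlite3 table with invented rows are assumptions made for illustration, and the table name is used as printed in the log; the same SELECT should also work in the mysql client against the real job table.

```python
import re
import sqlite3
from datetime import datetime, timezone

# Error text copied from the slurmdbd log above.
ERROR = "mysql_query failed: 1062 Duplicate entry '1042-1012321342' for key 'id_job'"


def parse_duplicate(message):
    """Split the composite key of a 1062 error into (id_job, time_submit).

    Assumes the two values follow the column order of the new unique index.
    """
    match = re.search(r"Duplicate entry '(\d+)-(\d+)'", message)
    if match is None:
        raise ValueError("no composite duplicate key found")
    return int(match.group(1)), int(match.group(2))


# Lists every (id_job, time_submit) pair that appears in more than one row.
FIND_DUPLICATES = """
    SELECT id_job, time_submit, COUNT(*) AS copies
    FROM cluster_job_table
    GROUP BY id_job, time_submit
    HAVING COUNT(*) > 1
    ORDER BY id_job, time_submit
"""


def find_duplicates(conn):
    """Return the duplicated pairs together with their row counts."""
    return conn.execute(FIND_DUPLICATES).fetchall()


def demo_connection():
    """Hypothetical stand-in table: invented rows, only three columns modelled."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cluster_job_table "
        "(id_job INTEGER, id_assoc INTEGER, time_submit INTEGER)"
    )
    conn.executemany(
        "INSERT INTO cluster_job_table VALUES (?, ?, ?)",
        [
            (1042, 7, 1012321342),
            (1042, 9, 1012321342),  # same job and submit time, other association
            (1043, 7, 1012321400),
        ],
    )
    return conn


if __name__ == "__main__":
    id_job, time_submit = parse_duplicate(ERROR)
    submitted = datetime.fromtimestamp(time_submit, tz=timezone.utc)
    print(f"id_job={id_job} time_submit={time_submit} ({submitted:%Y-%m-%d} UTC)")
    for row in find_duplicates(demo_connection()):
        print(row)
```

The 1064 syntax error from the manual run is likely a separate matter. MariaDB accepts a double-quoted table name as an identifier only when the ANSI_QUOTES SQL mode is active, so an interactive session would probably need SET SESSION sql_mode='ANSI_QUOTES'; first, or backticks around the table name.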


