[slurm-users] slurmdbd crashes with segmentation fault following DBD_GET_ASSOCS

Dustin Lang dstndstn at gmail.com
Tue May 5 17:37:59 UTC 2020


Hi,

We're running Slurm 17.11.12. Everything had been working fine, but now
both slurmctld and slurmdbd have suddenly started crashing.

We use fair-share as part of the queuing policy, and previously set up
accounts with sacctmgr; that has been working fine for months.

If I run slurmdbd in debug mode,

 slurmdbd -D -v -v -v -v -v

it eventually (after being contacted by slurmctld) segfaults with:

...
slurmdbd: debug2: DBD_NODE_STATE: NODE:cn049 STATE:UP REASON:(null) TIME:1588695584
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_NODE_STATE: NODE:cn050 STATE:UP REASON:(null) TIME:1588695584
slurmdbd: debug4: got 0 commits
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_GET_TRES: called
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_GET_QOS: called
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_GET_USERS: called
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_GET_ASSOCS: called
slurmdbd: debug4: 10(as_mysql_assoc.c:2033) query
call get_parent_limits('assoc_table', 'root', 'slurm_cluster', 0); select @par_id, @mj, @msj, @mwpj, @mtpj, @mtpn, @mtmpj, @mtrm, @def_qos_id, @qos, @delta_qos;
Segmentation fault (core dumped)


Running slurmdbd under gdb, the segfault appears to come from

https://github.com/SchedMD/slurm/blob/slurm-17-11-12-1/src/plugins/accounting_storage/mysql/as_mysql_assoc.c#L2073

If I connect to the MySQL database directly and call that stored
procedure, I get

mysql> call get_parent_limits('assoc_table', 'root', 'slurm_cluster', 0);
+---------------------+-----------------+-------------------------+----------------------+---------------------------+-------------+-----------------------------------------------------------------+-------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+-----------------------------+
| @par_id := id_assoc | @mj := max_jobs | @msj := max_submit_jobs | @mwpj := max_wall_pj | @def_qos_id := def_qos_id | @qos := qos | @delta_qos := REPLACE(CONCAT(delta_qos, @delta_qos), ',,', ',') | @mtpj := CONCAT(@mtpj, if (@mtpj != '' && max_tres_pj != '', ',', ''), max_tres_pj) | @mtpn := CONCAT(@mtpn, if (@mtpn != '' && max_tres_pn != '', ',', ''), max_tres_pn) | @mtmpj := CONCAT(@mtmpj, if (@mtmpj != '' && max_tres_mins_pj != '', ',', ''), max_tres_mins_pj) | @mtrm := CONCAT(@mtrm, if (@mtrm != '' && max_tres_run_mins != '', ',', ''), max_tres_run_mins) | @my_acct_new := parent_acct |
+---------------------+-----------------+-------------------------+----------------------+---------------------------+-------------+-----------------------------------------------------------------+-------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+-----------------------------+
|                   1 |            NULL |                    NULL |                 NULL |                      NULL | ,1,         | NULL                                                            | NULL                                                                                | NULL                                                                                | NULL                                                                                             | NULL                                                                                            |                             |
+---------------------+-----------------+-------------------------+----------------------+---------------------------+-------------+-----------------------------------------------------------------+-------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+-----------------------------+

And if I run the same call plus select that slurmdbd issues in the log above,

mysql> call get_parent_limits('assoc_table', 'root', 'slurm_cluster', 0); select @par_id, @mj, @msj, @mwpj, @mtpj, @mtpn, @mtmpj, @mtrm, @def_qos_id, @qos, @delta_qos;

I get

+---------+------+------+-------+-------+-------+--------+-------+-------------+------+------------+
| @par_id | @mj  | @msj | @mwpj | @mtpj | @mtpn | @mtmpj | @mtrm | @def_qos_id | @qos | @delta_qos |
+---------+------+------+-------+-------+-------+--------+-------+-------------+------+------------+
|       1 | NULL | NULL |  NULL | NULL  | NULL  | NULL   | NULL  |        NULL | ,1,  | NULL       |
+---------+------+------+-------+-------+-------+--------+-------+-------------+------+------------+

but I don't know what to do about this.

We use another product ("Bright Cluster Manager") to manage some aspects of
the cluster and Slurm installation, so we are hesitant to just upgrade
Slurm.

I would appreciate any tips.

Thanks,
--dustin