0%

数据库锁异常排查

  1. 发现日志中锁异常

    1
    [table:[xxxxx]][id:0][time:51008246µs][err:Error 1205: Lock wait timeout exceeded; try restarting transaction]

    发现表xxxxx上锁超时,此错误通常由于获取锁时长时间等待造成,最常见的情况是,事务在获取锁后长时间占用不释放,程序中一些异常分支使得事务未提交或未回滚,我们需要定位到这些业务查询上来。

  2. 查看数据库状态

    1
    2
    3
    4
    5
    mysql> show full processlist;
    ...
    | 567947 | qa | 10.129.144.19:29488 | sample_db | Execute | 29 | update | INSERT INTO xxxxx VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?) |
    ...
    // 使用 show engine innodb status\G 也可以

    连接567947的事务被阻塞,查找阻塞其事务的连接信息

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    // 查看被阻塞的事务信息
    mysql> select
    r.trx_isolation_level,
    r.trx_id waiting_trx_id,
    r.trx_mysql_thread_id waiting_trx_thread,
    r.trx_state waiting_trx_state,
    lr.lock_mode waiting_trx_lock_mode,
    lr.lock_type waiting_trx_lock_type,
    lr.lock_table waiting_trx_lock_table,
    lr.lock_index waiting_trx_lock_index,
    r.trx_query waiting_trx_query,
    b.trx_id blocking_trx_id,
    b.trx_mysql_thread_id blocking_trx_thread,
    b.trx_state blocking_trx_state,
    lb.lock_mode blocking_trx_lock_mode,
    lb.lock_type blocking_trx_lock_type,
    lb.lock_table blocking_trx_lock_table,
    lb.lock_index blocking_trx_lock_index,
    b.trx_query blocking_query
    from information_schema.innodb_lock_waits w
    inner join information_schema.innodb_trx b
    on b.trx_id=w.blocking_trx_id
    inner join information_schema.innodb_trx r
    on r.trx_id=w.requesting_trx_id
    inner join information_schema.innodb_locks lb
    on lb.lock_trx_id=w.blocking_trx_id
    inner join information_schema.innodb_locks lr
    on lr.lock_trx_id=w.requesting_trx_id
    where waiting_trx_thread=567947\G

    *************************** 1. row ***************************
    trx_isolation_level: REPEATABLE READ
    waiting_trx_id: 37973617
    waiting_trx_thread: 567947
    waiting_trx_state: LOCK WAIT
    waiting_trx_lock_mode: X
    waiting_trx_lock_type: RECORD
    waiting_trx_lock_table: `sample_db`.`xxxx`
    waiting_trx_lock_index: idx_itemid
    waiting_trx_query: INSERT INTO xxxxxx VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
    blocking_trx_id: 37972057
    blocking_trx_thread: 567890
    blocking_trx_state: RUNNING
    blocking_trx_lock_mode: X
    blocking_trx_lock_type: RECORD
    blocking_trx_lock_table: `sample_db`.`xxx`
    blocking_trx_lock_index: idx_itemid
    blocking_query: NULL

    查询一下被当前阻塞连接的信息,阻塞当前查询的连接id为567890

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    mysql> SELECT t.trx_mysql_thread_id
    ,t.trx_state
    ,t.trx_tables_in_use
    ,t.trx_tables_locked
    ,t.trx_query
    ,t.trx_rows_locked
    ,t.trx_rows_modified
    ,t.trx_lock_structs
    ,t.trx_started
    ,t.trx_isolation_level
    ,p.time
    ,p.user
    ,p.host
    ,p.db
    ,p.command
    FROM information_schema.innodb_trx t
    INNER JOIN information_schema.processlist p
    ON t.trx_mysql_thread_id = p.id
    WHERE t.trx_state = 'RUNNING'
    AND p.time > 10
    AND p.command = 'Sleep'
    AND t.trx_mysql_thread_id=567890\G

    *************************** 1. row ***************************
    trx_mysql_thread_id: 567890
    trx_state: RUNNING
    trx_tables_in_use: 0
    trx_tables_locked: 1
    trx_query: NULL
    trx_rows_locked: 1
    trx_rows_modified: 0
    trx_lock_structs: 3
    trx_started: 2019-09-20 16:05:37
    trx_isolation_level: REPEATABLE READ
    time: 7028
    user: qa
    host: xx.xx.xx.xx:26078
    db: shopee_pcenter_db
    command: Sleep

    查看该进程的执行信息

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
       mysql> select THREAD_ID, 
    CURRENT_SCHEMA,
    LOCK_TIME,
    SQL_TEXT
    from performance_schema.events_statements_current
    where THREAD_ID=567890\G

    // 很遗憾没有数据
    // 关于performance_schema的简单介绍:
    // performance_schema是数据库事件记录库,包含三个表
    // . events_waits_current表:记录当前正在执行的等待事件的,每个线程只记录1行记录
    // . events_waits_history表:记录已经执行完的最近的等待事件历史,默认每个线程只记录10行记录
    // . events_waits_history_long表:记录已经执行完的最近的等待事件历史,默认所有线程的总记录行数为10000行
    // 当查询比较复杂时,events_waits_current表可能无法提供有效的信息

    或者

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    mysql> SELECT *
    FROM sys.session
    WHERE CONN_ID =567890\G

    *************************** 1. row ***************************
    thd_id: 567890
    conn_id: 567890
    user: xx@xx.xx.xx.xx
    db: sample_db
    command: Sleep
    state: NULL
    time: 8247
    current_statement: NULL
    statement_latency: NULL
    progress: NULL
    lock_latency: 0 ps
    rows_examined: 0
    rows_sent: 0
    rows_affected: 0
    tmp_tables: 0
    tmp_disk_tables: 0
    full_scan: NO
    last_statement: NULL
    last_statement_latency: 8.96 us
    current_memory: 0 bytes
    last_wait: NULL
    last_wait_latency: NULL
    source: NULL
    trx_latency: NULL
    trx_state: NULL
    trx_autocommit: NULL
    pid: NULL
    program_name: NULL
    1 row in set (0.22 sec)
    // 也无可用信息

    我们查询到当前连接所关联的程序

    1
    $ lsof -i:26078
  3. 如果需要更精确的定位,可以打开mysql通用日志

    1
    2
    3
    4
    5
    6
    7
    8
    9
    mysql> show variables like '%general_log%'
    +------------------+--------------------------------------------------------------+
    | Variable_name | Value |
    +------------------+--------------------------------------------------------------+
    | general_log | OFF |
    | general_log_file | /var/lib/mysql/xxxxx.log |
    +------------------+--------------------------------------------------------------+
    mysql> SET GLOBAL general_log = "ON";
    Query OK, 0 rows affected (0.00 sec)

    当锁异常出现时,我们可以查询日志中阻塞其他查询的连接所进行的操作

    1
    2
    3
    grep 567890 /var/lib/mysql/xxxxx.log |

    2019-09-22T08:48:39.380757Z 567890 Execute SELECT .... FROM ....

    即可定位故障语句。