发现日志中锁异常
1
[table:[xxxxx]][id:0][time:51008246µs][err:Error 1205: Lock wait timeout exceeded; try restarting transaction]
发现表xxxxx上锁超时,此错误通常由于获取锁时长时间等待造成,最常见的情况是,事务在获取锁后长时间占用不释放,程序中一些异常分支使得事务未提交或未回滚,我们需要定位到这些业务查询上来。
查看数据库状态
1
2
3
4
5mysql> show full processlist;
...
| 567947 | qa | 10.129.144.19:29488 | sample_db | Execute | 29 | update | INSERT INTO xxxxx VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?) |
...
// 使用 show engine innodb status\G 也可以连接567947的事务被阻塞,查找阻塞其事务的连接信息
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48// 查看被阻塞的事务信息
mysql> select
r.trx_isolation_level,
r.trx_id waiting_trx_id,
r.trx_mysql_thread_id waiting_trx_thread,
r.trx_state waiting_trx_state,
lr.lock_mode waiting_trx_lock_mode,
lr.lock_type waiting_trx_lock_type,
lr.lock_table waiting_trx_lock_table,
lr.lock_index waiting_trx_lock_index,
r.trx_query waiting_trx_query,
b.trx_id blocking_trx_id,
b.trx_mysql_thread_id blocking_trx_thread,
b.trx_state blocking_trx_state,
lb.lock_mode blocking_trx_lock_mode,
lb.lock_type blocking_trx_lock_type,
lb.lock_table blocking_trx_lock_table,
lb.lock_index blocking_trx_lock_index,
b.trx_query blocking_query
from information_schema.innodb_lock_waits w
inner join information_schema.innodb_trx b
on b.trx_id=w.blocking_trx_id
inner join information_schema.innodb_trx r
on r.trx_id=w.requesting_trx_id
inner join information_schema.innodb_locks lb
on lb.lock_trx_id=w.blocking_trx_id
inner join information_schema.innodb_locks lr
on lr.lock_trx_id=w.requesting_trx_id
where waiting_trx_thread=567947\G
*************************** 1. row ***************************
trx_isolation_level: REPEATABLE READ
waiting_trx_id: 37973617
waiting_trx_thread: 567947
waiting_trx_state: LOCK WAIT
waiting_trx_lock_mode: X
waiting_trx_lock_type: RECORD
waiting_trx_lock_table: `sample_db`.`xxxx`
waiting_trx_lock_index: idx_itemid
waiting_trx_query: INSERT INTO xxxxxx VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
blocking_trx_id: 37972057
blocking_trx_thread: 567890
blocking_trx_state: RUNNING
blocking_trx_lock_mode: X
blocking_trx_lock_type: RECORD
blocking_trx_lock_table: `sample_db`.`xxx`
blocking_trx_lock_index: idx_itemid
blocking_query: NULL查询一下被当前阻塞连接的信息,阻塞当前查询的连接id为567890
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39mysql> SELECT t.trx_mysql_thread_id
,t.trx_state
,t.trx_tables_in_use
,t.trx_tables_locked
,t.trx_query
,t.trx_rows_locked
,t.trx_rows_modified
,t.trx_lock_structs
,t.trx_started
,t.trx_isolation_level
,p.time
,p.user
,p.host
,p.db
,p.command
FROM information_schema.innodb_trx t
INNER JOIN information_schema.processlist p
ON t.trx_mysql_thread_id = p.id
WHERE t.trx_state = 'RUNNING'
AND p.time > 10
AND p.command = 'Sleep'
AND t.trx_mysql_thread_id=567890\G
*************************** 1. row ***************************
trx_mysql_thread_id: 567890
trx_state: RUNNING
trx_tables_in_use: 0
trx_tables_locked: 1
trx_query: NULL
trx_rows_locked: 1
trx_rows_modified: 0
trx_lock_structs: 3
trx_started: 2019-09-20 16:05:37
trx_isolation_level: REPEATABLE READ
time: 7028
user: qa
host: xx.xx.xx.xx:26078
db: shopee_pcenter_db
command: Sleep查看该进程的执行信息
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15mysql> select THREAD_ID,
CURRENT_SCHEMA,
LOCK_TIME,
SQL_TEXT
from performance_schema.events_statements_current
where THREAD_ID=567890\G
// 很遗憾没有数据
// 关于performance_schema的简单介绍:
// performance_schema是数据库事件记录库,包含三个表
// . events_waits_current表:记录当前正在执行的等待事件的,每个线程只记录1行记录
// . events_waits_history表:记录已经执行完的最近的等待事件历史,默认每个线程只记录10行记录
// . events_waits_history_long表:记录已经执行完的最近的等待事件历史,默认所有线程的总记录行数为10000行
// 当查询比较复杂时,events_waits_current表可能无法提供有效的信息
或者
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35mysql> SELECT *
FROM sys.session
WHERE CONN_ID =567890\G
*************************** 1. row ***************************
thd_id: 567890
conn_id: 567890
user: xx@xx.xx.xx.xx
db: sample_db
command: Sleep
state: NULL
time: 8247
current_statement: NULL
statement_latency: NULL
progress: NULL
lock_latency: 0 ps
rows_examined: 0
rows_sent: 0
rows_affected: 0
tmp_tables: 0
tmp_disk_tables: 0
full_scan: NO
last_statement: NULL
last_statement_latency: 8.96 us
current_memory: 0 bytes
last_wait: NULL
last_wait_latency: NULL
source: NULL
trx_latency: NULL
trx_state: NULL
trx_autocommit: NULL
pid: NULL
program_name: NULL
1 row in set (0.22 sec)
// 也无可用信息我们查询到当前连接所关联的程序
1
$ lsof -i:26078
如果需要更精确的定位,可以打开mysql通用日志
1
2
3
4
5
6
7
8
9mysql> show variables like '%general_log%'
+------------------+--------------------------------------------------------------+
| Variable_name | Value |
+------------------+--------------------------------------------------------------+
| general_log | OFF |
| general_log_file | /var/lib/mysql/xxxxx.log |
+------------------+--------------------------------------------------------------+
mysql> SET GLOBAL general_log = "ON";
Query OK, 0 rows affected (0.00 sec)当锁异常出现时,我们可以查询日志中阻塞其他查询的连接所进行的操作
1
2
3grep 567890 /var/lib/mysql/xxxxx.log |
2019-09-22T08:48:39.380757Z 567890 Execute SELECT .... FROM ....即可定位故障语句。