Binlog parsing fails when executing a large transaction #396

Closed
lintanghui opened this issue Jun 26, 2019 · 3 comments · Fixed by #397

Comments

@lintanghui
Contributor

lintanghui commented Jun 26, 2019

Symptom: Err: table id 267724937: invalid table id, no corresponding table map event

[2019/06/25 18:04:19] [error] binlogsyncer.go:639 connection was bad
[2019/06/25 18:04:20] [info] binlogsyncer.go:581 begin to re-sync from (mysql-bin.003014, 923556481)
[2019/06/25 18:04:20] [info] binlogsyncer.go:211 register slave for master server 
[2019/06/25 18:04:20] [error] binlogsyncer.go:239 kill connection 2818644069 error ERROR 1094 (HY000): Unknown thread id: 2818644069
[2019/06/25 18:04:20] [info] binlogsyncer.go:245 kill last connection id 2818644069
[2019/06/25 18:04:20] [info] binlogsyncer.go:731 rotate to (mysql-bin.003014, 923556481)
[2019/06/25 18:15:41] [info] sync.go:66 rotate binlog to (mysql-bin.003014, 923556481)

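When a network error during binlog sync triggers a reconnect and resync, MySQL sends canal a rotate event that updates the current Pos. If canal is restarted at this point, it resumes syncing from that saved position. However, the table id for the binlog event at that position cannot be resolved, so the "no corresponding table map event" error occurs.
The corresponding binlog is as follows: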

| mysql-bin.003014 | 906327751 | Table_map      | 341013306 |   906327860 | table_id: 267724937 (bilibili_upcrm.up_base_info)            |
| mysql-bin.003014 | 906327860 | Delete_rows_v1 | 341013306 |   906328894 | table_id: 267724937                                          |
| mysql-bin.003014 | 906328894 | Delete_rows_v1 | 341013306 |   906329854 | table_id: 267724937                                          |
| mysql-bin.003014 | 906329854 | Delete_rows_v1 | 341013306 |   906330834 | table_id: 267724937                                          |
| mysql-bin.003014 | 906330834 | Delete_rows_v1 | 341013306 |   906331808 | table_id: 267724937                                          |
| mysql-bin.003014 | 906331808 | Delete_rows_v1 | 341013306 |   906332738 | table_id: 267724937                                          |
| mysql-bin.003014 | 906332738 | Delete_rows_v1 | 341013306 |   906333714 | table_id: 267724937                                          |
| mysql-bin.003014 | 906333714 | Delete_rows_v1 | 341013306 |   906334692 | table_id: 267724937
... 

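A delete from table statement was executed, which generated a large amount of binlog. The table map event had already been parsed, and the resync then moved Pos to a point after the table map event. After a restart the table map can no longer be obtained, which causes the invalid table id error.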
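The failure mode described above can be illustrated with a toy model. This is only a sketch, not go-mysql's parser: the `event` type and `replay` function are hypothetical, and the only assumption carried over from the report is that a rows event can be decoded only if a Table_map event with the same table id was seen earlier in the stream.

```go
package main

import "fmt"

// event is a hypothetical, heavily simplified binlog event:
// just the event kind, the table id, and (for Table_map) the table name.
type event struct {
	kind    string // "Table_map" or "Delete_rows_v1"
	tableID uint64
	table   string
}

// replay walks the events in order. A Table_map event registers its table id;
// any other event must refer to a table id that was registered earlier.
func replay(events []event) error {
	tables := map[uint64]string{}
	for _, ev := range events {
		if ev.kind == "Table_map" {
			tables[ev.tableID] = ev.table
			continue
		}
		if _, ok := tables[ev.tableID]; !ok {
			return fmt.Errorf("table id %d: invalid table id, no corresponding table map event", ev.tableID)
		}
	}
	return nil
}

// sampleEvents mirrors the first rows of the binlog listing in the report.
func sampleEvents() []event {
	return []event{
		{kind: "Table_map", tableID: 267724937, table: "bilibili_upcrm.up_base_info"},
		{kind: "Delete_rows_v1", tableID: 267724937},
		{kind: "Delete_rows_v1", tableID: 267724937},
	}
}
```

Replaying `sampleEvents()` from the start succeeds, while replaying `sampleEvents()[1:]` (a start position after the Table_map event) returns the error from the report.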

@csuzhangxc
Contributor

@lintanghui

When a network error during binlog sync triggers a reconnect and resync

When resyncing, BinlogSyncer uses its nextPos (the End_log_pos of the latest received event; mysql-bin.003014, 923556481 in this case) to reconnect to the MySQL master.

When a network error during binlog sync triggers a reconnect and resync, MySQL sends canal a rotate event

In MySQL, when a slave connects to a master, the master always sends a fake RotateEvent (with .Header.Timestamp == 0 and .Header.LogPos == 0) to the slave. The position in this fake RotateEvent is one of:

  • the position specified in the request from the slave (mysql-bin.003014, 923556481 in this case)
  • the position of the first un-purged binlog file if no position specified in the request

that updates the current Pos. If canal is restarted at this point, it resumes syncing from that saved position

This is the cause of the problem (or rather, it's a bug). When canal gets a RotateEvent from BinlogStreamer, it should check whether the event is a fake RotateEvent. If it is, canal should update only the in-memory value and not save it to persistent storage, so that after a restart canal can use the previous, correct startup position.

If you like, you can fix this under case *replication.RotateEvent in canal/sync.go.
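A minimal sketch of the check suggested above, not the actual patch. To stay self-contained it uses simplified stand-ins for the replication package's EventHeader and RotateEvent types (only the fields mentioned in this thread); `Position`, `posStore`, `isFakeRotate` and `onRotate` are hypothetical names, and the `saved` field merely stands for whatever persistent storage canal writes to.

```go
package main

// EventHeader and RotateEvent are simplified stand-ins for the types of the
// same name in the replication package; only the fields discussed in this
// thread are modelled.
type EventHeader struct {
	Timestamp uint32
	LogPos    uint32
}

type RotateEvent struct {
	Position    uint64
	NextLogName []byte
}

// Position is a binlog file name plus an offset in that file.
type Position struct {
	Name string
	Pos  uint32
}

// posStore is a hypothetical holder for the two positions in question.
type posStore struct {
	current Position // in-memory position, always follows the stream
	saved   Position // stands for the persisted position used on restart
}

// isFakeRotate reports whether the header carries the two markers of a fake
// RotateEvent described above: Timestamp == 0 and LogPos == 0.
func isFakeRotate(h EventHeader) bool {
	return h.Timestamp == 0 && h.LogPos == 0
}

// onRotate always updates the in-memory position, but persists it only when
// the rotate event is a real one.
func (s *posStore) onRotate(h EventHeader, e RotateEvent) {
	s.current = Position{Name: string(e.NextLogName), Pos: uint32(e.Position)}
	if isFakeRotate(h) {
		return // fake rotate: keep the previously saved startup position
	}
	s.saved = s.current
}
```

With this shape, a fake rotate to (mysql-bin.003014, 923556481) moves `current` but leaves `saved` at whatever position was stored before the reconnect, so a restart would resume from the earlier position.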

@lintanghui
Contributor Author

Check whether the local pos equals the rotate event pos. If they are equal, it may be a fake rotate event, so do not update the persistent pos and just continue.
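The comparison proposed in this comment could look roughly like the sketch below. It assumes "local pos" means the position canal currently holds in memory; `Position` and `shouldPersistRotate` are hypothetical names and this is not the code merged for the fix.

```go
package main

// Position is a binlog file name plus an offset in that file.
type Position struct {
	Name string
	Pos  uint32
}

// shouldPersistRotate returns false when the rotate event points at exactly
// the position already held locally, which this heuristic treats as a
// possible fake rotate event sent after a reconnect.
func shouldPersistRotate(local, rotate Position) bool {
	return local != rotate
}
```

This is a heuristic based on positions only. The header check mentioned earlier in the thread (Timestamp == 0 and LogPos == 0) is a separate signal, and the two could be combined.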

@lintanghui
Contributor Author

This may be the same problem as issue #323.
