Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lots of metalock ,mysql crash #768

Open
sumonsun opened this issue Jul 25, 2019 · 3 comments
Open

lots of metalock ,mysql crash #768

sumonsun opened this issue Jul 25, 2019 · 3 comments

Comments

@sumonsun
Copy link

sumonsun commented Jul 25, 2019

report issue

gh-ost: version ,1.0.46,

mysql:version,5.6.28

issue:

Thanks for gh-ost you guys provide, we have do five hundred thousand ddl

Recently ,we got some strange questions ,some times gh-ost ddl crash our master db. (lots of meta lock in db , thread running increase suddenly ,then database was crash)

1. db load was very low,little tps/qps

2. lots of meta lock in processlist snapshot.

We set rename timeout is 1 seconds (cut-over-lock-timeout-seconds=1).

The key point here,gh-ost seems didn't release the lock after lock wait time exceeded

DDL LOG : we could found gh-ost didn't print log about "release the lock"
2019-07-23 22:26:16 INFO Setting RENAME timeout as 1 seconds
2019-07-23 22:26:16 INFO Session renaming tables is 3277189
2019-07-23 22:26:16 INFO Issuing and expecting this to block: rename   table `magi`.`trial_lucky` to `magi`.`_trial_lucky_1563891962_del`, `magi`.`_trial_lucky_1563891962_gho` to `magi`.`trial_lucky`
2019-07-23 22:26:17 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
Copy: 152422/152422 100.0%; Applied: 2; Backlog: 0/1000; Time: 15s(total), 12s(copy); streamer: mysql-bin.000021:256867677; State: migrating; ETA: due
Copy: 152422/152422 100.0%; Applied: 2; Backlog: 0/1000; Time: 16s(total), 12s(copy); streamer: mysql-bin.000021:256868164; State: migrating; ETA: due
Copy: 152422/152422 100.0%; Applied: 2; Backlog: 0/1000; Time: 17s(total), 12s(copy); streamer: mysql-bin.000021:256868651; State: migrating; ETA: due
Copy: 152422/152422 100.0%; Applied: 2; Backlog: 0/1000; Time: 18s(total), 12s(copy); streamer: mysql-bin.000021:256869138; State: migrating; ETA: due
Copy: 152422/152422 100.0%; Applied: 2; Backlog: 0/1000; Time: 19s(total), 12s(copy); streamer: mysql-bin.000021:256869625; State: migrating; ETA: due
[mysql] 2019/07/23 22:27:53 driver.go:81: net.Error from Dial()': dial tcp 172.21.80.17:3306: connect: connection timed out
[mysql] 2019/07/23 22:28:22 packets.go:36: read tcp 172.20.101.133:18424->172.21.80.17:3306: read: connection reset by peer
[mysql] 2019/07/23 22:28:22 packets.go:36: read tcp 172.20.101.133:58982->172.21.80.17:3306: read: connection reset by peer
[mysql] 2019/07/23 22:28:22 packets.go:36: read tcp 172.20.101.133:51922->172.21.80.17:3306: read: connection reset by peer
``
DB PROCESSLIST SNAPSHOT: many metalock in process list
magi', 'Query', 1, 'Waiting for table metadata lock', 
magi', 'Query', 1, 'Waiting for table metadata lock', 
'magi', 'Query', 1, 'Waiting for table metadata lock',
'magi', 'Query', 1, 'Waiting for table metadata lock',
'magi', 'Query', 1, 'Waiting for table metadata lock',
'magi', 'Query', 1, 'Waiting for table metadata lock',
'magi', 'Query', 1, 'Waiting for table metadata lock',
'magi', 'Query', 1, 'Waiting for table metadata lock',
'magi', 'Query', 1, 'Waiting for table metadata lock',
'magi', 'Query', 1, 'Waiting for table metadata lock',
'magi', 'Query', 1, 'Waiting for table metadata lock',
magi', 'Query', 1, 'Waiting for table metadata lock', 
'magi', 'Query', 1, 'Waiting for table metadata lock',
@sunkaiqin
Copy link

We have encountered the metadata lock issues in our prod gh-ost cutover. There are helpful links from Percona which could give you the details:
Implications of Metadata Locking Changes in MySQL 5.5
More on (transactional) MySQL metadata locks
Quickly Troubleshoot Metadata Locks in MySQL 5.7

Unfortunately your MySQL version is on 5.6 which doesn't have the performance_schema.metadata_locks table (which is available in MySQL 5.7 and exposes metadata locks details). Utilizing that metadata_locks table you could figure out which query is holding the metadata lock and blocking your gh-ost queries (DDLs). Then you could kill that query if safe to do so to let your gh-ost DDL proceed.

@sumonsun
Copy link
Author

Thanks for your reply ,kaiqin. Let show you my issue

As description by issue #82

  • gh-ost do cutover as below logic :
step1:C10  lock tables tb1 write, tb1_old write;
						step2: C20  set session lock_timeout=1S;
  								   rename table  blocked;
step3:C10  check C20 rename is applied.
      more than 1s did not reflect 
						step4: C20 rename failed due to lock timeout
  • Finally the lock is not released, so we doublt step3 function was dead,then step1 lock didn't released,then lead to lots of metadata

@sunkaiqin
Copy link

Based on the doc the --cut-over-lock-timeout-seconds flag sets the wait_lock_timeout for the lock tables and rename tables commands in the cut-over step. And from your gh-ost logs it timed out at rename step and then connection reset. I have seen timeout during lock table phase then it retried itself again. But I never seen it happen during rename phase which is interesting to know the root cause to me as well. Hope gh-ost team could give some insight on this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants