Cutover not succeeding #687

Open
JGulbronson opened this issue Dec 6, 2018 · 23 comments

@JGulbronson

JGulbronson commented Dec 6, 2018

When I try to run a migration to add an index on a relatively small table, the cutover never succeeds. I've run it with verbose output, and get the following:

2018-12-06 14:43:42 INFO Locking `COMPANY_prod`.`retailers`, `COMPANY_prod`.`_retailers_del`
Copy: 91156/91156 100.0%; Applied: 2; Backlog: 0/1000; Time: 1m35s(total), 39s(copy); streamer: mysql-bin-changelog.002933:13183392; State: migrating; ETA: due
2018-12-06 14:43:48 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2018-12-06 14:43:48 INFO Looking for magic cut-over table
2018-12-06 14:43:48 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2018-12-06 14:43:48 INFO Dropping magic cut-over table
2018-12-06 14:43:48 INFO Dropping table `COMPANY_prod`.`_retailers_del`
2018-12-06 14:43:48 INFO Table dropped
2018-12-06 14:43:49 INFO Grabbing voluntary lock: gh-ost.12668.lock
2018-12-06 14:43:49 INFO Setting LOCK timeout as 6 seconds
2018-12-06 14:43:49 INFO Looking for magic cut-over table
2018-12-06 14:43:49 INFO Creating magic cut-over table `COMPANY_prod`.`_retailers_del`
2018-12-06 14:43:50 INFO Magic cut-over table created
2018-12-06 14:43:50 INFO Locking `COMPANY_prod`.`retailers`, `COMPANY_prod`.`_retailers_del`
Copy: 91156/91156 100.0%; Applied: 4; Backlog: 0/1000; Time: 1m40s(total), 39s(copy); streamer: mysql-bin-changelog.002933:13689223; State: migrating; ETA: due
2018-12-06 14:43:56 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2018-12-06 14:43:56 INFO Looking for magic cut-over table
2018-12-06 14:43:56 INFO Dropping magic cut-over table
2018-12-06 14:43:56 INFO Dropping table `COMPANY_prod`.`_retailers_del`
2018-12-06 14:43:56 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2018-12-06 14:43:56 INFO Table dropped
Copy: 91156/91156 100.0%; Applied: 4; Backlog: 0/1000; Time: 1m45s(total), 39s(copy); streamer: mysql-bin-changelog.002933:14037660; State: migrating; ETA: due
2018-12-06 14:43:57 INFO Grabbing voluntary lock: gh-ost.12680.lock
2018-12-06 14:43:57 INFO Setting LOCK timeout as 6 seconds
2018-12-06 14:43:57 INFO Looking for magic cut-over table
2018-12-06 14:43:57 INFO Creating magic cut-over table `COMPANY_prod`.`_retailers_del`
2018-12-06 14:43:57 INFO Magic cut-over table created
2018-12-06 14:43:57 INFO Locking `COMPANY_prod`.`retailers`, `COMPANY_prod`.`_retailers_del`
Copy: 91156/91156 100.0%; Applied: 4; Backlog: 0/1000; Time: 1m50s(total), 39s(copy); streamer: mysql-bin-changelog.002933:14381575; State: migrating; ETA: due
2018-12-06 14:44:03 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2018-12-06 14:44:03 INFO Looking for magic cut-over table
2018-12-06 14:44:03 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2018-12-06 14:44:03 INFO Dropping magic cut-over table
2018-12-06 14:44:03 INFO Dropping table `COMPANY_prod`.`_retailers_del`
2018-12-06 14:44:03 INFO Table dropped
2018-12-06 14:44:05 INFO Grabbing voluntary lock: gh-ost.12685.lock
2018-12-06 14:44:05 INFO Setting LOCK timeout as 6 seconds
2018-12-06 14:44:05 INFO Looking for magic cut-over table
2018-12-06 14:44:05 INFO Creating magic cut-over table `COMPANY_prod`.`_retailers_del`
2018-12-06 14:44:05 INFO Magic cut-over table created
2018-12-06 14:44:05 INFO Locking `COMPANY_prod`.`retailers`, `COMPANY_prod`.`_retailers_del`
Copy: 91156/91156 100.0%; Applied: 4; Backlog: 0/1000; Time: 1m55s(total), 39s(copy); streamer: mysql-bin-changelog.002933:15039450; State: migrating; ETA: due

It repeats this ad infinitum, until I kill it after 3-5 minutes. We are using Aurora, but have RBR set and run on the master. We're able to run other migrations, just not this (relatively) small one, so I'm trying to figure out what could possibly be causing this so my debugging can be a bit more guided.

So far I love the tool and the fact that I don't have to worry about cleanup afterwards; it's been great for us. I just need to figure out this occasional issue!

@JGulbronson
Author

Update: my coworker was able to run the migration without a problem after I posted this. Is it possible something wasn't cleaned up properly the first time, and that prevented subsequent attempts from succeeding?

@ggunson
Contributor

ggunson commented Dec 7, 2018

@JGulbronson When gh-ost is in the cut-over process, it attempts (among other things) to get a write lock on the table being migrated; if it cannot (because of other connections' locks on the table), it will time out on that lock wait (hence the Error 1205: Lock wait timeout exceeded; try restarting transaction). By default it times out after a few seconds, so that it won't block other activity on the table by continuing to wait, and then retries. See #82.

I'd guess there might have been some activity on that table, like a long-running transaction holding locks, that blocked gh-ost from getting that exclusive table lock, though I can't say for sure.

If that happens again I'd suggest running

SHOW FULL PROCESSLIST;
SHOW ENGINE INNODB STATUS\G

and see whether there are running transactions holding locks on the table.
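
For example, a quick way to spot candidate blockers is a query like the following (just a sketch, assuming InnoDB and access to information_schema; the 30-second threshold is arbitrary):

-- Sketch: list transactions that have been open for a while; any of these
-- touching the migrated table would explain the repeated Error 1205 timeouts
SELECT trx_id, trx_mysql_thread_id, trx_started, trx_state, trx_query
FROM information_schema.innodb_trx
WHERE trx_started < NOW() - INTERVAL 30 SECOND
ORDER BY trx_started;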

@babinomec
Contributor

I had the same problem (looping retries on cut-over), even on relatively unused tables (along with syncer errors during binlog streaming). All of this was resolved by using the latest master version (at the time, commit e48844d).

@peppy

peppy commented Feb 14, 2019

I'm seeing the same thing on a MySQL 8 host. This is on a table which is very rarely used and also very small (intentionally, for testing):

2019-02-14 05:31:47 INFO Looking for magic cut-over table
2019-02-14 05:31:47 INFO Dropping magic cut-over table
2019-02-14 05:31:47 INFO Dropping table `osu`.`_phpbb_disallow_del`
2019-02-14 05:31:47 INFO Table dropped
2019-02-14 05:31:48 INFO Grabbing voluntary lock: gh-ost.15579113.lock
2019-02-14 05:31:48 INFO Setting LOCK timeout as 20 seconds
2019-02-14 05:31:48 INFO Looking for magic cut-over table
2019-02-14 05:31:48 INFO Creating magic cut-over table `osu`.`_phpbb_disallow_del`
2019-02-14 05:31:48 INFO Magic cut-over table created
2019-02-14 05:31:48 INFO Locking `osu`.`phpbb_disallow`, `osu`.`_phpbb_disallow_del`
2019-02-14 05:31:48 INFO Tables locked
2019-02-14 05:31:48 INFO Session locking original & magic tables is 15579113
2019-02-14 05:31:48 INFO Writing changelog state: AllEventsUpToLockProcessed:1550122308341602368
2019-02-14 05:31:48 INFO Intercepted changelog state AllEventsUpToLockProcessed
2019-02-14 05:31:48 INFO Handled changelog state AllEventsUpToLockProcessed
2019-02-14 05:31:48 INFO Waiting for events up to lock
2019-02-14 05:31:49 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1550122308341602368
2019-02-14 05:31:49 INFO Done waiting for events up to lock; duration=934.82674ms
# Migrating `osu`.`phpbb_disallow`; Ghost table is `osu`.`_phpbb_disallow_gho`
# Migrating db-master:3306; inspecting db-master:3306; executing on db-master
# Migration started at Thu Feb 14 05:25:55 +0000 2019
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# throttle-control-replicas count: 1
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.osu.phpbb_disallow.sock
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 5m53s(total), 1s(copy); streamer: mysql-bin.005307:279157087; State: migrating; ETA: due
2019-02-14 05:31:49 INFO Setting RENAME timeout as 10 seconds
2019-02-14 05:31:49 INFO Session renaming tables is 15579206
2019-02-14 05:31:49 INFO Issuing and expecting this to block: rename /* gh-ost */ table `osu`.`phpbb_disallow` to `osu`.`_phpbb_disallow_del`, `osu`.`_phpbb_disallow_gho` to `osu`.`phpbb_disallow`
2019-02-14 05:31:49 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2019-02-14 05:31:49 INFO Checking session lock: gh-ost.15579113.lock
2019-02-14 05:31:49 INFO Connection holding lock on original table still exists
2019-02-14 05:31:49 INFO Will now proceed to drop magic table and unlock tables
2019-02-14 05:31:49 INFO Dropping magic cut-over table
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 5m55s(total), 1s(copy); streamer: mysql-bin.005307:282332887; State: migrating; ETA: due
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 6m0s(total), 1s(copy); streamer: mysql-bin.005307:292088677; State: migrating; ETA: due
2019-02-14 05:31:59 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 6m5s(total), 1s(copy); streamer: mysql-bin.005307:303311125; State: migrating; ETA: due
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 6m10s(total), 1s(copy); streamer: mysql-bin.005307:313333415; State: migrating; ETA: due
2019-02-14 05:32:09 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2019-02-14 05:32:09 INFO Releasing lock from `osu`.`phpbb_disallow`, `osu`.`_phpbb_disallow_del`
2019-02-14 05:32:09 INFO Tables unlocked
2019-02-14 05:32:09 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2019-02-14 05:32:09 INFO Looking for magic cut-over table
2019-02-14 05:32:09 INFO Dropping magic cut-over table
2019-02-14 05:32:09 INFO Dropping table `osu`.`_phpbb_disallow_del`
2019-02-14 05:32:09 INFO Table dropped
2019-02-14 05:32:10 INFO Grabbing voluntary lock: gh-ost.15579113.lock
2019-02-14 05:32:10 INFO Setting LOCK timeout as 20 seconds
2019-02-14 05:32:10 INFO Looking for magic cut-over table
2019-02-14 05:32:10 INFO Creating magic cut-over table `osu`.`_phpbb_disallow_del`
2019-02-14 05:32:10 INFO Magic cut-over table created
2019-02-14 05:32:10 INFO Locking `osu`.`phpbb_disallow`, `osu`.`_phpbb_disallow_del`
2019-02-14 05:32:10 INFO Tables locked
2019-02-14 05:32:10 INFO Session locking original & magic tables is 15579113
2019-02-14 05:32:10 INFO Writing changelog state: AllEventsUpToLockProcessed:1550122330350015577
2019-02-14 05:32:10 INFO Intercepted changelog state AllEventsUpToLockProcessed
2019-02-14 05:32:10 INFO Handled changelog state AllEventsUpToLockProcessed
2019-02-14 05:32:10 INFO Waiting for events up to lock
Copy: 103/103 100.0%; Applied: 0; Backlog: 1/1000; Time: 6m15s(total), 1s(copy); streamer: mysql-bin.005307:323107210; State: migrating; ETA: due
2019-02-14 05:32:11 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1550122330350015577
2019-02-14 05:32:11 INFO Done waiting for events up to lock; duration=929.661604ms
# Migrating `osu`.`phpbb_disallow`; Ghost table is `osu`.`_phpbb_disallow_gho`
# Migrating db-master:3306; inspecting db-master:3306; executing on db-master
# Migration started at Thu Feb 14 05:25:55 +0000 2019
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# throttle-control-replicas count: 1
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.osu.phpbb_disallow.sock
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 6m15s(total), 1s(copy); streamer: mysql-bin.005307:324603780; State: migrating; ETA: due
2019-02-14 05:32:11 INFO Setting RENAME timeout as 10 seconds
2019-02-14 05:32:11 INFO Session renaming tables is 15579261
2019-02-14 05:32:11 INFO Issuing and expecting this to block: rename /* gh-ost */ table `osu`.`phpbb_disallow` to `osu`.`_phpbb_disallow_del`, `osu`.`_phpbb_disallow_gho` to `osu`.`phpbb_disallow`
2019-02-14 05:32:11 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2019-02-14 05:32:11 INFO Checking session lock: gh-ost.15579113.lock
2019-02-14 05:32:11 INFO Connection holding lock on original table still exists
2019-02-14 05:32:11 INFO Will now proceed to drop magic table and unlock tables
2019-02-14 05:32:11 INFO Dropping magic cut-over table
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 6m20s(total), 1s(copy); streamer: mysql-bin.005307:334280054; State: migrating; ETA: due
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 6m25s(total), 1s(copy); streamer: mysql-bin.005307:343985711; State: migrating; ETA: due
2019-02-14 05:32:21 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
Copy: 103/103 100.0%; Applied: 0; Backlog: 0/1000; Time: 6m30s(total), 1s(copy); streamer: mysql-bin.005307:355004881; State: migrating; ETA: due

SHOW ENGINE INNODB STATUS reports nothing of interest.

SHOW PROCESSLIST has just the one relevant gh-ost line, a DROP waiting on a column statistics lock:

15579113,repl,Query,18,Waiting for column statistics lock,drop /* gh-ost */ table if exists `osu`.`_phpbb_disallow_del`

performance_schema.metadata_locks:

TABLESPACE		osu/_phpbb_disallow_del		139531072737760	INTENTION_EXCLUSIVE	TRANSACTION	GRANTED	lock.cc:793	15580335	1245
TABLESPACE		osu/phpbb_disallow		139531073279360	INTENTION_EXCLUSIVE	TRANSACTION	GRANTED	lock.cc:793	15580335	1245
COLUMN STATISTICS	osu	_phpbb_disallow_del	id	139531073479632	SHARED_READ	STATEMENT	GRANTED	sql_base.cc:544	15580335	1245
BACKUP LOCK				139531074200928	INTENTION_EXCLUSIVE	TRANSACTION	GRANTED	sql_backup_lock.cc:97	15580335	1247
COLUMN STATISTICS	osu	_phpbb_disallow_del	db_roll_ptr	139531074610192	EXCLUSIVE	TRANSACTION	GRANTED	histogram.cc:207	15580335	1247
COLUMN STATISTICS	osu	_phpbb_disallow_del	db_trx_id	139531085308512	EXCLUSIVE	TRANSACTION	GRANTED	histogram.cc:207	15580335	1247
COLUMN STATISTICS	osu	_phpbb_disallow_del	id	139531073337008	EXCLUSIVE	TRANSACTION	PENDING	histogram.cc:207	15580335	1247
GLOBAL				139531073686400	INTENTION_EXCLUSIVE	STATEMENT	GRANTED	sql_base.cc:5345	15580335	1245
SCHEMA	osu			139531072340656	INTENTION_EXCLUSIVE	TRANSACTION	GRANTED	sql_base.cc:5333	15580335	1245
TABLE	osu	_phpbb_disallow_del		139531085562032	SHARED_NO_READ_WRITE	TRANSACTION	GRANTED	sql_parse.cc:5617	15580335	1245
TABLE	osu	phpbb_disallow		139531072699024	SHARED_NO_READ_WRITE	TRANSACTION	GRANTED	sql_parse.cc:5617	15580335	1245
USER LEVEL LOCK		gh-ost.15580295.lock		139531072871808	EXCLUSIVE	EXPLICIT	GRANTED	item_func.cc:4516	15580335	562
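
The row to focus on in that output is the PENDING one; it can be isolated directly with something like this (a sketch, assuming MySQL 8.0 with performance_schema enabled):

-- Sketch: show only the metadata lock that is still waiting
SELECT object_type, object_schema, object_name, column_name,
       lock_type, lock_status, owner_thread_id
FROM performance_schema.metadata_locks
WHERE object_schema = 'osu'
  AND lock_status = 'PENDING';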

@peppy

peppy commented Feb 14, 2019

Additionally, manually running the rename works without issue (and runs instantly) outside of the cut-over process.

Also, using --cut-over=two-step works (as one would expect).

@shlomi-noach
Contributor

MySQL 8 introduced an atomic rename which is supported while holding table locks. This was developed specifically for gh-ost, and I should start using it...
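
A minimal sketch of that MySQL 8.0.13+ behavior, with hypothetical table names (not gh-ost's actual cut-over code path): RENAME TABLE is permitted under LOCK TABLES as long as the renamed source tables are WRITE-locked by the same session.

-- Sketch, MySQL >= 8.0.13: the session holding the WRITE locks can perform
-- the swap itself, atomically, instead of relying on a second blocking session
LOCK TABLES t1 WRITE, t1_gho WRITE;
RENAME TABLE t1 TO t1_del, t1_gho TO t1;
UNLOCK TABLES;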

@drogart

drogart commented Feb 15, 2019

I'm seeing the same issue as peppy with 1.0.48, even when running on a test instance of MySQL 8.0.11 where the only traffic is from gh-ost itself.

@shlomi-noach
Contributor

shlomi-noach commented Feb 17, 2019

If anyone feels like testing #715, please let me know. I do not have a test bed for this, yet.

Please be advised this is experimental; if possible, please first test-on-replica. If you plan on testing on master in production, please make sure you are prepared for risks (I'm hoping this runs smoothly, but since I haven't tested in prod I cannot in good faith claim this works well). Risks may include loss of transactions, lockdown of tables, and beyond.

@drogart

drogart commented Feb 20, 2019

I commented on the pull request.

tl;dr: it still fails, but I think that's because, for our case, the 8.0.13 rename behavior requires either all or none of the involved tables to be write-locked.

Thanks!

@shlomi-noach
Contributor

In an attempt to reproduce and fix the problem, I tested today with MySQL 8.0.18, and -- everything worked as expected?

I issued:

gh-ost \
  --user=gh-ost \
  --password=gh-ost \
  --host=localhost \
  --port=20821 \
  --assume-master-host=localhost:20819 \
  --database=test \
  --table=gh_ost_test \
  --alter='engine=innodb' \
  --exact-rowcount \
  --assume-rbr \
  --initially-drop-old-table \
  --initially-drop-ghost-table \
  --throttle-query='select timestampdiff(second, min(last_update), now()) < 5 from _gh_ost_test_ghc' \
  --serve-socket-file=/tmp/gh-ost.test.sock \
  --initially-drop-socket-file \
  --default-retries=3 \
  --chunk-size=10 \
  --verbose \
  --debug \
  --stack \
  --execute

I got:

2020-02-04 15:59:56 INFO starting gh-ost
2020-02-04 15:59:56 INFO Migrating `test`.`gh_ost_test`
2020-02-04 15:59:56 INFO connection validated on localhost:20821
2020-02-04 15:59:56 INFO User has SUPER, REPLICATION SLAVE privileges, and has ALL privileges on `test`.*
2020-02-04 15:59:56 INFO binary logs validated on localhost:20821
2020-02-04 15:59:56 INFO Inspector initiated on shlomi-oct:20821, version 8.0.18
2020-02-04 15:59:56 INFO Table found. Engine=InnoDB
2020-02-04 15:59:56 DEBUG Estimated number of rows via STATUS: 3
2020-02-04 15:59:56 DEBUG Validated no foreign keys exist on table
2020-02-04 15:59:56 DEBUG Validated no triggers exist on table
2020-02-04 15:59:56 INFO Estimated number of rows via EXPLAIN: 3
2020-02-04 15:59:56 DEBUG Potential unique keys in gh_ost_test: [PRIMARY (auto_increment): [id]; has nullable: false]
2020-02-04 15:59:56 INFO Master forced to be localhost:20819
2020-02-04 15:59:56 INFO log_slave_updates validated on localhost:20821
2020-02-04 15:59:56 INFO connection validated on localhost:20821
2020-02-04 15:59:56 DEBUG Streamer binlog coordinates: mysql-bin.000001:203020
2020-02-04 15:59:56 INFO Connecting binlog streamer at mysql-bin.000001:203020
[2020/02/04 15:59:56] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql localhost 20821 gh-ost    false false <nil> false UTC true 0 0s 0s 0 false}
[2020/02/04 15:59:56] [info] binlogsyncer.go:354 begin to sync binlog from position (mysql-bin.000001, 203020)
[2020/02/04 15:59:56] [info] binlogsyncer.go:203 register slave for master server localhost:20821
2020-02-04 15:59:56 DEBUG Beginning streaming
[2020/02/04 15:59:56] [info] binlogsyncer.go:723 rotate to (mysql-bin.000001, 203020)
2020-02-04 15:59:56 INFO rotate to next log from mysql-bin.000001:0 to mysql-bin.000001
2020-02-04 15:59:56 INFO connection validated on localhost:20819
2020-02-04 15:59:56 INFO connection validated on localhost:20819
2020-02-04 15:59:56 INFO will use time_zone='SYSTEM' on applier
2020-02-04 15:59:56 INFO Examining table structure on applier
2020-02-04 15:59:56 INFO Applier initiated on shlomi-oct:20819, version 8.0.18
2020-02-04 15:59:56 INFO Dropping table `test`.`_gh_ost_test_gho`
2020-02-04 15:59:56 INFO Table dropped
2020-02-04 15:59:56 INFO Dropping table `test`.`_gh_ost_test_del`
2020-02-04 15:59:56 INFO Table dropped
2020-02-04 15:59:56 INFO Dropping table `test`.`_gh_ost_test_ghc`
2020-02-04 15:59:56 INFO Table dropped
2020-02-04 15:59:56 INFO Creating changelog table `test`.`_gh_ost_test_ghc`
2020-02-04 15:59:56 INFO Changelog table created
2020-02-04 15:59:56 INFO Creating ghost table `test`.`_gh_ost_test_gho`
2020-02-04 15:59:56 INFO Ghost table created
2020-02-04 15:59:56 INFO Altering ghost table `test`.`_gh_ost_test_gho`
2020-02-04 15:59:56 DEBUG ALTER statement: alter /* gh-ost */ table `test`.`_gh_ost_test_gho` drop column shushu, engine=innodb
2020-02-04 15:59:56 INFO Ghost table altered
2020-02-04 15:59:56 INFO Waiting for ghost table to be migrated. Current lag is 0s
2020-02-04 15:59:56 INFO Intercepted changelog state GhostTableMigrated
2020-02-04 15:59:56 INFO Handled changelog state GhostTableMigrated
2020-02-04 15:59:56 DEBUG ghost table migrated
2020-02-04 15:59:56 DEBUG Potential unique keys in _gh_ost_test_gho: [PRIMARY (auto_increment): [id]; has nullable: false]
2020-02-04 15:59:56 INFO Chosen shared unique key is PRIMARY
2020-02-04 15:59:56 INFO Shared columns are id,i,color
2020-02-04 15:59:56 INFO Listening on unix socket file: /tmp/gh-ost.test.sock
2020-02-04 15:59:56 INFO As instructed, counting rows in the background; meanwhile I will use an estimated count, and will update it later on
2020-02-04 15:59:56 DEBUG Reading migration range according to key: PRIMARY
2020-02-04 15:59:56 INFO As instructed, I'm issuing a SELECT COUNT(*) on the table. This may take a while
2020-02-04 15:59:56 INFO Migration min values: [1]
2020-02-04 15:59:56 DEBUG Reading migration range according to key: PRIMARY
2020-02-04 15:59:56 INFO Migration max values: [3]
2020-02-04 15:59:56 INFO Waiting for first throttle metrics to be collected
2020-02-04 15:59:56 INFO First throttle metrics collected
2020-02-04 15:59:56 DEBUG Operating until row copy is complete
# Migrating `test`.`gh_ost_test`; Ghost table is `test`.`_gh_ost_test_gho`
2020-02-04 15:59:56 DEBUG Getting nothing in the write queue. Sleeping...
# Migrating shlomi-oct:20819; inspecting shlomi-oct:20821; executing on shlomi-oct
# Migration started at Tue Feb 04 15:59:56 +0200 2020
# chunk-size: 100; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# throttle-query: select timestampdiff(second, min(last_update), now()) < 5 from _gh_ost_test_ghc
# Serving on unix socket: /tmp/gh-ost.test.sock
Copy: 0/3 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin.000001:206450; Lag: 0.02s, State: throttled, throttle-query; ETA: N/A
Copy: 0/3 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin.000001:211052; Lag: 0.01s, State: throttled, throttle-query; ETA: N/A
Copy: 0/3 0.0%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 2s(copy); streamer: mysql-bin.000001:215654; Lag: 0.02s, State: throttled, throttle-query; ETA: N/A
Copy: 0/3 0.0%; Applied: 0; Backlog: 0/1000; Time: 3s(total), 3s(copy); streamer: mysql-bin.000001:220252; Lag: 0.02s, State: throttled, throttle-query; ETA: N/A
Copy: 0/3 0.0%; Applied: 0; Backlog: 0/1000; Time: 4s(total), 4s(copy); streamer: mysql-bin.000001:224852; Lag: 0.01s, State: throttled, throttle-query; ETA: N/A
Copy: 0/3 0.0%; Applied: 0; Backlog: 0/1000; Time: 5s(total), 5s(copy); streamer: mysql-bin.000001:229452; Lag: 0.02s, State: throttled, throttle-query; ETA: N/A
2020-02-04 16:00:01 DEBUG Issued INSERT on range: [1]..[3]; iteration: 0; chunk-size: 100
2020-02-04 16:00:01 DEBUG Iteration complete: no further range to iterate
2020-02-04 16:00:01 DEBUG Getting nothing in the write queue. Sleeping...
2020-02-04 16:00:01 INFO Row copy complete
Copy: 3/3 100.0%; Applied: 0; Backlog: 0/1000; Time: 5s(total), 5s(copy); streamer: mysql-bin.000001:231913; Lag: 0.01s, State: migrating; ETA: due
2020-02-04 16:00:01 DEBUG checking for cut-over postpone
2020-02-04 16:00:01 DEBUG checking for cut-over postpone: complete
2020-02-04 16:00:01 INFO Grabbing voluntary lock: gh-ost.51.lock
2020-02-04 16:00:01 INFO Setting LOCK timeout as 6 seconds
2020-02-04 16:00:01 INFO Looking for magic cut-over table
2020-02-04 16:00:01 INFO Creating magic cut-over table `test`.`_gh_ost_test_del`
2020-02-04 16:00:01 INFO Magic cut-over table created
2020-02-04 16:00:01 INFO Locking `test`.`gh_ost_test`, `test`.`_gh_ost_test_del`
2020-02-04 16:00:01 INFO Tables locked
2020-02-04 16:00:01 INFO Session locking original & magic tables is 51
2020-02-04 16:00:01 INFO Writing changelog state: AllEventsUpToLockProcessed:1580824801554808000
2020-02-04 16:00:01 INFO Waiting for events up to lock
2020-02-04 16:00:01 INFO Intercepted changelog state AllEventsUpToLockProcessed
2020-02-04 16:00:01 INFO Handled changelog state AllEventsUpToLockProcessed
Copy: 3/3 100.0%; Applied: 0; Backlog: 1/1000; Time: 6s(total), 5s(copy); streamer: mysql-bin.000001:236706; Lag: 0.02s, State: migrating; ETA: due
2020-02-04 16:00:02 DEBUG Getting nothing in the write queue. Sleeping...
2020-02-04 16:00:02 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1580824801554808000
2020-02-04 16:00:02 INFO Done waiting for events up to lock; duration=989.170997ms
# Migrating `test`.`gh_ost_test`; Ghost table is `test`.`_gh_ost_test_gho`
# Migrating shlomi-oct:20819; inspecting shlomi-oct:20821; executing on shlomi-oct
# Migration started at Tue Feb 04 15:59:56 +0200 2020
# chunk-size: 100; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# throttle-query: select timestampdiff(second, min(last_update), now()) < 5 from _gh_ost_test_ghc
# Serving on unix socket: /tmp/gh-ost.test.sock
Copy: 3/3 100.0%; Applied: 0; Backlog: 0/1000; Time: 6s(total), 5s(copy); streamer: mysql-bin.000001:238424; Lag: 0.02s, State: migrating; ETA: due
2020-02-04 16:00:02 INFO Setting RENAME timeout as 3 seconds
2020-02-04 16:00:02 INFO Session renaming tables is 53
2020-02-04 16:00:02 INFO Issuing and expecting this to block: rename /* gh-ost */ table `test`.`gh_ost_test` to `test`.`_gh_ost_test_del`, `test`.`_gh_ost_test_gho` to `test`.`gh_ost_test`
2020-02-04 16:00:02 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2020-02-04 16:00:02 INFO Checking session lock: gh-ost.51.lock
2020-02-04 16:00:02 INFO Connection holding lock on original table still exists
2020-02-04 16:00:02 INFO Will now proceed to drop magic table and unlock tables
2020-02-04 16:00:02 INFO Dropping magic cut-over table
2020-02-04 16:00:02 INFO Releasing lock from `test`.`gh_ost_test`, `test`.`_gh_ost_test_del`
2020-02-04 16:00:02 INFO Tables unlocked
2020-02-04 16:00:02 INFO Tables renamed
2020-02-04 16:00:02 INFO Lock & rename duration: 1.006560267s. During this time, queries on `gh_ost_test` were blocked
2020-02-04 16:00:02 INFO Looking for magic cut-over table
[2020/02/04 16:00:02] [info] binlogsyncer.go:164 syncer is closing...
2020-02-04 16:00:02 DEBUG done streaming events
2020-02-04 16:00:02 DEBUG Done streaming
[2020/02/04 16:00:02] [error] binlogstreamer.go:77 close sync with err: sync is been closing...
2020-02-04 16:00:02 INFO Closed streamer connection. err=<nil>
[2020/02/04 16:00:02] [info] binlogsyncer.go:179 syncer is closed
2020-02-04 16:00:02 INFO Dropping table `test`.`_gh_ost_test_ghc`
2020-02-04 16:00:02 INFO Table dropped
2020-02-04 16:00:02 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2020-02-04 16:00:02 INFO -- drop table `test`.`_gh_ost_test_del`
2020-02-04 16:00:02 INFO Done migrating `test`.`gh_ost_test`
2020-02-04 16:00:02 INFO Removing socket file: /tmp/gh-ost.test.sock
2020-02-04 16:00:02 INFO Tearing down inspector
2020-02-04 16:00:02 INFO Tearing down applier
2020-02-04 16:00:02 DEBUG Tearing down...
2020-02-04 16:00:02 INFO Tearing down streamer
2020-02-04 16:00:02 INFO Tearing down throttler
2020-02-04 16:00:02 DEBUG Tearing down...
# Done

@shlomi-noach
Contributor

shlomi-noach commented Feb 4, 2020

EDIT: this comment does not reflect the actual sequence of steps taken in gh-ost, so it's irrelevant. Keeping for posterity.


I'm confused as to how the above test worked. If I run the following manually on an 8.0.18 master (assuming I have two tables, t1 and t2), I get:

master [localhost:20819] {msandbox} (d1) > lock tables t1 write;
Query OK, 0 rows affected (0.00 sec)

master [localhost:20819] {msandbox} (d1) > rename table t1 to t0, t2 to t1;
ERROR 1100 (HY000): Table 't2' was not locked with LOCK TABLES

@shlomi-noach
Contributor

In my comment above (#687 (comment)) I did not reproduce the actual steps gh-ost takes, so it is irrelevant.

@peppy

peppy commented Feb 4, 2020

Thanks for looking into this again. I will test against the latest MySQL and report back in the coming days.

@shlomi-noach
Contributor

Also tried on 8.0.16 and again the master branch just worked.

@peppy

peppy commented Feb 5, 2020

I can still reproduce on master (ea339b602380d60921f6e686f29de69d6163c8f0)

root@db:~/gh-ost# mysql --version
mysql  Ver 8.0.16 for Linux on x86_64 (MySQL Community Server - GPL)

Trying with my previous command:

root@db:~/gh-ost# ~/gh-ost/bin/gh-ost --max-load=Threads_running=25 --critical-load=Threads_running=1000 --chunk-size=1000 --throttle-control-replicas="slave_ip" --max-lag-millis=1500 --user="repl" --password="xxxx" --host=master_ip --verbose --exact-rowcount --default-retries=120 --allow-on-master --panic-flag-file=/tmp/ghost.panic.flag --database="osu" --table="osu_errors" --cut-over=two-step --alter="engine=innodb" --initially-drop-old-table --execute
2020-02-05 05:54:52 INFO starting gh-ost ea339b602380d60921f6e686f29de69d6163c8f0
2020-02-05 05:54:52 INFO Migrating `osu`.`osu_errors`
2020-02-05 05:54:52 INFO connection validated on master_ip:3306
2020-02-05 05:54:52 INFO User has SUPER, REPLICATION SLAVE privileges, and has ALL privileges on `osu`.*
2020-02-05 05:54:52 INFO binary logs validated on master_ip:3306
2020-02-05 05:54:52 INFO Restarting replication on master_ip:3306 to make sure binlog settings apply to replication thread
2020-02-05 05:54:52 INFO Inspector initiated on db:3306, version 8.0.16
2020-02-05 05:54:52 INFO Table found. Engine=InnoDB
2020-02-05 05:54:52 INFO Estimated number of rows via EXPLAIN: 44
2020-02-05 05:54:52 INFO Recursively searching for replication master
2020-02-05 05:54:52 INFO Master found to be db:3306
2020-02-05 05:54:52 INFO log_slave_updates validated on master_ip:3306
2020-02-05 05:54:52 INFO connection validated on master_ip:3306
2020-02-05 05:54:52 INFO Connecting binlog streamer at mysql-bin.019585:590087919
[2020/02/05 05:54:52] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql master_ip 3306 repl    false false <nil> false UTC true 0 0s 0s 0 false}
[2020/02/05 05:54:52] [info] binlogsyncer.go:354 begin to sync binlog from position (mysql-bin.019585, 590087919)
[2020/02/05 05:54:52] [info] binlogsyncer.go:203 register slave for master server master_ip:3306
2020-02-05 05:54:52 INFO rotate to next log from mysql-bin.019585:0 to mysql-bin.019585
[2020/02/05 05:54:52] [info] binlogsyncer.go:723 rotate to (mysql-bin.019585, 590087919)
2020-02-05 05:54:52 INFO connection validated on master_ip:3306
2020-02-05 05:54:52 INFO connection validated on master_ip:3306
2020-02-05 05:54:52 INFO will use time_zone='SYSTEM' on applier
2020-02-05 05:54:52 INFO Examining table structure on applier
2020-02-05 05:54:52 INFO Applier initiated on db:3306, version 8.0.16
2020-02-05 05:54:52 INFO Dropping table `osu`.`_osu_errors_del`
2020-02-05 05:54:52 INFO Table dropped
2020-02-05 05:54:52 INFO Dropping table `osu`.`_osu_errors_ghc`
2020-02-05 05:54:52 INFO Table dropped
2020-02-05 05:54:52 INFO Creating changelog table `osu`.`_osu_errors_ghc`
2020-02-05 05:54:52 INFO Changelog table created
2020-02-05 05:54:52 INFO Creating ghost table `osu`.`_osu_errors_gho`
2020-02-05 05:54:52 INFO Ghost table created
2020-02-05 05:54:52 INFO Altering ghost table `osu`.`_osu_errors_gho`
2020-02-05 05:54:53 INFO Ghost table altered
2020-02-05 05:54:53 INFO Intercepted changelog state GhostTableMigrated
2020-02-05 05:54:53 INFO Waiting for ghost table to be migrated. Current lag is 0s
2020-02-05 05:54:53 INFO Handled changelog state GhostTableMigrated
2020-02-05 05:54:53 INFO Chosen shared unique key is PRIMARY
2020-02-05 05:54:53 INFO Shared columns are error_id,username,user_id,version,osu_mode,game_mode,game_time,audio_time,culture,beatmap_id,beatmap_checksum,exception,feedback,stacktrace,iltrace,config,date
2020-02-05 05:54:53 INFO Listening on unix socket file: /tmp/gh-ost.osu.osu_errors.sock
2020-02-05 05:54:53 INFO As instructed, counting rows in the background; meanwhile I will use an estimated count, and will update it later on
2020-02-05 05:54:53 INFO As instructed, I'm issuing a SELECT COUNT(*) on the table. This may take a while
2020-02-05 05:54:53 INFO Migration min values: [2267]
2020-02-05 05:54:53 INFO Migration max values: [2891180]
2020-02-05 05:54:53 INFO Waiting for first throttle metrics to be collected
2020-02-05 05:54:53 INFO First throttle metrics collected
# Migrating `osu`.`osu_errors`; Ghost table is `osu`.`_osu_errors_gho`
# Migrating db:3306; inspecting db:3306; executing on db
# Migration started at Wed Feb 05 05:54:52 +0000 2020
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# throttle-control-replicas count: 1
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.osu.osu_errors.sock
Copy: 0/44 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin.019585:590298658; Lag: 0.01s, State: migrating; ETA: N/A
Copy: 0/44 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin.019585:590567805; Lag: 0.01s, State: migrating; ETA: N/A
2020-02-05 05:54:54 INFO Row copy complete
Copy: 84/84 100.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin.019585:590711632; Lag: 0.01s, State: migrating; ETA: due
2020-02-05 05:54:54 INFO Locking `osu`.`osu_errors`
2020-02-05 05:54:54 INFO Table locked
2020-02-05 05:54:54 INFO Writing changelog state: AllEventsUpToLockProcessed:1580882094240375857
2020-02-05 05:54:54 INFO Intercepted changelog state AllEventsUpToLockProcessed
2020-02-05 05:54:54 INFO Handled changelog state AllEventsUpToLockProcessed
2020-02-05 05:54:54 INFO Waiting for events up to lock
Copy: 84/84 100.0%; Applied: 0; Backlog: 1/1000; Time: 2s(total), 1s(copy); streamer: mysql-bin.019585:591293960; Lag: 0.01s, State: migrating; ETA: due
2020-02-05 05:54:55 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1580882094240375857
2020-02-05 05:54:55 INFO Done waiting for events up to lock; duration=998.657843ms
# Migrating `osu`.`osu_errors`; Ghost table is `osu`.`_osu_errors_gho`
# Migrating db:3306; inspecting db:3306; executing on db
# Migration started at Wed Feb 05 05:54:52 +0000 2020
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# throttle-control-replicas count: 1
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.osu.osu_errors.sock
Copy: 84/84 100.0%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 1s(copy); streamer: mysql-bin.019585:591313049; Lag: 0.01s, State: migrating; ETA: due
2020-02-05 05:54:55 INFO Renaming original table
2020-02-05 05:54:55 INFO Renaming ghost table
2020-02-05 05:54:55 INFO Tables renamed
2020-02-05 05:54:55 INFO Unlocking tables
2020-02-05 05:54:55 INFO Tables unlocked
[2020/02/05 05:54:55] [info] binlogsyncer.go:164 syncer is closing...
2020-02-05 05:54:55 INFO Closed streamer connection. err=<nil>
2020-02-05 05:54:55 INFO Dropping table `osu`.`_osu_errors_ghc`
[2020/02/05 05:54:55] [error] binlogstreamer.go:77 close sync with err: sync is been closing...
[2020/02/05 05:54:55] [info] binlogsyncer.go:179 syncer is closed
2020-02-05 05:54:55 INFO Table dropped
2020-02-05 05:54:55 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2020-02-05 05:54:55 INFO -- drop table `osu`.`_osu_errors_del`
2020-02-05 05:54:55 INFO Done migrating `osu`.`osu_errors`
2020-02-05 05:54:55 INFO Removing socket file: /tmp/gh-ost.osu.osu_errors.sock
2020-02-05 05:54:55 INFO Tearing down inspector
2020-02-05 05:54:55 INFO Tearing down applier
2020-02-05 05:54:55 INFO Tearing down streamer
2020-02-05 05:54:55 INFO Tearing down throttler
# Done

Trying with your command above (modified to work with my setup; the most important change to note is the addition of --allow-on-master):

root@db:~/gh-ost# ~/gh-ost/bin/gh-ost   --throttle-control-replicas="slave_ip" --max-lag-millis=1500 --user="repl" --password="xxxx" --host=master_ip   --database=osu   --allow-on-master   --table=osu_errors   --alter='engine=innodb'   --exact-rowcount   --assume-rbr   --initially-drop-old-table   --initially-drop-ghost-table   --serve-socket-file=/tmp/gh-ost.test.sock   --initially-drop-socket-file   --default-retries=3   --chunk-size=10   --verbose   --debug   --stack   --execute
2020-02-05 05:55:26 INFO starting gh-ost ea339b602380d60921f6e686f29de69d6163c8f0
2020-02-05 05:55:26 INFO Migrating `osu`.`osu_errors`
2020-02-05 05:55:26 INFO connection validated on master_ip:3306
2020-02-05 05:55:26 INFO User has SUPER, REPLICATION SLAVE privileges, and has ALL privileges on `osu`.*
2020-02-05 05:55:26 INFO binary logs validated on master_ip:3306
2020-02-05 05:55:26 INFO Inspector initiated on db:3306, version 8.0.16
2020-02-05 05:55:26 INFO Table found. Engine=InnoDB
2020-02-05 05:55:26 DEBUG Estimated number of rows via STATUS: 35
2020-02-05 05:55:26 DEBUG Validated no foreign keys exist on table
2020-02-05 05:55:26 DEBUG Validated no triggers exist on table
2020-02-05 05:55:26 INFO Estimated number of rows via EXPLAIN: 44
2020-02-05 05:55:26 DEBUG Potential unique keys in osu_errors: [PRIMARY (auto_increment): [error_id]; has nullable: false]
2020-02-05 05:55:26 INFO Recursively searching for replication master
2020-02-05 05:55:26 DEBUG Looking for master on master_ip:3306
2020-02-05 05:55:26 INFO Master found to be db:3306
2020-02-05 05:55:26 INFO log_slave_updates validated on master_ip:3306
2020-02-05 05:55:26 INFO connection validated on master_ip:3306
2020-02-05 05:55:26 DEBUG Streamer binlog coordinates: mysql-bin.019585:600428610
2020-02-05 05:55:26 INFO Connecting binlog streamer at mysql-bin.019585:600428610
[2020/02/05 05:55:26] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql master_ip 3306 repl    false false <nil> false UTC true 0 0s 0s 0 false}
[2020/02/05 05:55:26] [info] binlogsyncer.go:354 begin to sync binlog from position (mysql-bin.019585, 600428610)
[2020/02/05 05:55:26] [info] binlogsyncer.go:203 register slave for master server master_ip:3306
2020-02-05 05:55:26 DEBUG Beginning streaming
[2020/02/05 05:55:26] [info] binlogsyncer.go:723 rotate to (mysql-bin.019585, 600428610)
2020-02-05 05:55:26 INFO rotate to next log from mysql-bin.019585:0 to mysql-bin.019585
2020-02-05 05:55:26 INFO connection validated on master_ip:3306
2020-02-05 05:55:26 INFO connection validated on master_ip:3306
2020-02-05 05:55:26 INFO will use time_zone='SYSTEM' on applier
2020-02-05 05:55:26 INFO Examining table structure on applier
2020-02-05 05:55:26 INFO Applier initiated on db:3306, version 8.0.16
2020-02-05 05:55:26 INFO Dropping table `osu`.`_osu_errors_gho`
2020-02-05 05:55:26 INFO Table dropped
2020-02-05 05:55:26 INFO Dropping table `osu`.`_osu_errors_del`
2020-02-05 05:55:26 INFO Table dropped
2020-02-05 05:55:26 INFO Dropping table `osu`.`_osu_errors_ghc`
2020-02-05 05:55:26 INFO Table dropped
2020-02-05 05:55:26 INFO Creating changelog table `osu`.`_osu_errors_ghc`
2020-02-05 05:55:26 INFO Changelog table created
2020-02-05 05:55:26 INFO Creating ghost table `osu`.`_osu_errors_gho`
2020-02-05 05:55:26 INFO Ghost table created
2020-02-05 05:55:26 INFO Altering ghost table `osu`.`_osu_errors_gho`
2020-02-05 05:55:26 DEBUG ALTER statement: alter /* gh-ost */ table `osu`.`_osu_errors_gho` engine=innodb
2020-02-05 05:55:27 INFO Ghost table altered
2020-02-05 05:55:27 INFO Intercepted changelog state GhostTableMigrated
2020-02-05 05:55:27 INFO Waiting for ghost table to be migrated. Current lag is 0s
2020-02-05 05:55:27 DEBUG ghost table migrated
2020-02-05 05:55:27 INFO Handled changelog state GhostTableMigrated
2020-02-05 05:55:27 DEBUG Potential unique keys in _osu_errors_gho: [PRIMARY (auto_increment): [error_id]; has nullable: false]
2020-02-05 05:55:27 INFO Chosen shared unique key is PRIMARY
2020-02-05 05:55:27 INFO Shared columns are error_id,username,user_id,version,osu_mode,game_mode,game_time,audio_time,culture,beatmap_id,beatmap_checksum,exception,feedback,stacktrace,iltrace,config,date
2020-02-05 05:55:27 INFO Listening on unix socket file: /tmp/gh-ost.test.sock
2020-02-05 05:55:27 INFO As instructed, counting rows in the background; meanwhile I will use an estimated count, and will update it later on
2020-02-05 05:55:27 DEBUG Reading migration range according to key: PRIMARY
2020-02-05 05:55:27 INFO As instructed, I'm issuing a SELECT COUNT(*) on the table. This may take a while
2020-02-05 05:55:27 INFO Migration min values: [2267]
2020-02-05 05:55:27 DEBUG Reading migration range according to key: PRIMARY
2020-02-05 05:55:27 INFO Migration max values: [2891180]
2020-02-05 05:55:27 INFO Waiting for first throttle metrics to be collected
2020-02-05 05:55:27 INFO First throttle metrics collected
2020-02-05 05:55:27 DEBUG Operating until row copy is complete
2020-02-05 05:55:27 DEBUG Getting nothing in the write queue. Sleeping...
# Migrating `osu`.`osu_errors`; Ghost table is `osu`.`_osu_errors_gho`
# Migrating db:3306; inspecting db:3306; executing on db
# Migration started at Wed Feb 05 05:55:26 +0000 2020
# chunk-size: 100; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# throttle-control-replicas count: 1
# Serving on unix socket: /tmp/gh-ost.test.sock
Copy: 0/44 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin.019585:600638981; Lag: 0.01s, State: migrating; ETA: N/A
Copy: 0/44 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin.019585:600946019; Lag: 0.01s, State: migrating; ETA: N/A
2020-02-05 05:55:28 DEBUG Issued INSERT on range: [2267]..[2891180]; iteration: 0; chunk-size: 100
2020-02-05 05:55:28 DEBUG Iteration complete: no further range to iterate
2020-02-05 05:55:28 DEBUG Getting nothing in the write queue. Sleeping...
2020-02-05 05:55:28 INFO Row copy complete
Copy: 84/84 100.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin.019585:601265342; Lag: 0.01s, State: migrating; ETA: due
2020-02-05 05:55:28 DEBUG checking for cut-over postpone
2020-02-05 05:55:28 DEBUG checking for cut-over postpone: complete
2020-02-05 05:55:28 INFO Grabbing voluntary lock: gh-ost.49366677.lock
2020-02-05 05:55:28 INFO Setting LOCK timeout as 6 seconds
2020-02-05 05:55:28 INFO Looking for magic cut-over table
2020-02-05 05:55:28 INFO Creating magic cut-over table `osu`.`_osu_errors_del`
2020-02-05 05:55:28 INFO Magic cut-over table created
2020-02-05 05:55:28 INFO Locking `osu`.`osu_errors`, `osu`.`_osu_errors_del`
2020-02-05 05:55:28 INFO Tables locked
2020-02-05 05:55:28 INFO Session locking original & magic tables is 49366677
2020-02-05 05:55:28 INFO Writing changelog state: AllEventsUpToLockProcessed:1580882128472911106
2020-02-05 05:55:28 INFO Intercepted changelog state AllEventsUpToLockProcessed
2020-02-05 05:55:28 INFO Handled changelog state AllEventsUpToLockProcessed
2020-02-05 05:55:28 INFO Waiting for events up to lock
Copy: 84/84 100.0%; Applied: 0; Backlog: 1/1000; Time: 2s(total), 1s(copy); streamer: mysql-bin.019585:601740149; Lag: 0.01s, State: migrating; ETA: due
2020-02-05 05:55:29 DEBUG Getting nothing in the write queue. Sleeping...
2020-02-05 05:55:29 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1580882128472911106
2020-02-05 05:55:29 INFO Done waiting for events up to lock; duration=974.286376ms
# Migrating `osu`.`osu_errors`; Ghost table is `osu`.`_osu_errors_gho`
# Migrating db:3306; inspecting db:3306; executing on db
# Migration started at Wed Feb 05 05:55:26 +0000 2020
# chunk-size: 100; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# throttle-control-replicas count: 1
# Serving on unix socket: /tmp/gh-ost.test.sock
Copy: 84/84 100.0%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 1s(copy); streamer: mysql-bin.019585:601746558; Lag: 0.01s, State: migrating; ETA: due
2020-02-05 05:55:29 INFO Setting RENAME timeout as 3 seconds
2020-02-05 05:55:29 INFO Session renaming tables is 49366683
2020-02-05 05:55:29 INFO Issuing and expecting this to block: rename /* gh-ost */ table `osu`.`osu_errors` to `osu`.`_osu_errors_del`, `osu`.`_osu_errors_gho` to `osu`.`osu_errors`
2020-02-05 05:55:29 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2020-02-05 05:55:29 INFO Checking session lock: gh-ost.49366677.lock
2020-02-05 05:55:29 INFO Connection holding lock on original table still exists
2020-02-05 05:55:29 INFO Will now proceed to drop magic table and unlock tables
2020-02-05 05:55:29 INFO Dropping magic cut-over table
2020-02-05 05:55:29 INFO Releasing lock from `osu`.`osu_errors`, `osu`.`_osu_errors_del`
2020-02-05 05:55:29 INFO Tables unlocked
2020-02-05 05:55:29 INFO Tables renamed
2020-02-05 05:55:29 INFO Lock & rename duration: 1.145875074s. During this time, queries on `osu_errors` were blocked
2020-02-05 05:55:29 INFO Looking for magic cut-over table
[2020/02/05 05:55:29] [info] binlogsyncer.go:164 syncer is closing...
2020-02-05 05:55:29 INFO Closed streamer connection. err=<nil>
2020-02-05 05:55:29 INFO Dropping table `osu`.`_osu_errors_ghc`
2020-02-05 05:55:29 DEBUG Done streaming
[2020/02/05 05:55:29] [error] binlogstreamer.go:77 close sync with err: sync is been closing...
[2020/02/05 05:55:29] [info] binlogsyncer.go:179 syncer is closed
2020-02-05 05:55:30 DEBUG Getting nothing in the write queue. Sleeping...
2020-02-05 05:55:31 DEBUG Getting nothing in the write queue. Sleeping...
2020-02-05 05:55:32 DEBUG Getting nothing in the write queue. Sleeping...
2020-02-05 05:55:32 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
goroutine 83 [running]:
runtime/debug.Stack(0x5c, 0x100, 0xc000106230)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x22
github.com/github/gh-ost/vendor/github.com/outbrain/golib/log.logErrorEntry(0x2, 0x8a5f80, 0xc00040dcc0, 0xc0002b23c0, 0xba)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/vendor/github.com/outbrain/golib/log/log.go:178 +0xe7
github.com/github/gh-ost/vendor/github.com/outbrain/golib/log.Errore(...)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/vendor/github.com/outbrain/golib/log/log.go:224
github.com/github/gh-ost/vendor/github.com/outbrain/golib/sqlutils.ExecNoPrepare(0xc00017a000, 0xc0002b23c0, 0xba, 0xc0000fbea0, 0x3, 0x3, 0x0, 0x0, 0x0, 0x0)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/vendor/github.com/outbrain/golib/sqlutils/sqlutils.go:283 +0x185
github.com/github/gh-ost/go/logic.(*Applier).WriteChangelog(0xc0001fc2a0, 0x8109d6, 0x9, 0xc0001fb660, 0x1e, 0xc0001fb660, 0x1e, 0xc0000fbf08, 0x67b203)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/go/logic/applier.go:292 +0x2d0
github.com/github/gh-ost/go/logic.(*Applier).InitiateHeartbeat.func1(0xc000144000, 0xc0000fbf00)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/go/logic/applier.go:313 +0x9d
github.com/github/gh-ost/go/logic.(*Applier).InitiateHeartbeat(0xc0001fc2a0)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/go/logic/applier.go:336 +0x122
created by github.com/github/gh-ost/go/logic.(*Migrator).initiateApplier
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/go/logic/migrator.go:1074 +0x21e
2020-02-05 05:55:32 ERROR Error 1146: Table 'osu._osu_errors_ghc' doesn't exist
goroutine 165 [running]:
runtime/debug.Stack(0x4f, 0x200, 0xc000126300)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x22
github.com/github/gh-ost/vendor/github.com/outbrain/golib/log.logErrorEntry(0x2, 0x8a5f80, 0xc00040dd20, 0xc00016a240, 0xba)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/vendor/github.com/outbrain/golib/log/log.go:178 +0xe7
github.com/github/gh-ost/vendor/github.com/outbrain/golib/log.Errore(...)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/vendor/github.com/outbrain/golib/log/log.go:224
github.com/github/gh-ost/vendor/github.com/outbrain/golib/sqlutils.ExecNoPrepare(0xc00017a000, 0xc00016a240, 0xba, 0xc000504cd8, 0x3, 0x3, 0x0, 0x0, 0x0, 0x0)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/vendor/github.com/outbrain/golib/sqlutils/sqlutils.go:283 +0x185
github.com/github/gh-ost/go/logic.(*Applier).WriteChangelog(0xc0001fc2a0, 0xc0004e2de0, 0x1e, 0xc00036c140, 0x98, 0xc0004e2de0, 0x1e, 0x1, 0xc000164680)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/go/logic/applier.go:292 +0x2d0
github.com/github/gh-ost/go/logic.(*Migrator).printStatus(0xc000108120, 0x1, 0xc0000ad950, 0x1, 0x1)
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/go/logic/migrator.go:973 +0x8db
created by github.com/github/gh-ost/go/logic.(*Migrator).initiateStatus
        /root/gh-ost/.gopath/src/github.com/github/gh-ost/go/logic/migrator.go:785 +0xab
Copy: 84/84 100.0%; Applied: 0; Backlog: 0/1000; Time: 4s(total), 1s(copy); streamer: mysql-bin.019585:601777417; Lag: 0.01s, State: migrating; ETA: due
2020-02-05 05:55:32 INFO Table dropped
2020-02-05 05:55:32 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2020-02-05 05:55:32 INFO -- drop table `osu`.`_osu_errors_del`
2020-02-05 05:55:32 INFO Done migrating `osu`.`osu_errors`
2020-02-05 05:55:32 INFO Removing socket file: /tmp/gh-ost.test.sock
2020-02-05 05:55:32 INFO Tearing down inspector
2020-02-05 05:55:32 INFO Tearing down applier
2020-02-05 05:55:32 DEBUG Tearing down...
2020-02-05 05:55:32 INFO Tearing down streamer
2020-02-05 05:55:32 INFO Tearing down throttler
2020-02-05 05:55:32 DEBUG Tearing down...
# Done

Please let me know if I can provide any further details to help further the investigation.

@shlomi-noach
Contributor

@peppy it seems like your migration worked after one retry?

But as I look into this issue, there are actually multiple cases reported by multiple people, and this got me confused. I think I'm working on something other than the issue you presented 😬

@shlomi-noach
Contributor

@peppy so what I'm trying to see is whether gh-ost works with MySQL 8.0, as per #715 (where you also participate). I'll continue the discussion there.

@peppy

peppy commented Feb 5, 2020

Aha, right.

And yes, the first one did work (with a weird error in the middle, though), as mentioned earlier in this thread, because I used cut-over=two-step.

@peppy

peppy commented Feb 5, 2020

Actually, on closer inspection it looks like both may have completed, if one is to ignore the error messages and stack traces.

@shlomi-noach
Contributor

if one is to ignore the error messages and stack traces.

Yes, I did mean the second one worked, sorry for not being clear. It seems like the retries were successful. Now, this doesn't mean it will always work; perhaps some workload will cause an infinite number of retries -- not sure.
In any case, what I was working on is irrelevant to this issue, so my past few comments can just be ignored 😛

@jianhaiqing

jianhaiqing commented Feb 23, 2021

As far as I know, 8.0.15 doesn't work, and versions after 8.0.15 do.
The simulation steps are as follows (see the observation sketch after the steps):

  • init environment:
create database sbtest;
create /* gh-ost */ table `sbtest`.`_sbtest1_del` (
        id int auto_increment primary key
) engine=InnoDB comment='ghost-cut-over-sentry';

create /* gh-ost */ table `sbtest`.`_sbtest1_ghc` (
        id int auto_increment primary key
) engine=InnoDB comment='_sbtest1_ghc';

create /* gh-ost */ table `sbtest`.`_sbtest1_gho` (
        id int auto_increment primary key
) engine=InnoDB comment='_sbtest1_ghc';

create /* gh-ost */ table `sbtest`.`sbtest1` (
        id int auto_increment primary key
) engine=InnoDB comment='_sbtest1_ghc';
  • session 1:
-- step1
select get_lock('gh-ost.11.lock', 0);
lock /* gh-ost */ tables `sbtest`.`sbtest1` write, `sbtest`.`_sbtest1_del` write;
-- step 3
drop /* gh-ost */ table if exists `sbtest`.`_sbtest1_del`;  
-- blocked in step 3
  • session 2:
-- step 2
rename /* gh-ost */ table `sbtest`.`sbtest1` to `sbtest`.`_sbtest1_del`, `sbtest`.`_sbtest1_gho` to `sbtest`.`sbtest1`;
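
While session 1 is blocked in step 3, the wait chain can be observed from a third session (a sketch, assuming the sys schema is installed, as it is by default on 8.0):

-- Sketch: show which session is blocking which on table metadata locks
SELECT * FROM sys.schema_table_lock_waits\G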

@shlomi-noach
Contributor

This could be related to #799, fixed by openark#14.

Otherwise (and I don't have the capacity to test), if, as @jianhaiqing suggests, this doesn't work up to 8.0.15 and does work after 8.0.15, then I don't intend to pursue the issue, and will at most update the docs to require < 8.0 or > 8.0.15.

@jianhaiqing

Yes, I agree with you. Updating the docs is OK.
