
Possible problems with gh-ost, pool-of-threads, and hot tables #482

Closed
zmoazeni opened this issue Aug 30, 2017 · 10 comments

@zmoazeni (Contributor)

I could use some help with a downtime we encountered using gh-ost today.

First, we had no problem with the replica-only tests or with testing minor tables, but today we tried it out on what we thought could be a meaty table. At 500k rows, it is much closer to one of our smaller tables than our largest. It's not actively/heavily written to, but it is very heavily read. And I think that's what got us into trouble.

gh-ost was able to clone the table and waited for me to drop the postpone flag. Once I did, the writer-server's MySQL became unresponsive. Even when I logged into it and tried to connect directly, my MySQL connection would hang. Now this is a very beefy machine, and the load average didn't budge. So it wasn't a matter of the hardware stalling; it was definitely the way MySQL was processing queries.

We are using Percona MySQL 5.7 with thread_handling=pool-of-threads turned on (a result of our upgrade from 5.6), and I think what may have happened, per the description outlined in #82, is that we starved the thread pool once gh-ost tried to lock this hot table for renaming.

I don't think the statements prior to gh-ost applying for the lock stalled us, but perhaps all of the queries issued immediately after gh-ost applied for the lock did. From the outside, they began collecting up, waiting for gh-ost to execute the atomic rename. We can see that happening in the pile-up of our unicorn processes. gh-ost didn't seem to time out applying for the lock ("INFO Tables locked"), but it never renamed the table. It seemed to restart the sync process once more (restarted the "# Migrating ..." block), and that's when I killed it manually.

But I wonder if the prioritization of the actual rename got lost within MySQL and most queries were waiting for that to finish so gh-ost would give up the lock. That is my initial diagnosis anyway.

I need to do some digging to figure out whether this is a configuration tuning issue. Perhaps we need to consider moving back from pool-of-threads to one-thread-per-connection. But I was hoping I could share this experience and see if you folks had some insight to help point me in the right direction. Maybe someone else has already run into this exact issue and could share their solution.

The thread pool size is certainly one configuration to re-evaluate, but I think we may also need to look into the thread pool high-priority options in https://www.percona.com/doc/percona-server/LATEST/performance/threadpool.html

One direct question I wanted to ask: if we decided to try the two-step rename with --cut-over=two-step, does it require the same locking that currently happens in the atomic rename? I know queries could error between those two steps executing, but I wonder if that is another option worth exploring, or if I should be focusing all of my energy on MySQL's configuration.

@shlomi-noach (Contributor)

👋 I'm sorry to hear about your outage. A few things strike me as odd here, and some logging would be helpful in analyzing the case.

The first concerning issue is that the lock timeout didn't work. This is a protection mechanism designed to solve any and all issues that may arise from locking (and you certainly seem to have an issue with locking). What I would expect to see is recurring timeouts, leading to an eventual failure of the migration (an acceptable behavior).
The lock timeout is set via set session lock_wait_timeout := ... just prior to the cut-over phase.
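For illustration, the protection looks roughly like this at the SQL level (a sketch only, not gh-ost's exact statements; table names are examples):

```sql
-- Illustrative sketch, not gh-ost's exact SQL.
-- Cap how long this session will wait to acquire a table/metadata lock:
SET SESSION lock_wait_timeout = 6;

-- If the lock cannot be acquired within 6 seconds, the statement fails
-- with a lock wait timeout error instead of waiting indefinitely:
LOCK TABLES `mydb`.`mytable` WRITE, `mydb`.`_mytable_del` WRITE;
```

The expected failure mode is therefore a bounded, retried error on the gh-ost side, not an open-ended stall.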

Logging would help us analyze what happens with regard to this lock.

But I wonder if the prioritization of the actual rename got lost within MySQL and most queries were waiting for that to finish so gh-ost would give up the lock. That is my initial diagnosis anyway.

You strike a delicate point here. gh-ost indeed makes an assumption about MySQL's internal prioritization of queries on a table. This prioritization has been cleared and approved by Oracle/MySQL engineers and is known not to change in the foreseeable future (or we will get a heads-up).
I do not have a similar verification for Percona Server. I'd be surprised if this were any different there, but I'm putting this in the open.
This is the relevant code in the MySQL codebase: https://github.com/mysql/mysql-server/blob/a533e2c786164af9bd276660b972d93649434297/sql/mdl.cc#L2312

One direct question I wanted to ask: if we decided to try the two step rename with --cut-over=two-step does it require the same locking that is currently happening in the atomic rename?

It does require similar locking. The steps are:

  1. Lock the original table
  2. Wait until we've processed whatever binlog entries remain for that table
  3. Make two renames, punching a hole in the availability of the table
  4. Unlock

The similarity is that the original table is locked, with incoming DML queries waiting on it, then we RENAME it, expecting the RENAME to get higher priority over the waiting DMLs.
The difference is that those waiting DMLs would then suddenly get "no such table" errors and fail.
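A rough SQL sketch of that two-step flow (illustrative only; gh-ost's actual statements differ, and the atomic variant additionally involves the magic `_del` table):

```sql
-- Illustrative sketch of the two-step cut-over, not gh-ost's exact SQL.

-- Connection #1: block writes to the original table.
LOCK TABLES `mydb`.`mytable` WRITE;
-- (gh-ost now drains the remaining binlog events for the table.)

-- Connection #2: issue the renames. They block on connection #1's lock,
-- and we rely on MySQL granting the pending RENAMEs higher priority
-- than the DML queries queued behind the same lock.
RENAME TABLE `mydb`.`mytable` TO `mydb`.`_mytable_old`;
RENAME TABLE `mydb`.`_mytable_gho` TO `mydb`.`mytable`;

-- Connection #1: release the lock; the renames are expected to run first.
UNLOCK TABLES;
-- Between the two renames, `mydb`.`mytable` briefly does not exist,
-- which is the hole where queued DMLs can fail with "no such table".
```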

@zmoazeni (Contributor, Author)

zmoazeni commented Aug 31, 2017

Thanks @shlomi-noach. Pasted at the bottom are some manually redacted logs.

The first concerning issue is that the lock timeout didn't work.

Oh no, I didn't mean that at all. I meant gh-ost didn't time out acquiring the lock. The relevant snippet from the logs is:

2017-08-30 14:28:06 INFO Grabbing voluntary lock: gh-ost.26349918.lock
2017-08-30 14:28:06 INFO Setting LOCK timeout as 6 seconds
2017-08-30 14:28:06 INFO Looking for magic cut-over table
2017-08-30 14:28:06 INFO Creating magic cut-over table `mydb`.`_mytable_del`
2017-08-30 14:28:06 INFO Magic cut-over table created
2017-08-30 14:28:06 INFO Locking `mydb`.`mytable`, `mydb`.`_mytable_del`
2017-08-30 14:28:06 INFO Tables locked
2017-08-30 14:28:06 INFO Session locking original & magic tables is 26349918
2017-08-30 14:28:06 INFO Writing changelog state: AllEventsUpToLockProcessed:1504103286802312398

With 2017-08-30 14:28:06 INFO Tables locked and looking at the source code:

gh-ost/go/logic/applier.go

Lines 813 to 817 in b336409

if _, err := tx.Exec(query); err != nil {
	tableLocked <- err
	return err
}
log.Infof("Tables locked")

That tells me that we not only executed the locking code, but that it returned! Meaning there shouldn't even be a long-running transaction, started prior to gh-ost's lock query, that gh-ost is still waiting on for the table we're going to rename.

Please correct me on this one if I'm off. But if we had a long-running transaction using this table, the gh-ost query that issued lock /* gh-ost */ tables ... would block until those transactions were complete (assuming REPEATABLE READ).
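That blocking behavior can be sketched like this (assuming stock MySQL/Percona metadata-lock semantics; table names are examples):

```sql
-- Illustrative only: how an open transaction blocks LOCK TABLES via
-- its metadata lock.

-- Connection #1: an open transaction that has read from the table
-- holds a shared metadata lock on it until COMMIT/ROLLBACK.
BEGIN;
SELECT * FROM `mydb`.`mytable` LIMIT 1;

-- Connection #2: LOCK TABLES needs a stronger metadata lock, so it
-- waits (up to lock_wait_timeout) for connection #1 to finish.
LOCK TABLES `mydb`.`mytable` WRITE;
```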

Therefore, since the query completed and returned without error, we know that nothing prior to it is affecting us (C1..C9 in #82).

So what is curious is that the follow-up atomic renaming of the tables just seemed to get lost in the ether. And that should be extremely quick once we hold the locks.

What is also curious is that this statement gets printed:

gh-ost/go/logic/migrator.go

Lines 501 to 502 in b336409

log.Infof("Writing changelog state: %+v", allEventsUpToLockProcessedChallenge)
if _, err := this.applier.WriteChangelogState(allEventsUpToLockProcessedChallenge); err != nil {

and from reading the code, it sounds like gh-ost waits until that changelog state makes it to the binary log. And that happens prior to the rename.

So if my thread pool starved that gh-ost connection from actually executing the changelog update, I imagine that is what led to the C11..C19+ connections waiting until the original table lock timed out, or until the connection ended when I issued Ctrl-C. (Causing the actual outage we experienced.)

If I'm off on any of that reasoning, please feel free to correct me!


With that in mind, I'm heavily considering enabling the extra port that Percona offers (https://www.percona.com/doc/percona-server/LATEST/performance/threadpool.html#extra_port) and having gh-ost connect and migrate through that.

I've confirmed with them that connections on the extra port use one-thread-per-connection. That way nothing gh-ost related has a chance of getting starved.


gh-ost run output
 PRODUCTION  myserver.host:~# touch /tmp/ghost.postpone.flag && /usr/local/bin/gh-ost \
> --initially-drop-ghost-table \
> --initially-drop-old-table \
> --assume-rbr \
> --assume-master-host="10.0.0.1" \
> --max-load=Threads_running=25 \
> --critical-load=Threads_running=1000 \
> --chunk-size=1000 \
> --max-lag-millis=1500 \
> --user="ghost" \
> --ask-pass \
> --database="mydb" \
> --table="mytable" \
> --verbose \
> --alter="ADD mynewcol tinyint(1) DEFAULT 0 NOT NULL" \
> --exact-rowcount \
> --concurrent-rowcount \
> --default-retries=120 \
> --panic-flag-file=/tmp/ghost.panic.flag \
> --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
> --execute
Password:
2017-08-30 14:20:33 INFO starting gh-ost 1.0.40
2017-08-30 14:20:33 INFO Migrating `mydb`.`mytable`
2017-08-30 14:20:33 INFO connection validated on 127.0.0.1:3306
2017-08-30 14:20:33 INFO User has ALL privileges
2017-08-30 14:20:33 INFO binary logs validated on 127.0.0.1:3306
2017-08-30 14:20:33 INFO Inspector initiated on myserver.host:3306, version 5.7.x-y-log
2017-08-30 14:20:33 INFO Table found. Engine=InnoDB
2017-08-30 14:20:33 INFO Estimated number of rows via EXPLAIN: 573318
2017-08-30 14:20:33 INFO Master forced to be 10.0.0.1:3306
2017-08-30 14:20:33 INFO log_slave_updates validated on 127.0.0.1:3306
2017-08-30 14:20:33 INFO connection validated on 127.0.0.1:3306
2017/08/30 14:20:33 binlogsyncer.go:75: [info] create BinlogSyncer with config &{99999 mysql 127.0.0.1 3306 ghost [redacted]  false false <nil>}
2017-08-30 14:20:33 INFO Connecting binlog streamer at master-bin.012307:61119447
2017/08/30 14:20:33 binlogsyncer.go:241: [info] begin to sync binlog from position (master-bin.012307, 61119447)
2017/08/30 14:20:33 binlogsyncer.go:134: [info] register slave for master server 127.0.0.1:3306
2017/08/30 14:20:33 binlogsyncer.go:568: [info] rotate to (master-bin.012307, 61119447)
2017-08-30 14:20:33 INFO rotate to next log name: master-bin.012307
2017-08-30 14:20:33 INFO connection validated on 10.0.0.1:3306
2017-08-30 14:20:33 INFO connection validated on 10.0.0.1:3306
2017-08-30 14:20:33 INFO will use time_zone='SYSTEM' on applier
2017-08-30 14:20:33 INFO Examining table structure on applier
2017-08-30 14:20:33 INFO Applier initiated on mymaster.host:3306, version 5.7.x-y-log
2017-08-30 14:20:33 INFO Droppping table `mydb`.`_mytable_gho`
2017-08-30 14:20:33 INFO Table dropped
2017-08-30 14:20:33 INFO Droppping table `mydb`.`_mytable_del`
2017-08-30 14:20:33 INFO Table dropped
2017-08-30 14:20:33 INFO Droppping table `mydb`.`_mytable_ghc`
2017-08-30 14:20:33 INFO Table dropped
2017-08-30 14:20:33 INFO Creating changelog table `mydb`.`_mytable_ghc`
2017-08-30 14:20:33 INFO Changelog table created
2017-08-30 14:20:33 INFO Creating ghost table `mydb`.`_mytable_gho`
2017-08-30 14:20:33 INFO Ghost table created
2017-08-30 14:20:33 INFO Altering ghost table `mydb`.`_mytable_gho`
2017-08-30 14:20:34 INFO Ghost table altered
2017-08-30 14:20:34 INFO Waiting for ghost table to be migrated. Current lag is 1s
2017-08-30 14:20:34 INFO Intercepted changelog state GhostTableMigrated
2017-08-30 14:20:34 INFO Handled changelog state GhostTableMigrated
2017-08-30 14:20:34 INFO Chosen shared unique key is PRIMARY
2017-08-30 14:20:34 INFO Shared columns are id,name,created_at,subdomain,is_active,plan_type,email_after_approve,qb_createdtime,qb_name,signup_referrer,qb_pitem_billable,qb_pitem_non_billable,timezone,cache_version,currency,time_format,color_scheme_id,qb_version,approval_required,trial_type,landing_page,subdomain_hash,partner_id,show_project_notes_to_project_managers,account_owner_id,receipt_recipients,on_hold_till,on_hold_follow_up_state,ledes_feature,ledes_law_firm_id,estimate_feature,expense_feature,invoice_feature,wants_timestamp_timers,needs_unique_project_code,fiscal_year_start,trial_till,converted_at,updated_at,first_timer,internal,week_start_day,sponsored_till,pricing_version,client_view_feature,identity_account_id,final_signup_referrer,final_landing_page,weekly_capacity,team_feature,dashboard_enabled,compare_project_budget
2017-08-30 14:20:34 INFO Listening on unix socket file: /tmp/gh-ost.mydb.mytable.sock
2017-08-30 14:20:34 INFO As instructed, counting rows in the background; meanwhile I will use an estimated count, and will update it later on
2017-08-30 14:20:34 INFO As instructed, I'm issuing a SELECT COUNT(*) on the table. This may take a while
2017-08-30 14:20:34 INFO Migration min values: [1]
2017-08-30 14:20:34 INFO Migration max values: [681807]
2017-08-30 14:20:34 INFO Waiting for first throttle metrics to be collected
2017-08-30 14:20:34 INFO Exact number of rows via COUNT: 590352
2017-08-30 14:20:34 INFO First throttle metrics collected
# Migrating `mydb`.`mytable`; Ghost table is `mydb`.`_mytable_gho`
# Migrating mymaster.host:3306; inspecting myserver.host:3306; executing on myserver.host
# Migration started at Wed Aug 30 14:20:33 +0000 2017
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/ghost.postpone.flag [set]
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.mydb.mytable.sock
Copy: 0/590352 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: master-bin.012307:61229828; State: migrating; ETA: N/A
Copy: 3000/590352 0.5%; Applied: 3; Backlog: 1/1000; Time: 1s(total), 1s(copy); streamer: master-bin.012307:61885729; State: migrating; ETA: N/A
Copy: 5000/590352 0.8%; Applied: 11; Backlog: 0/1000; Time: 2s(total), 2s(copy); streamer: master-bin.012307:62357382; State: migrating; ETA: N/A
Copy: 7000/590352 1.2%; Applied: 16; Backlog: 0/1000; Time: 3s(total), 3s(copy); streamer: master-bin.012307:62894876; State: migrating; ETA: 4m13s
Copy: 9000/590352 1.5%; Applied: 24; Backlog: 1/1000; Time: 4s(total), 4s(copy); streamer: master-bin.012307:63477141; State: migrating; ETA: 4m21s
Copy: 10000/590352 1.7%; Applied: 32; Backlog: 0/1000; Time: 5s(total), 5s(copy); streamer: master-bin.012307:63806068; State: migrating; ETA: 4m52s
Copy: 12000/590352 2.0%; Applied: 39; Backlog: 1/1000; Time: 6s(total), 6s(copy); streamer: master-bin.012307:64383150; State: migrating; ETA: 4m51s
Copy: 15000/590352 2.5%; Applied: 44; Backlog: 0/1000; Time: 7s(total), 7s(copy); streamer: master-bin.012307:65196054; State: migrating; ETA: 4m30s
Copy: 18000/590352 3.0%; Applied: 46; Backlog: 0/1000; Time: 8s(total), 8s(copy); streamer: master-bin.012307:66104976; State: migrating; ETA: 4m15s
Copy: 20000/590352 3.4%; Applied: 55; Backlog: 0/1000; Time: 9s(total), 9s(copy); streamer: master-bin.012307:66774230; State: migrating; ETA: 4m17s
Copy: 21000/590352 3.6%; Applied: 63; Backlog: 2/1000; Time: 10s(total), 10s(copy); streamer: master-bin.012307:67176116; State: migrating; ETA: 4m32s
Copy: 24000/590352 4.1%; Applied: 69; Backlog: 0/1000; Time: 11s(total), 11s(copy); streamer: master-bin.012307:68155507; State: migrating; ETA: 4m20s
Copy: 26000/590352 4.4%; Applied: 75; Backlog: 0/1000; Time: 12s(total), 12s(copy); streamer: master-bin.012307:68834472; State: migrating; ETA: 4m21s
Copy: 28000/590352 4.7%; Applied: 81; Backlog: 0/1000; Time: 13s(total), 13s(copy); streamer: master-bin.012307:69557722; State: migrating; ETA: 4m22s
Copy: 31000/590352 5.3%; Applied: 85; Backlog: 0/1000; Time: 14s(total), 14s(copy); streamer: master-bin.012307:70231820; State: migrating; ETA: 4m13s
Copy: 33000/590352 5.6%; Applied: 90; Backlog: 0/1000; Time: 15s(total), 15s(copy); streamer: master-bin.012307:71235586; State: migrating; ETA: 4m14s
Copy: 35000/590352 5.9%; Applied: 97; Backlog: 0/1000; Time: 16s(total), 16s(copy); streamer: master-bin.012307:71935128; State: migrating; ETA: 4m14s
Copy: 39000/590352 6.6%; Applied: 100; Backlog: 1/1000; Time: 17s(total), 17s(copy); streamer: master-bin.012307:73230608; State: migrating; ETA: 4m0s
Copy: 40000/590352 6.8%; Applied: 109; Backlog: 0/1000; Time: 18s(total), 18s(copy); streamer: master-bin.012307:73614971; State: migrating; ETA: 4m8s
Copy: 43000/590352 7.3%; Applied: 113; Backlog: 0/1000; Time: 19s(total), 19s(copy); streamer: master-bin.012307:74544930; State: migrating; ETA: 4m2s
Copy: 46000/590352 7.8%; Applied: 116; Backlog: 0/1000; Time: 20s(total), 20s(copy); streamer: master-bin.012307:75466313; State: migrating; ETA: 3m57s
Copy: 48000/590352 8.1%; Applied: 123; Backlog: 0/1000; Time: 21s(total), 21s(copy); streamer: master-bin.012307:76197124; State: migrating; ETA: 3m57s
Copy: 51000/590352 8.6%; Applied: 127; Backlog: 0/1000; Time: 22s(total), 22s(copy); streamer: master-bin.012307:77235454; State: migrating; ETA: 3m53s
Copy: 54000/590352 9.1%; Applied: 132; Backlog: 0/1000; Time: 23s(total), 23s(copy); streamer: master-bin.012307:78553515; State: migrating; ETA: 3m48s
Copy: 57000/590352 9.7%; Applied: 135; Backlog: 0/1000; Time: 24s(total), 24s(copy); streamer: master-bin.012307:79873836; State: migrating; ETA: 3m44s
Copy: 60000/590352 10.2%; Applied: 140; Backlog: 0/1000; Time: 25s(total), 25s(copy); streamer: master-bin.012307:81144560; State: migrating; ETA: 3m41s
Copy: 62000/590352 10.5%; Applied: 146; Backlog: 0/1000; Time: 26s(total), 26s(copy); streamer: master-bin.012307:81950132; State: migrating; ETA: 3m41s
Copy: 65000/590352 11.0%; Applied: 149; Backlog: 0/1000; Time: 27s(total), 27s(copy); streamer: master-bin.012307:83121069; State: migrating; ETA: 3m38s
Copy: 68000/590352 11.5%; Applied: 157; Backlog: 0/1000; Time: 28s(total), 28s(copy); streamer: master-bin.012307:84299289; State: migrating; ETA: 3m35s
Copy: 70000/590352 11.9%; Applied: 163; Backlog: 0/1000; Time: 29s(total), 29s(copy); streamer: master-bin.012307:85140676; State: migrating; ETA: 3m35s
Copy: 72000/590352 12.2%; Applied: 167; Backlog: 1/1000; Time: 30s(total), 30s(copy); streamer: master-bin.012307:85968906; State: migrating; ETA: 3m36s
Copy: 73000/590352 12.4%; Applied: 178; Backlog: 0/1000; Time: 31s(total), 31s(copy); streamer: master-bin.012307:86437198; State: migrating; ETA: 3m40s
Copy: 75000/590352 12.7%; Applied: 187; Backlog: 0/1000; Time: 32s(total), 32s(copy); streamer: master-bin.012307:87192115; State: migrating; ETA: 3m40s
Copy: 76000/590352 12.9%; Applied: 197; Backlog: 0/1000; Time: 33s(total), 33s(copy); streamer: master-bin.012307:87598842; State: migrating; ETA: 3m43s
Copy: 79000/590352 13.4%; Applied: 201; Backlog: 1/1000; Time: 34s(total), 34s(copy); streamer: master-bin.012307:88637565; State: migrating; ETA: 3m40s
Copy: 81000/590352 13.7%; Applied: 209; Backlog: 0/1000; Time: 35s(total), 35s(copy); streamer: master-bin.012307:89330661; State: migrating; ETA: 3m40s
Copy: 82000/590352 13.9%; Applied: 217; Backlog: 2/1000; Time: 36s(total), 36s(copy); streamer: master-bin.012307:89781411; State: migrating; ETA: 3m43s
Copy: 84000/590352 14.2%; Applied: 227; Backlog: 0/1000; Time: 37s(total), 37s(copy); streamer: master-bin.012307:90556009; State: migrating; ETA: 3m43s
Copy: 86000/590352 14.6%; Applied: 233; Backlog: 1/1000; Time: 38s(total), 38s(copy); streamer: master-bin.012307:91317376; State: migrating; ETA: 3m43s
Copy: 87000/590352 14.7%; Applied: 242; Backlog: 1/1000; Time: 39s(total), 39s(copy); streamer: master-bin.012307:91773576; State: migrating; ETA: 3m45s
Copy: 88000/590352 14.9%; Applied: 254; Backlog: 0/1000; Time: 40s(total), 40s(copy); streamer: master-bin.012307:92213443; State: migrating; ETA: 3m48s
Copy: 91000/590352 15.4%; Applied: 259; Backlog: 1/1000; Time: 41s(total), 41s(copy); streamer: master-bin.012307:93400212; State: migrating; ETA: 3m45s
Copy: 92000/590352 15.6%; Applied: 271; Backlog: 0/1000; Time: 42s(total), 42s(copy); streamer: master-bin.012307:93842086; State: migrating; ETA: 3m47s
Copy: 94000/590352 15.9%; Applied: 277; Backlog: 2/1000; Time: 43s(total), 43s(copy); streamer: master-bin.012307:94673705; State: migrating; ETA: 3m47s
Copy: 97000/590352 16.4%; Applied: 282; Backlog: 2/1000; Time: 44s(total), 44s(copy); streamer: master-bin.012307:95828248; State: migrating; ETA: 3m44s
Copy: 98000/590352 16.6%; Applied: 292; Backlog: 3/1000; Time: 45s(total), 45s(copy); streamer: master-bin.012307:96286660; State: migrating; ETA: 3m46s
Copy: 101000/590352 17.1%; Applied: 297; Backlog: 1/1000; Time: 46s(total), 46s(copy); streamer: master-bin.012307:97423075; State: migrating; ETA: 3m43s
Copy: 102000/590352 17.3%; Applied: 309; Backlog: 0/1000; Time: 47s(total), 47s(copy); streamer: master-bin.012307:97894283; State: migrating; ETA: 3m45s
Copy: 104000/590352 17.6%; Applied: 315; Backlog: 1/1000; Time: 48s(total), 48s(copy); streamer: master-bin.012307:98715207; State: migrating; ETA: 3m44s
Copy: 106000/590352 18.0%; Applied: 322; Backlog: 0/1000; Time: 49s(total), 49s(copy); streamer: master-bin.012307:99586941; State: migrating; ETA: 3m44s
Copy: 109000/590352 18.5%; Applied: 324; Backlog: 1/1000; Time: 50s(total), 50s(copy); streamer: master-bin.012307:100844778; State: migrating; ETA: 3m41s
Copy: 110000/590352 18.6%; Applied: 336; Backlog: 0/1000; Time: 51s(total), 51s(copy); streamer: master-bin.012307:101336636; State: migrating; ETA: 3m42s
Copy: 112000/590352 19.0%; Applied: 341; Backlog: 0/1000; Time: 52s(total), 52s(copy); streamer: master-bin.012307:102201487; State: migrating; ETA: 3m42s
Copy: 115000/590352 19.5%; Applied: 346; Backlog: 2/1000; Time: 53s(total), 53s(copy); streamer: master-bin.012307:103524488; State: migrating; ETA: 3m39s
Copy: 116000/590352 19.6%; Applied: 356; Backlog: 0/1000; Time: 54s(total), 54s(copy); streamer: master-bin.012307:104010560; State: migrating; ETA: 3m41s
2017/08/30 14:21:29 binlogsyncer.go:568: [info] rotate to (master-bin.012308, 4)
2017/08/30 14:21:29 binlogsyncer.go:568: [info] rotate to (master-bin.012308, 4)
2017-08-30 14:21:29 INFO rotate to next log name: master-bin.012308
2017-08-30 14:21:29 INFO rotate to next log name: master-bin.012308
Copy: 118000/590352 20.0%; Applied: 363; Backlog: 0/1000; Time: 55s(total), 55s(copy); streamer: master-bin.012307:104924190; State: migrating; ETA: 3m40s
Copy: 121000/590352 20.5%; Applied: 369; Backlog: 0/1000; Time: 56s(total), 56s(copy); streamer: master-bin.012308:1383160; State: migrating; ETA: 3m37s
Copy: 122000/590352 20.7%; Applied: 378; Backlog: 0/1000; Time: 57s(total), 57s(copy); streamer: master-bin.012308:1911564; State: migrating; ETA: 3m38s
Copy: 125000/590352 21.2%; Applied: 382; Backlog: 1/1000; Time: 58s(total), 58s(copy); streamer: master-bin.012308:3309156; State: migrating; ETA: 3m36s
Copy: 127000/590353 21.5%; Applied: 392; Backlog: 0/1000; Time: 59s(total), 59s(copy); streamer: master-bin.012308:4269908; State: migrating; ETA: 3m35s
Copy: 129000/590353 21.9%; Applied: 400; Backlog: 0/1000; Time: 1m0s(total), 1m0s(copy); streamer: master-bin.012308:5239693; State: migrating; ETA: 3m34s
Copy: 141000/590353 23.9%; Applied: 431; Backlog: 0/1000; Time: 1m5s(total), 1m5s(copy); streamer: master-bin.012308:10744878; State: migrating; ETA: 3m27s
Copy: 151000/590353 25.6%; Applied: 468; Backlog: 0/1000; Time: 1m10s(total), 1m10s(copy); streamer: master-bin.012308:15114907; State: migrating; ETA: 3m23s
Copy: 161000/590353 27.3%; Applied: 503; Backlog: 0/1000; Time: 1m15s(total), 1m15s(copy); streamer: master-bin.012308:20076342; State: migrating; ETA: 3m20s
Copy: 172000/590353 29.1%; Applied: 536; Backlog: 0/1000; Time: 1m20s(total), 1m20s(copy); streamer: master-bin.012308:24698375; State: migrating; ETA: 3m14s
Copy: 182000/590353 30.8%; Applied: 560; Backlog: 10/1000; Time: 1m25s(total), 1m25s(copy); streamer: master-bin.012308:28686912; State: migrating; ETA: 3m10s
Copy: 189000/590353 32.0%; Applied: 614; Backlog: 1/1000; Time: 1m30s(total), 1m30s(copy); streamer: master-bin.012308:32019838; State: migrating; ETA: 3m11s
Copy: 201000/590353 34.0%; Applied: 643; Backlog: 1/1000; Time: 1m35s(total), 1m35s(copy); streamer: master-bin.012308:36868197; State: migrating; ETA: 3m4s
Copy: 212000/590353 35.9%; Applied: 673; Backlog: 0/1000; Time: 1m40s(total), 1m40s(copy); streamer: master-bin.012308:41248994; State: migrating; ETA: 2m58s
Copy: 226000/590353 38.3%; Applied: 698; Backlog: 0/1000; Time: 1m45s(total), 1m45s(copy); streamer: master-bin.012308:46971461; State: migrating; ETA: 2m49s
Copy: 238000/590353 40.3%; Applied: 727; Backlog: 0/1000; Time: 1m50s(total), 1m50s(copy); streamer: master-bin.012308:52045110; State: migrating; ETA: 2m42s
Copy: 248000/590354 42.0%; Applied: 762; Backlog: 2/1000; Time: 1m55s(total), 1m55s(copy); streamer: master-bin.012308:56340857; State: migrating; ETA: 2m38s
Copy: 260000/590354 44.0%; Applied: 794; Backlog: 1/1000; Time: 2m0s(total), 2m0s(copy); streamer: master-bin.012308:61362469; State: migrating; ETA: 2m32s
Copy: 271000/590354 45.9%; Applied: 827; Backlog: 4/1000; Time: 2m5s(total), 2m5s(copy); streamer: master-bin.012308:66124291; State: migrating; ETA: 2m27s
Copy: 286000/590354 48.4%; Applied: 852; Backlog: 0/1000; Time: 2m10s(total), 2m10s(copy); streamer: master-bin.012308:71380211; State: migrating; ETA: 2m18s
Copy: 299000/590354 50.6%; Applied: 879; Backlog: 0/1000; Time: 2m15s(total), 2m15s(copy); streamer: master-bin.012308:76737209; State: migrating; ETA: 2m11s
Copy: 314000/590354 53.2%; Applied: 901; Backlog: 0/1000; Time: 2m20s(total), 2m20s(copy); streamer: master-bin.012308:82536154; State: migrating; ETA: 2m3s
Copy: 328000/590354 55.6%; Applied: 924; Backlog: 0/1000; Time: 2m25s(total), 2m25s(copy); streamer: master-bin.012308:87957293; State: migrating; ETA: 1m56s
Copy: 341000/590354 57.8%; Applied: 947; Backlog: 0/1000; Time: 2m30s(total), 2m30s(copy); streamer: master-bin.012308:93594128; State: migrating; ETA: 1m49s
Copy: 352000/590354 59.6%; Applied: 976; Backlog: 0/1000; Time: 2m35s(total), 2m35s(copy); streamer: master-bin.012308:98545272; State: migrating; ETA: 1m44s
Copy: 364000/590354 61.7%; Applied: 1004; Backlog: 1/1000; Time: 2m40s(total), 2m40s(copy); streamer: master-bin.012308:103605400; State: migrating; ETA: 1m39s
2017/08/30 14:23:15 binlogsyncer.go:568: [info] rotate to (master-bin.012309, 4)
2017/08/30 14:23:15 binlogsyncer.go:568: [info] rotate to (master-bin.012309, 4)
2017-08-30 14:23:15 INFO rotate to next log name: master-bin.012309
2017-08-30 14:23:15 INFO rotate to next log name: master-bin.012309
Copy: 375000/590354 63.5%; Applied: 1034; Backlog: 2/1000; Time: 2m45s(total), 2m45s(copy); streamer: master-bin.012309:2997069; State: migrating; ETA: 1m34s
Copy: 386000/590354 65.4%; Applied: 1071; Backlog: 1/1000; Time: 2m50s(total), 2m50s(copy); streamer: master-bin.012309:7811450; State: migrating; ETA: 1m30s
Copy: 400000/590354 67.8%; Applied: 1096; Backlog: 0/1000; Time: 2m55s(total), 2m55s(copy); streamer: master-bin.012309:12998685; State: migrating; ETA: 1m23s
Copy: 414000/590354 70.1%; Applied: 1120; Backlog: 0/1000; Time: 3m0s(total), 3m0s(copy); streamer: master-bin.012309:17674097; State: migrating; ETA: 1m16s
Copy: 428000/590354 72.5%; Applied: 1139; Backlog: 0/1000; Time: 3m5s(total), 3m5s(copy); streamer: master-bin.012309:23617198; State: migrating; ETA: 1m10s
Copy: 440000/590354 74.5%; Applied: 1170; Backlog: 0/1000; Time: 3m10s(total), 3m10s(copy); streamer: master-bin.012309:29164049; State: migrating; ETA: 1m4s
Copy: 452000/590354 76.6%; Applied: 1188; Backlog: 0/1000; Time: 3m14s(total), 3m14s(copy); streamer: master-bin.012309:34241481; State: migrating; ETA: 59s
Copy: 454000/590354 76.9%; Applied: 1193; Backlog: 0/1000; Time: 3m15s(total), 3m15s(copy); streamer: master-bin.012309:35207513; State: migrating; ETA: 58s
Copy: 456000/590354 77.2%; Applied: 1199; Backlog: 0/1000; Time: 3m16s(total), 3m16s(copy); streamer: master-bin.012309:36561552; State: migrating; ETA: 57s
Copy: 459000/590354 77.7%; Applied: 1203; Backlog: 0/1000; Time: 3m17s(total), 3m17s(copy); streamer: master-bin.012309:37918182; State: migrating; ETA: 56s
Copy: 463000/590354 78.4%; Applied: 1206; Backlog: 0/1000; Time: 3m18s(total), 3m18s(copy); streamer: master-bin.012309:39252035; State: migrating; ETA: 54s
Copy: 464000/590354 78.6%; Applied: 1214; Backlog: 0/1000; Time: 3m19s(total), 3m19s(copy); streamer: master-bin.012309:40189739; State: migrating; ETA: 54s
Copy: 467000/590354 79.1%; Applied: 1218; Backlog: 0/1000; Time: 3m20s(total), 3m20s(copy); streamer: master-bin.012309:41454348; State: migrating; ETA: 52s
Copy: 469000/590354 79.4%; Applied: 1225; Backlog: 0/1000; Time: 3m21s(total), 3m21s(copy); streamer: master-bin.012309:42375806; State: migrating; ETA: 52s
Copy: 471000/590354 79.8%; Applied: 1230; Backlog: 0/1000; Time: 3m22s(total), 3m22s(copy); streamer: master-bin.012309:43280336; State: migrating; ETA: 51s
Copy: 473000/590354 80.1%; Applied: 1236; Backlog: 0/1000; Time: 3m23s(total), 3m23s(copy); streamer: master-bin.012309:44194164; State: migrating; ETA: 50s
Copy: 474000/590354 80.3%; Applied: 1245; Backlog: 0/1000; Time: 3m24s(total), 3m24s(copy); streamer: master-bin.012309:44715204; State: migrating; ETA: 50s
Copy: 477000/590354 80.8%; Applied: 1249; Backlog: 0/1000; Time: 3m25s(total), 3m25s(copy); streamer: master-bin.012309:46022228; State: migrating; ETA: 48s
Copy: 481000/590354 81.5%; Applied: 1253; Backlog: 0/1000; Time: 3m26s(total), 3m26s(copy); streamer: master-bin.012309:47407367; State: migrating; ETA: 46s
Copy: 483000/590354 81.8%; Applied: 1259; Backlog: 0/1000; Time: 3m27s(total), 3m27s(copy); streamer: master-bin.012309:48750847; State: migrating; ETA: 46s
Copy: 486000/590354 82.3%; Applied: 1262; Backlog: 0/1000; Time: 3m28s(total), 3m28s(copy); streamer: master-bin.012309:50043020; State: migrating; ETA: 44s
Copy: 488000/590354 82.7%; Applied: 1269; Backlog: 0/1000; Time: 3m29s(total), 3m29s(copy); streamer: master-bin.012309:51013876; State: migrating; ETA: 43s
Copy: 491000/590354 83.2%; Applied: 1273; Backlog: 0/1000; Time: 3m30s(total), 3m30s(copy); streamer: master-bin.012309:52297593; State: migrating; ETA: 42s
Copy: 494000/590354 83.7%; Applied: 1277; Backlog: 2/1000; Time: 3m31s(total), 3m31s(copy); streamer: master-bin.012309:53587139; State: migrating; ETA: 41s
Copy: 497000/590354 84.2%; Applied: 1282; Backlog: 0/1000; Time: 3m32s(total), 3m32s(copy); streamer: master-bin.012309:54905182; State: migrating; ETA: 39s
Copy: 499000/590354 84.5%; Applied: 1289; Backlog: 0/1000; Time: 3m33s(total), 3m33s(copy); streamer: master-bin.012309:55844632; State: migrating; ETA: 39s
Copy: 501000/590354 84.9%; Applied: 1296; Backlog: 1/1000; Time: 3m34s(total), 3m34s(copy); streamer: master-bin.012309:56761790; State: migrating; ETA: 38s
Copy: 504000/590354 85.4%; Applied: 1302; Backlog: 0/1000; Time: 3m35s(total), 3m35s(copy); streamer: master-bin.012309:58084611; State: migrating; ETA: 36s
Copy: 506000/590354 85.7%; Applied: 1308; Backlog: 0/1000; Time: 3m36s(total), 3m36s(copy); streamer: master-bin.012309:58945472; State: migrating; ETA: 36s
Copy: 509000/590354 86.2%; Applied: 1310; Backlog: 2/1000; Time: 3m37s(total), 3m37s(copy); streamer: master-bin.012309:60205259; State: migrating; ETA: 34s
Copy: 512000/590354 86.7%; Applied: 1315; Backlog: 0/1000; Time: 3m38s(total), 3m38s(copy); streamer: master-bin.012309:61118099; State: migrating; ETA: 33s
Copy: 514000/590354 87.1%; Applied: 1320; Backlog: 1/1000; Time: 3m39s(total), 3m39s(copy); streamer: master-bin.012309:62366427; State: migrating; ETA: 32s
Copy: 517000/590354 87.6%; Applied: 1325; Backlog: 0/1000; Time: 3m40s(total), 3m40s(copy); streamer: master-bin.012309:63707091; State: migrating; ETA: 31s
Copy: 521000/590354 88.3%; Applied: 1326; Backlog: 0/1000; Time: 3m41s(total), 3m41s(copy); streamer: master-bin.012309:65348782; State: migrating; ETA: 29s
Copy: 524000/590354 88.8%; Applied: 1332; Backlog: 0/1000; Time: 3m42s(total), 3m42s(copy); streamer: master-bin.012309:66712206; State: migrating; ETA: 28s
Copy: 528000/590354 89.4%; Applied: 1333; Backlog: 0/1000; Time: 3m43s(total), 3m43s(copy); streamer: master-bin.012309:68466657; State: migrating; ETA: 26s
Copy: 532000/590354 90.1%; Applied: 1334; Backlog: 0/1000; Time: 3m44s(total), 3m44s(copy); streamer: master-bin.012309:70226309; State: migrating; ETA: 24s
Copy: 533000/590354 90.3%; Applied: 1342; Backlog: 1/1000; Time: 3m45s(total), 3m45s(copy); streamer: master-bin.012309:70730198; State: migrating; ETA: 24s
Copy: 536000/590354 90.8%; Applied: 1346; Backlog: 1/1000; Time: 3m46s(total), 3m46s(copy); streamer: master-bin.012309:71945265; State: migrating; ETA: 22s
Copy: 539000/590354 91.3%; Applied: 1350; Backlog: 0/1000; Time: 3m47s(total), 3m47s(copy); streamer: master-bin.012309:73296177; State: migrating; ETA: 21s
Copy: 541000/590354 91.6%; Applied: 1357; Backlog: 0/1000; Time: 3m48s(total), 3m48s(copy); streamer: master-bin.012309:74252636; State: migrating; ETA: 20s
Copy: 543000/590354 92.0%; Applied: 1363; Backlog: 0/1000; Time: 3m49s(total), 3m49s(copy); streamer: master-bin.012309:75204002; State: migrating; ETA: 19s
Copy: 545000/590354 92.3%; Applied: 1370; Backlog: 0/1000; Time: 3m50s(total), 3m50s(copy); streamer: master-bin.012309:76162030; State: migrating; ETA: 19s
Copy: 547000/590354 92.7%; Applied: 1378; Backlog: 1/1000; Time: 3m51s(total), 3m51s(copy); streamer: master-bin.012309:77132838; State: migrating; ETA: 18s
Copy: 549000/590354 93.0%; Applied: 1383; Backlog: 0/1000; Time: 3m52s(total), 3m52s(copy); streamer: master-bin.012309:78061515; State: migrating; ETA: 17s
Copy: 551000/590354 93.3%; Applied: 1390; Backlog: 0/1000; Time: 3m53s(total), 3m53s(copy); streamer: master-bin.012309:79004796; State: migrating; ETA: 16s
Copy: 555000/590354 94.0%; Applied: 1391; Backlog: 1/1000; Time: 3m54s(total), 3m54s(copy); streamer: master-bin.012309:80768832; State: migrating; ETA: 14s
Copy: 558000/590354 94.5%; Applied: 1396; Backlog: 1/1000; Time: 3m55s(total), 3m55s(copy); streamer: master-bin.012309:82147856; State: migrating; ETA: 13s
Copy: 561000/590354 95.0%; Applied: 1402; Backlog: 0/1000; Time: 3m56s(total), 3m56s(copy); streamer: master-bin.012309:83477082; State: migrating; ETA: 12s
Copy: 562000/590354 95.2%; Applied: 1412; Backlog: 0/1000; Time: 3m57s(total), 3m57s(copy); streamer: master-bin.012309:84026858; State: migrating; ETA: 11s
Copy: 566000/590354 95.9%; Applied: 1414; Backlog: 0/1000; Time: 3m58s(total), 3m58s(copy); streamer: master-bin.012309:85691733; State: migrating; ETA: 10s
Copy: 568000/590354 96.2%; Applied: 1418; Backlog: 0/1000; Time: 3m59s(total), 3m59s(copy); streamer: master-bin.012309:86761163; State: migrating; ETA: 9s
Copy: 571000/590354 96.7%; Applied: 1423; Backlog: 1/1000; Time: 4m0s(total), 4m0s(copy); streamer: master-bin.012309:88173083; State: migrating; ETA: 8s
Copy: 574000/590354 97.2%; Applied: 1427; Backlog: 0/1000; Time: 4m1s(total), 4m1s(copy); streamer: master-bin.012309:89602030; State: migrating; ETA: 6s
Copy: 577000/590354 97.7%; Applied: 1430; Backlog: 1/1000; Time: 4m2s(total), 4m2s(copy); streamer: master-bin.012309:91066893; State: migrating; ETA: 5s
Copy: 580000/590354 98.2%; Applied: 1435; Backlog: 0/1000; Time: 4m3s(total), 4m3s(copy); streamer: master-bin.012309:92524574; State: migrating; ETA: 4s
Copy: 582000/590354 98.6%; Applied: 1443; Backlog: 0/1000; Time: 4m4s(total), 4m4s(copy); streamer: master-bin.012309:93520960; State: migrating; ETA: 3s
Copy: 585000/590354 99.1%; Applied: 1446; Backlog: 0/1000; Time: 4m5s(total), 4m5s(copy); streamer: master-bin.012309:94514032; State: migrating; ETA: 2s
Copy: 587000/590355 99.4%; Applied: 1454; Backlog: 0/1000; Time: 4m6s(total), 4m6s(copy); streamer: master-bin.012309:95923654; State: migrating; ETA: 1s
Copy: 588000/590355 99.6%; Applied: 1462; Backlog: 4/1000; Time: 4m7s(total), 4m7s(copy); streamer: master-bin.012309:96490065; State: migrating; ETA: 0s
Copy: 590352/590355 100.0%; Applied: 1469; Backlog: 1/1000; Time: 4m8s(total), 4m8s(copy); streamer: master-bin.012309:97578355; State: migrating; ETA: 0s
2017-08-30 14:24:42 INFO Row copy complete
Copy: 590352/590352 100.0%; Applied: 1474; Backlog: 0/1000; Time: 4m9s(total), 4m8s(copy); streamer: master-bin.012309:97626110; State: migrating; ETA: due
Copy: 590352/590352 100.0%; Applied: 1480; Backlog: 1/1000; Time: 4m10s(total), 4m8s(copy); streamer: master-bin.012309:97776410; State: postponing cut-over; ETA: due
Copy: 590352/590352 100.0%; Applied: 1496; Backlog: 3/1000; Time: 4m15s(total), 4m8s(copy); streamer: master-bin.012309:98390422; State: postponing cut-over; ETA: due
Copy: 590352/590352 100.0%; Applied: 1521; Backlog: 4/1000; Time: 4m20s(total), 4m8s(copy); streamer: master-bin.012309:98863704; State: postponing cut-over; ETA: due
Copy: 590352/590352 100.0%; Applied: 1543; Backlog: 0/1000; Time: 4m25s(total), 4m8s(copy); streamer: master-bin.012309:99367686; State: postponing cut-over; ETA: due
Copy: 590352/590352 100.0%; Applied: 1575; Backlog: 0/1000; Time: 4m30s(total), 4m8s(copy); streamer: master-bin.012309:100016194; State: postponing cut-over; ETA: due
Copy: 590352/590352 100.0%; Applied: 1605; Backlog: 1/1000; Time: 4m35s(total), 4m8s(copy); streamer: master-bin.012309:100571337; State: postponing cut-over; ETA: due
Copy: 590352/590352 100.0%; Applied: 1617; Backlog: 2/1000; Time: 4m40s(total), 4m8s(copy); streamer: master-bin.012309:100977167; State: postponing cut-over; ETA: due
Copy: 590352/590352 100.0%; Applied: 1646; Backlog: 2/1000; Time: 4m45s(total), 4m8s(copy); streamer: master-bin.012309:101494538; State: postponing cut-over; ETA: due
Copy: 590352/590352 100.0%; Applied: 1667; Backlog: 2/1000; Time: 4m50s(total), 4m8s(copy); streamer: master-bin.012309:101928348; State: postponing cut-over; ETA: due
Copy: 590352/590352 100.0%; Applied: 1683; Backlog: 0/1000; Time: 4m55s(total), 4m8s(copy); streamer: master-bin.012309:102410611; State: postponing cut-over; ETA: due
Copy: 590352/590352 100.0%; Applied: 1715; Backlog: 1/1000; Time: 5m0s(total), 4m8s(copy); streamer: master-bin.012309:102967596; State: postponing cut-over; ETA: due
Copy: 590352/590352 100.0%; Applied: 1731; Backlog: 0/1000; Time: 5m5s(total), 4m8s(copy); streamer: master-bin.012309:103403985; State: postponing cut-over; ETA: due
2017/08/30 14:25:53 binlogsyncer.go:568: [info] rotate to (master-bin.012310, 4)
2017/08/30 14:25:53 binlogsyncer.go:568: [info] rotate to (master-bin.012310, 4)
2017-08-30 14:25:53 INFO rotate to next log name: master-bin.012310
2017-08-30 14:25:53 INFO rotate to next log name: master-bin.012310
Copy: 590352/590352 100.0%; Applied: 1841; Backlog: 9/1000; Time: 5m30s(total), 4m8s(copy); streamer: master-bin.012310:1112760; State: postponing cut-over; ETA: due
Copy: 590352/590352 100.0%; Applied: 1964; Backlog: 0/1000; Time: 6m0s(total), 4m8s(copy); streamer: master-bin.012310:4118984; State: postponing cut-over; ETA: due
Copy: 590352/590352 100.0%; Applied: 2115; Backlog: 5/1000; Time: 6m30s(total), 4m8s(copy); streamer: master-bin.012310:7140690; State: postponing cut-over; ETA: due
Copy: 590352/590352 100.0%; Applied: 2242; Backlog: 0/1000; Time: 7m0s(total), 4m8s(copy); streamer: master-bin.012310:9875821; State: postponing cut-over; ETA: due
Copy: 590352/590352 100.0%; Applied: 2415; Backlog: 0/1000; Time: 7m30s(total), 4m8s(copy); streamer: master-bin.012310:12965823; State: postponing cut-over; ETA: due
2017-08-30 14:28:06 INFO Grabbing voluntary lock: gh-ost.26349918.lock
2017-08-30 14:28:06 INFO Setting LOCK timeout as 6 seconds
2017-08-30 14:28:06 INFO Looking for magic cut-over table
2017-08-30 14:28:06 INFO Creating magic cut-over table `mydb`.`_mytable_del`
2017-08-30 14:28:06 INFO Magic cut-over table created
2017-08-30 14:28:06 INFO Locking `mydb`.`mytable`, `mydb`.`_mytable_del`
2017-08-30 14:28:06 INFO Tables locked
2017-08-30 14:28:06 INFO Session locking original & magic tables is 26349918
2017-08-30 14:28:06 INFO Writing changelog state: AllEventsUpToLockProcessed:1504103286802312398
Copy: 590352/590352 100.0%; Applied: 2430; Backlog: 0/1000; Time: 7m35s(total), 4m8s(copy); streamer: master-bin.012310:13204869; State: migrating; ETA: due
# Migrating `mydb`.`mytable`; Ghost table is `mydb`.`_mytable_gho`
# Migrating mymaster.host:3306; inspecting myserver.host:3306; executing on myserver.host
# Migration started at Wed Aug 30 14:20:33 +0000 2017
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/ghost.postpone.flag
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.mydb.mytable.sock
^C

@zmoazeni (Contributor, Author) commented Sep 5, 2017

We're going to re-attempt this same migration this week (possibly next) against the extra port. I'll report my experience here afterwards.

By the way, this issue doesn't need to remain open. It's not a bug or anything on gh-ost's side that I can see. I mainly opened it to share our experiences in case someone else in my scenario runs into this.

@zmoazeni (Contributor, Author) commented Sep 5, 2017

Actually, it looks like I am going to need help after all. From my tests, it doesn't seem like I can instruct gh-ost to connect to MySQL on a different port.

$ mysql -e "show global variables where variable_name in ('port', 'extra_port')"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| extra_port    | 23306 |
| port          | 3306  |
+---------------+-------+

/usr/local/bin/gh-ost \
--initially-drop-ghost-table \
--initially-drop-old-table \
--assume-rbr \
--port="23306" \
--max-load=Threads_running=25 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--max-lag-millis=1500 \
--user="ghost" \
--ask-pass \
--database="mydb" \
--table="mytable" \
--verbose \
--alter="ADD mynewcol decimal(11,2) DEFAULT 0.0 NOT NULL" \
--exact-rowcount \
--concurrent-rowcount \
--default-retries=120 \
--panic-flag-file=/tmp/ghost.panic.flag \
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
[--execute]
...
2017-09-05 16:48:53 FATAL Unexpected database port reported: 3306

This looks like it'll require a PR. https://github.com/github/gh-ost/search?utf8=%E2%9C%93&q=Unexpected+database+port+reported&type=

First, are you all ok with a PR allowing me to override the port and check the extra port? Second, do you have any suggestions or guidance on the design before I submit one?
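For context, the failing check appears to compare the port gh-ost was told to dial against the port the server itself reports, and bails out on any mismatch. Below is a minimal sketch of what an extra-port-aware version of that validation could look like. This is illustrative Python, not gh-ost's actual code (gh-ost is written in Go), and all names here are mine:

```python
# Hypothetical sketch of gh-ost's port sanity check, extended to also accept
# Percona/MariaDB's extra_port. Illustrative only -- these are not gh-ost's
# actual identifiers.

def port_check_ok(connected_port, reported_port, reported_extra_port=None):
    """Return True if the port we dialed matches what the server reports.

    The original check fails with "Unexpected database port reported: <port>"
    whenever connected_port != @@global.port. Treating @@global.extra_port as
    a second acceptable answer is the change this issue proposes.
    """
    if connected_port == reported_port:
        return True
    if reported_extra_port is not None and connected_port == reported_extra_port:
        return True
    return False

# The scenario from this issue: dialing the thread-pool-bypassing extra port.
print(port_check_ok(23306, 3306))         # original behavior: rejected (False)
print(print_ok := port_check_ok(23306, 3306, 23306))  # extra-port-aware: True
```

The design question for the PR is essentially where the `extra_port` value comes from: read `@@global.extra_port` from the server and compare automatically, or require the operator to assert the port explicitly via a flag.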

@shlomi-noach (Contributor)

Sorry, just noting I'll be out for a few days and will look into this early next week.

@zmoazeni (Contributor, Author)

zmoazeni commented Sep 7, 2017

No worries at all man. Take care.

@zmoazeni (Contributor, Author)

I just wanted to give an update here. We have now had 3 successful migrations run with a build from my branch against the extra port. Including the original table that we had trouble with.

Now granted, they have all run in off-peak traffic hours, but we're rebuilding our confidence that gh-ost won't take us down again. I'm sure we'll run another gh-ost migration during prime time hours soon.

@zmoazeni (Contributor, Author) commented Sep 13, 2017

I didn't expect "soon" to be "today", but it was. And it was a smashing success.

We migrated two tables, at 4M and 11.5M rows. The second is arguably hotter than the one that gave us issues two weeks ago. Both migrated without a hitch using the extra port and my branch's build.

@shlomi-noach man, I gotta say. Thank you (and all the other contributors!!) for your hard work on gh-ost. It really is unlocking several hard-to-budge doors on our side.

@shlomi-noach (Contributor)

OK good, today is the day I got to look at this, and I'm glad this works for you! I reviewed and I'd like to suggest a slightly simplified approach that doesn't assume percona/mariadb.

@zmoazeni (Contributor, Author)

I believe this can be closed. Going to close it to help clean up the open issues!
