WIP: ApplySchema to run online schema changes via gh-ost #67

Closed
shlomi-noach wants to merge 102 commits into master from pov-gh-ost-tablet-rewrite2

Conversation

@shlomi-noach

WORK IN PROGRESS

This PR begins to explore online schema changes via gh-ost.

NOTE: baseline for this branch has been refactored halfway, this PR is created with conflicts which will later be resolved.

Consider the below:

$ vtctl -topo_implementation etcd2 -topo_global_server_address localhost:2379 -topo_global_root /vitess/global ApplySchema -online_schema_change -sql "alter table zzz modify id bigint not null" commerce

Notice the new -online_schema_change flag.

To date, ApplySchema would ask vtctld to run a full blown schema change. vtctld would:

  • Identify the shards
  • Identify the master/primary for each shard
  • Establish a MySQL connection on each primary
  • Run the ALTER TABLE ... statement on each primary
  • Return when all are complete
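That legacy synchronous flow can be sketched roughly as follows (a minimal illustration; `Shard` and `planApplySchema` are hypothetical stand-ins, not the actual vtctld types):

```go
package main

import "fmt"

// Shard pairs a shard name with the address of its primary (master) tablet.
// These types are illustrative stand-ins for the real topology structures.
type Shard struct {
	Name    string
	Primary string
}

// planApplySchema models the legacy ApplySchema flow: one DDL task per
// shard primary, each to be executed synchronously and in full.
func planApplySchema(shards []Shard, sql string) []string {
	var tasks []string
	for _, s := range shards {
		tasks = append(tasks, fmt.Sprintf("run %q on %s (shard %s)", sql, s.Primary, s.Name))
	}
	return tasks
}

func main() {
	shards := []Shard{{"-80", "tablet-101"}, {"80-", "tablet-201"}}
	for _, t := range planApplySchema(shards, "alter table zzz modify id bigint not null") {
		fmt.Println(t)
	}
}
```

A blocking ALTER on each primary is what makes this flow unsuitable for large tables, which is what motivates the online approach below.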

We wish to look into online schema changes. Online schema changes are non-blocking: they can run for hours without affecting ongoing production traffic. Online schema changes are available via dedicated tooling:

This PR offers an integration with gh-ost. The initial submission makes a lot of assumptions and has rough edges:

  • That the gh-ost binary exists in the path
  • That the MySQL servers have a gh-ost user account with proper privileges
  • We run gh-ost immediately upon request. This should not be the general case: running gh-ost should be orchestrated (e.g. if there's already a running gh-ost migration, wait for it to complete; if gh-ost fails, retry N times?)
  • We run gh-ost asynchronously but return no job ID to track progress
  • gh-ost does not yet report success/failure to anyone/anything
  • There's no way to check progress

All of the above needs to be completed; hence this is a draft PR. What this PR does have is mostly wiring:

  • vtctl ApplySchema to tell vtctld it wants an online schema change, via gRPC
  • vtctld to intercept an online schema change
  • vtctld to identify shards and primaries
  • vtctld to request online schema changes from the vttablets of the primaries
  • vttablet to analyze the request and spawn gh-ost

To be continued.

sougou added 30 commits July 19, 2020 22:47
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>

Introduce a new Seconds type with explicit conversions to and
from time.Duration.

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
curl -k -L https://github.com/github/gh-ost/releases/download/v1.0.49/gh-ost-binary-linux-20200209110835.tar.gz -o /tmp/gh-ost.tar.gz
(cd /tmp/ && tar xzf gh-ost.tar.gz)
cp /tmp/gh-ost /usr/bin

Reviewer:

Just out of curiosity: I can see that from a security perspective, installing the binary on each tablet is preferred. What are the other tradeoffs of installing the binary on each tablet vs. remotely on a dedicated server/pod?

shlomi-noach (Author):

Right; the above is just the local docker setup, and so does not represent a production environment. The above is also likely temporary. In any case, an important observation is that gh-ost reads data from MySQL and then mostly writes it back. It reads the data both by connecting into the replication stream and via normal queries. What we've observed at GitHub is that latency between the machine where gh-ost runs and the MySQL servers is important; e.g. we would only run gh-ost in the same DC as the affected servers. But I think taking it to the next level and running gh-ost on the very same tablet+MySQL master servers can be beneficial.
Another thing about running gh-ost from dedicated servers is the number of migrations you'd be able to run concurrently. I don't have the numbers, but if you wanted to run 100 concurrent migrations (say you have 100 shards), then I suspect running 100 gh-ost instances on the same dedicated server is unlikely to perform well. Actual testing is needed, but that's my suspicion.
When you run gh-ost right on the master server, that problem implicitly doesn't exist.

Keyspace string `json:"keyspace,omitempty"`
Table string `json:"table,omitempty"`
SQL string `json:"sql,omitempty"`
UUID string `json:"uuid,omitempty"`

Reviewer:

will the UUID also be used as replica-server-id to support potential concurrent migration?

shlomi-noach (Author):

With the current design, each keyspace/shard will be handed this migration. All shards will use the same UUID as the "migration job ID". Perhaps I should rename the variable to JobID or something.
This will make it easier to investigate issues: if you have the UUID and you know all tables/shards use that same UUID, then all logs are in the same place, called by the same names, etc. on all tablets.

One key design decision is that a tablet/master will not run two concurrent migrations. Now, that's subtle. Running two long-running migrations concurrently is known to be slower than running the two sequentially. However, sometimes one will be running a 3-day-long migration and then also want to ALTER a very small table, which only takes 2 minutes to run. That's a valid use case for running concurrently. Right now I'm not looking into that, and I enforce serialization. If Vitess is opinionated and recommends a max of 250-300GB of data per shard, then migration time is also capped to a reasonable timeframe (a few hours) and we can afford to serialize everything.
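The "one migration at a time per tablet" rule can be enforced with a simple atomic guard, sketched here under assumptions (this `Executor` and its methods are hypothetical; the PR's actual mechanism may differ):

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var ErrMigrationInProgress = errors.New("a migration is already running on this tablet")

// Executor serializes migrations: at most one may run at a time.
type Executor struct {
	running int32 // 0 = idle, 1 = migration in flight
}

// beginMigration atomically reserves the single migration slot, or fails fast.
func (e *Executor) beginMigration() error {
	if !atomic.CompareAndSwapInt32(&e.running, 0, 1) {
		return ErrMigrationInProgress
	}
	return nil
}

// endMigration releases the slot when the migration completes or fails.
func (e *Executor) endMigration() {
	atomic.StoreInt32(&e.running, 0)
}

func main() {
	e := &Executor{}
	fmt.Println(e.beginMigration()) // first request acquires the slot
	fmt.Println(e.beginMigration()) // second concurrent request is rejected
	e.endMigration()
	fmt.Println(e.beginMigration()) // slot is free again after completion
}
```

Failing fast (rather than blocking) leaves the queueing decision — wait, retry N times, etc. — to the orchestration layer discussed earlier.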

Comment on lines +65 to +73
OnlineDDLStatusRequested OnlineDDLStatus = "requested"
OnlineDDLStatusReviewed OnlineDDLStatus = "reviewed"
OnlineDDLStatusCancelled OnlineDDLStatus = "cancelled"
OnlineDDLStatusQueued OnlineDDLStatus = "queued"
OnlineDDLStatusReady OnlineDDLStatus = "ready"
OnlineDDLStatusRunning OnlineDDLStatus = "running"
OnlineDDLStatusComplete OnlineDDLStatus = "complete"
OnlineDDLStatusFailed OnlineDDLStatus = "failed"
)

Reviewer:

Curious: in the case the gh-ost process hangs there in the "running" state, are we going to do proactive health checking and treat it as "failed"?

shlomi-noach (Author):

gh-ost supports hooks, which we utilize. One of those hooks is on-status, which fires every 1 minute. You can use that as a liveness indicator.
Back when I worked on skeefree, I used it as a liveness/health indicator, such that if I hadn't seen a report in the past 10 minutes, I assumed the migration to be dead.

In our current (still evolving) setup, things are somewhat simpler, because I know for a fact that gh-ost runs on the master tablet's host, and so I can further ask the tablet to communicate with it. That is, we will still use on-status, but the tablet may also check that the gh-ost process is running (by communicating with the OS), or forcibly kill -9 it, etc.

OnlineDDLStatusRequested OnlineDDLStatus = "requested"
OnlineDDLStatusReviewed OnlineDDLStatus = "reviewed"
OnlineDDLStatusCancelled OnlineDDLStatus = "cancelled"
OnlineDDLStatusQueued OnlineDDLStatus = "queued"

Reviewer:

Nice to see the queueing feature! So it queues in global etcd?

shlomi-noach (Author):

Both in global etcd as well as in local _vt.schema_migrations table. To be re-evaluated now that VExec is available. Gonna look into it.

keyspace string
shard string

mu sync.Mutex

Reviewer:

what does the mutex prevent here?

shlomi-noach (Author):

Renamed (I guess I didn't push yet) to initMutex, to serialize the Open()/Close() flow; I mostly copied+pasted this logic from other Executor implementations.
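The initMutex pattern described here — serializing Open()/Close() and making them idempotent — looks roughly like the following (a sketch modeled on the description above, not the PR's exact code):

```go
package main

import (
	"fmt"
	"sync"
)

// Executor's lifecycle is guarded by initMutex so that concurrent
// Open()/Close() calls cannot interleave and corrupt state.
type Executor struct {
	keyspace string
	shard    string

	initMutex sync.Mutex
	isOpen    bool
}

// Open initializes the executor; repeated calls are no-ops.
func (e *Executor) Open() {
	e.initMutex.Lock()
	defer e.initMutex.Unlock()
	if e.isOpen {
		return
	}
	// ... acquire connections, start background workers, etc.
	e.isOpen = true
}

// Close tears the executor down; repeated calls are no-ops.
func (e *Executor) Close() {
	e.initMutex.Lock()
	defer e.initMutex.Unlock()
	if !e.isOpen {
		return
	}
	// ... stop workers, release connections.
	e.isOpen = false
}

func main() {
	e := &Executor{keyspace: "commerce", shard: "0"}
	e.Open()
	e.Open() // idempotent: second call is a no-op
	fmt.Println(e.isOpen)
	e.Close()
	fmt.Println(e.isOpen)
}
```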

// Execute validates and runs a gh-ost process.
// Validation includes testing the backend MySQL server and the gh-ost binary itself.
// Execution runs first a dry run, then the actual migration.
func (e *Executor) Execute(ctx context.Context, target querypb.Target, alias topodatapb.TabletAlias, schema, table, alter string) error {

Reviewer:

Do we also want to check schema consistency (temp gh-ost tables), ongoing migrations (temp gh-ost flag files), and maybe replication health between master and replicas?

shlomi-noach (Author):

  • Replication health: definitely; gh-ost uses throttling, and we will want to supply gh-ost with the identity of a replica it will check for lag. Also, if we ever deploy/implement freno in Vitess, then gh-ost will use that as a standard throttling mechanism.

  • Ongoing migrations: some work is already in, possibly not pushed; as mentioned above, we will avoid concurrent migrations and only allow one migration at a time.

  • Schema consistency: can you explain?

fmt.Sprintf(`--port=%d`, mysqlPort),
`--user=gh-ost`,
`--password=gh-ost`,
`--allow-on-master`,

Reviewer:

So we are going to execute the OSC directly on the master; why not connect to a replica?

shlomi-noach (Author):

Somewhat elaborating on the above: I began this way because of its simplicity, and wanted to have a POC. Latency-wise, it's easier if we do that directly on the master. The biggest advantage to running this on a replica is that we'd get implicit throttling by that replica. I will look into our options.

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach

There has been a lot of progress on this PR, and in fact it has been refactored to such extent that little is left from the original commits. The design has changed (e.g. now using async, decoupled, scheduled migration execution, as opposed to sync execution), and many of the original TODOs or limitations have been addressed.

I wish to open a new PR vs. vitessio/vitess rather than continue updating/commenting here, given all the changes. If that's acceptable, I'll proceed to do that, and I'll make sure to point back to this PR for documentation/progress reference.

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

…plaintext

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach

closed in favor of vitessio#6547

@shlomi-noach shlomi-noach deleted the pov-gh-ost-tablet-rewrite2 branch August 9, 2020 13:14