WIP: ApplySchema to run online schema changes via gh-ost #67
shlomi-noach wants to merge 102 commits into master from gh-ost
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Introduce a new Seconds type with explicit conversions to and from time.Duration. Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
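The commit's actual definition is not shown on this page, so the following is only a minimal sketch of what a Seconds type with explicit conversions could look like. The underlying float64 representation and the names SecondsFromDuration and Duration are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Seconds is a hypothetical type counting (possibly fractional) seconds.
// The real definition in the commit may differ.
type Seconds float64

// SecondsFromDuration converts a time.Duration into Seconds.
func SecondsFromDuration(d time.Duration) Seconds {
	return Seconds(d.Seconds())
}

// Duration converts Seconds back into a time.Duration.
func (s Seconds) Duration() time.Duration {
	return time.Duration(float64(s) * float64(time.Second))
}

func main() {
	s := SecondsFromDuration(1500 * time.Millisecond)
	fmt.Println(float64(s), s.Duration())
}
```

Keeping the conversions explicit means a bare number can never be silently interpreted as nanoseconds, which is the usual pitfall when mixing plain integers with time.Duration.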
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
curl -k -L https://github.com/github/gh-ost/releases/download/v1.0.49/gh-ost-binary-linux-20200209110835.tar.gz -o /tmp/gh-ost.tar.gz
(cd /tmp/ && tar xzf gh-ost.tar.gz)
cp /tmp/gh-ost /usr/bin
Just out of curiosity: I can see that, from a security perspective, installing the binary on each tablet is preferred. What are the other tradeoffs of installing the binary on each tablet vs. remotely on a dedicated server/pod?
Right; the above is just the local docker setup, and so does not represent a production environment. The above is also likely temporary. In any case, an important observation is that gh-ost reads data from MySQL and then mostly writes it back. It reads the data both by connecting to the replication stream and through normal queries. What we've observed at GitHub is that latency between the machine where gh-ost runs and the MySQL servers is important, e.g. we would only run gh-ost in the same DC as the affected servers. But I think taking it to the next level and running gh-ost on the very same tablet+MySQL master servers can be beneficial.
Another thing about running gh-ost from dedicated servers is the number of migrations you'd be able to run concurrently. I don't have the numbers, but I guess if you'd want to run 100 concurrent migrations (say you have 100 shards), then I suspect running 100 gh-ost instances on the same dedicated server is unlikely to perform well. Actual testing is needed, but that's my suspicion.
When you run gh-ost right on the master server, that problem implicitly doesn't exist.
Keyspace string `json:"keyspace,omitempty"`
Table    string `json:"table,omitempty"`
SQL      string `json:"sql,omitempty"`
UUID     string `json:"uuid,omitempty"`
Will the UUID also be used as replica-server-id to support potential concurrent migrations?
With the current design, each keyspace/shard will be handed this migration. All shards will use the same UUID as the "migration job id". Perhaps I should rename the variable to JobID or something.
This will make it easier to investigate issues; if you have the UUID and you know all tables/shards use that same UUID, then all logs are in the same place, called by the same names, etc. on all tablets.
One key design decision is that a tablet/master will not run two concurrent migrations. Now, that's subtle. Running two long-running migrations concurrently is known to be slower than running the two sequentially. However, sometimes one will be running a 3-day-long migration, and then also want to ALTER a very small table, which only takes 2 minutes to run. That's a valid use case for running concurrently. Right now I'm not looking into that, and I enforce serialization. If vitess is opinionated, and recommends a max of 250-300GB of data per shard, then migration time is also capped to a reasonable timeframe (a few hours) and we can afford to serialize everything.
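To illustrate the "one migration at a time per tablet" rule described above, here is a minimal sketch of a guard keyed by the migration UUID. The migrationGate type and its methods are hypothetical and are not taken from this PR's code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrMigrationRunning is returned when a migration already holds the gate.
var ErrMigrationRunning = errors.New("another migration is already running on this tablet")

// migrationGate is a hypothetical guard allowing a single migration at a time.
type migrationGate struct {
	mu      sync.Mutex
	current string // UUID of the running migration; empty when idle
}

// Acquire claims the gate for the given UUID, or reports which UUID holds it.
func (g *migrationGate) Acquire(uuid string) error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.current != "" {
		return fmt.Errorf("%w (uuid=%s)", ErrMigrationRunning, g.current)
	}
	g.current = uuid
	return nil
}

// Release frees the gate, but only when called for the migration holding it.
func (g *migrationGate) Release(uuid string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.current == uuid {
		g.current = ""
	}
}

func main() {
	gate := &migrationGate{}
	fmt.Println(gate.Acquire("uuid-1")) // the first migration gets the gate
	fmt.Println(gate.Acquire("uuid-2")) // a second one is rejected while it runs
	gate.Release("uuid-1")
	fmt.Println(gate.Acquire("uuid-2")) // now the second one may proceed
}
```

A rejected migration would presumably stay queued and be retried later; that scheduling part is left out of the sketch.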
	OnlineDDLStatusRequested OnlineDDLStatus = "requested"
	OnlineDDLStatusReviewed  OnlineDDLStatus = "reviewed"
	OnlineDDLStatusCancelled OnlineDDLStatus = "cancelled"
	OnlineDDLStatusQueued    OnlineDDLStatus = "queued"
	OnlineDDLStatusReady     OnlineDDLStatus = "ready"
	OnlineDDLStatusRunning   OnlineDDLStatus = "running"
	OnlineDDLStatusComplete  OnlineDDLStatus = "complete"
	OnlineDDLStatusFailed    OnlineDDLStatus = "failed"
)
Curious: in case the gh-ost process hangs in the "processing" state, are we going to do proactive health checking and treat it as "failed", or something else?
gh-ost supports hooks, which we utilize. One of those hooks is on-status, which fires every 1 minute. You can use that as a liveness indicator.
Back when I worked on skeefree, I used it as a liveness/health indicator, such that if I hadn't seen a report in the past 10 minutes, I assumed the migration to be dead.
In our current (still evolving) setup, things are somewhat simpler, because I know for a fact that gh-ost runs on the master tablet's host, and so can further ask the tablet to communicate with it. That is, we will still use on-status, but then again, the tablet may also check that the gh-ost process is running (by communicating with the OS), or forcibly kill -9 it, etc.
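The "no report in the past 10 minutes means dead" idea can be sketched roughly as follows. This is an illustration only: livenessTracker, Touch and IsStale are hypothetical names, and how on-status reports would actually reach the tablet is not shown.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// livenessTracker remembers when each migration last reported its status.
// Hypothetical helper; the names are illustrative, not from this PR.
type livenessTracker struct {
	mu       sync.Mutex
	lastSeen map[string]time.Time // keyed by migration UUID
	timeout  time.Duration
}

func newLivenessTracker(timeout time.Duration) *livenessTracker {
	return &livenessTracker{lastSeen: map[string]time.Time{}, timeout: timeout}
}

// Touch records a status report, e.g. one triggered by the on-status hook.
func (lt *livenessTracker) Touch(uuid string, now time.Time) {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	lt.lastSeen[uuid] = now
}

// IsStale reports whether the migration has been silent for longer than the
// timeout. A migration that never reported is considered stale.
func (lt *livenessTracker) IsStale(uuid string, now time.Time) bool {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	seen, ok := lt.lastSeen[uuid]
	if !ok {
		return true
	}
	return now.Sub(seen) > lt.timeout
}

func main() {
	tracker := newLivenessTracker(10 * time.Minute)
	start := time.Now()
	tracker.Touch("uuid-1", start)
	fmt.Println(tracker.IsStale("uuid-1", start.Add(1*time.Minute)))
	fmt.Println(tracker.IsStale("uuid-1", start.Add(11*time.Minute)))
}
```

Passing the current time in as a parameter keeps the check deterministic and easy to test; a stale result could then trigger the OS-level process check or the kill mentioned above.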
	OnlineDDLStatusRequested OnlineDDLStatus = "requested"
	OnlineDDLStatusReviewed  OnlineDDLStatus = "reviewed"
	OnlineDDLStatusCancelled OnlineDDLStatus = "cancelled"
	OnlineDDLStatusQueued    OnlineDDLStatus = "queued"
Nice to see the queueing feature! So it queues in global etcd?
Both in global etcd as well as in local _vt.schema_migrations table. To be re-evaluated now that VExec is available. Gonna look into it.
go/vt/vttablet/onlineddl/executor.go
Outdated
	keyspace string
	shard    string

	mu sync.Mutex
Renamed (guess I didn't push yet) to initMutex; to serialize the Open()/Close() flow; I mostly copied+pasted this logic from other Executor implementations.
go/vt/vttablet/onlineddl/executor.go
Outdated
// Execute validates and runs a gh-ost process.
// Validation includes testing the backend MySQL server and the gh-ost binary itself.
// Execution first runs a dry run, then an actual migration.
func (e *Executor) Execute(ctx context.Context, target querypb.Target, alias topodatapb.TabletAlias, schema, table, alter string) error {
Do we also want to check schema consistency (temp gh-ost tables), ongoing migrations (temp gh-ost flag files), and maybe replication health between master and replicas?
- replication health: definitely; gh-ost uses throttling, and we will want to supply gh-ost with the identity of a replica it will check for lag. Also, if we ever deploy/implement freno in Vitess, then gh-ost will use that as a standard throttling mechanism.
- ongoing migrations: some work already in, possibly not pushed; as mentioned above we will avoid concurrent migrations and only allow one migration at a time.
- schema consistency: explain?
	fmt.Sprintf(`--port=%d`, mysqlPort),
	`--user=gh-ost`,
	`--password=gh-ost`,
	`--allow-on-master`,
So we are going to execute the OSC directly on the master. Why not connect to a replica?
Somewhat elaborating on the above; I began this way because of the simplicity and wanted to have a POC. Latency-wise, it's easier if we do that directly on master. The biggest advantage to running this on a replica is that we'd get implicit throttling by that replica. I will look into our options.
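For reference, the "dry run first, then the actual migration" flow mentioned in the Execute comment could be driven by an argument builder along these lines. This is a sketch, not the PR's code: ghostArgs is a hypothetical helper, and the full set of flags the executor passes is not visible on this page. The sketch relies on gh-ost running in noop (dry run) mode unless --execute is given.

```go
package main

import (
	"fmt"
	"strings"
)

// ghostArgs builds a gh-ost argument list (hypothetical helper).
// Without --execute, gh-ost only performs a dry run.
func ghostArgs(mysqlPort int, schema, table, alter string, execute bool) []string {
	args := []string{
		fmt.Sprintf("--port=%d", mysqlPort),
		"--user=gh-ost",
		"--password=gh-ost",
		"--allow-on-master",
		fmt.Sprintf("--database=%s", schema),
		fmt.Sprintf("--table=%s", table),
		fmt.Sprintf("--alter=%s", alter),
	}
	if execute {
		// Only the second, real run actually migrates the table.
		args = append(args, "--execute")
	}
	return args
}

func main() {
	dry := ghostArgs(3306, "commerce", "zzz", "modify id bigint not null", false)
	live := ghostArgs(3306, "commerce", "zzz", "modify id bigint not null", true)
	fmt.Println("dry run:", strings.Join(dry, " "))
	fmt.Println("execute:", strings.Join(live, " "))
}
```

Each argument list would be handed to something like exec.Command, with the real run started only if the dry run exits cleanly.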
There has been a lot of progress on this PR, and in fact it has been refactored to such an extent that little is left of the original commits. The design has changed (e.g. now using async, decoupled, scheduled migration execution, as opposed to sync execution), and many of the original TODOs or limitations have been addressed. I wish to open a new PR vs.
…plaintext Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
closed in favor of vitessio#6547
WORK IN PROGRESS
This PR begins to explore online schema changes via gh-ost.
NOTE: the baseline for this branch has been refactored halfway; this PR is created with conflicts, which will be resolved later.
Consider the below:
$ vtctl -topo_implementation etcd2 -topo_global_server_address localhost:2379 -topo_global_root /vitess/global ApplySchema -online_schema_change -sql "alter table zzz modify id bigint not null" commerce
Notice the new -online_schema_change flag.
To date, ApplySchema would ask vtctld to run a full blown schema change. vtctld would:
- identify the master/primary for each shard
- run the ALTER TABLE... statement on each primary
We wish to look into online schema changes. Online schema changes are non-blocking. They could run for hours, without affecting ongoing production traffic. Online schema changes are available via:
This PR offers an integration with gh-ost. The initial submission has A LOT of assumptions and rough edges:
- it assumes the gh-ost binary exists in path
- it assumes a gh-ost user account with proper privileges
- it runs gh-ost immediately upon request. This should not be the general case: running gh-ost should be orchestrated (e.g. if there's already a running gh-ost migration, wait for it to complete; e.g. if gh-ost fails, retry N times?)
- it runs gh-ost asynchronously but returns no Job ID to track progress
- gh-ost does not yet report success/failure to anyone/anything
All the above need to be completed, hence this is a draft PR. So what this PR does have is mostly wiring:
- vtctl ApplySchema to let vtctld know it wants an online schema change via gRPC
- vtctld to intercept an online schema change
- vtctld to identify shards, primaries
- vtctld to request an online schema change from the vttablets of primaries
- vttablet to analyze the request and spawn gh-ost
To be continued.
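The wiring described above boils down to a fan-out: one request, carrying one shared UUID, handed to the primary of each shard. Below is a minimal sketch under stated assumptions. The onlineDDLRequest fields mirror the struct shown in the review thread, while fanOut, the requester function type and the shard-to-alias map are hypothetical stand-ins for the topology lookup and the gRPC call.

```go
package main

import "fmt"

// onlineDDLRequest mirrors the fields of the struct shown in the review thread.
type onlineDDLRequest struct {
	Keyspace string
	Table    string
	SQL      string
	UUID     string // shared "migration job id" across all shards
}

// requester is a hypothetical stand-in for the call to a primary's vttablet.
type requester func(primaryAlias string, req onlineDDLRequest) error

// fanOut hands the same request to the primary of every shard and collects
// the per-shard outcome. primaries maps shard name to primary tablet alias.
func fanOut(req onlineDDLRequest, primaries map[string]string, send requester) map[string]error {
	results := make(map[string]error, len(primaries))
	for shard, alias := range primaries {
		results[shard] = send(alias, req)
	}
	return results
}

func main() {
	req := onlineDDLRequest{
		Keyspace: "commerce",
		Table:    "zzz",
		SQL:      "alter table zzz modify id bigint not null",
		UUID:     "example-job-id", // placeholder, not a real UUID
	}
	primaries := map[string]string{"-80": "zone1-100", "80-": "zone1-200"}
	results := fanOut(req, primaries, func(alias string, r onlineDDLRequest) error {
		fmt.Printf("requesting migration %s on %s\n", r.UUID, alias)
		return nil
	})
	fmt.Println(len(results), "shards requested")
}
```

The sketch sends requests sequentially for simplicity; since each tablet runs its own migration independently, the calls could just as well be issued concurrently.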