WIP: vitess Online DDL atomic cut-over #11418

shlomi-noach
Contributor

Description

This is a work in progress toward an atomic RENAME (rather than a two-step rename with a gap in between) in the cut-over process of a vitess/vreplication Online DDL migration.

Currently, we run a two-step rename: the original table is renamed away, and then a second rename moves the vreplication table into its place. This is protected by a buffering rule on the primary, but replicas see two distinct renames, so there is a point in time on the replica where the table does not exist, and queries against it fail during that window.
There is also a scenario, depicted in #11226, where even queries against the primary may fail.
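
To illustrate the gap, here is a toy model (not Vitess code) that treats the schema as a set of table names and records the state visible after each `RENAME TABLE` statement. The table names `customer`, `_vt_shadow` and `_vt_parked` are hypothetical, chosen only for this sketch. It relies on the fact that MySQL applies a multi-table `RENAME TABLE` as a single atomic statement.

```python
from typing import FrozenSet, Iterable, List, Tuple

Rename = Tuple[str, str]   # (from_table, to_table)
Statement = List[Rename]   # one RENAME TABLE statement, possibly multi-table


def to_sql(stmt: Statement) -> str:
    """Render one statement as MySQL RENAME TABLE syntax."""
    pairs = ", ".join(f"`{src}` TO `{dst}`" for src, dst in stmt)
    return f"RENAME TABLE {pairs}"


def visible_states(tables: Iterable[str], statements: List[Statement]) -> List[FrozenSet[str]]:
    """Return the set of existing tables after each statement completes.

    A statement is treated as atomic: only the state after all of its
    renames have been applied is observable by other connections.
    """
    current = set(tables)
    states = []
    for stmt in statements:
        for src, dst in stmt:
            current.remove(src)
            current.add(dst)
        states.append(frozenset(current))
    return states


# Hypothetical names: the live table, the vreplication shadow table,
# and the name the old table is parked under.
tables = {"customer", "_vt_shadow"}

# Two-step cut-over: two separate statements, replicated as two events.
two_step = [
    [("customer", "_vt_parked")],
    [("_vt_shadow", "customer")],
]

# Atomic cut-over: both renames in a single statement.
atomic = [
    [("customer", "_vt_parked"), ("_vt_shadow", "customer")],
]

for name, plan in (("two-step", two_step), ("atomic", atomic)):
    for stmt, state in zip(plan, visible_states(tables, plan)):
        present = "customer" in state
        print(f"{name}: {to_sql(stmt)} -> customer exists: {present}")
```

In this model the two-step plan exposes a state without `customer` between the two statements, which is the window a replica can observe; the single-statement plan has no such intermediate state.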

This PR attempts to introduce an algorithm that is similar to the gh-ost cut-over. It is simplified, because the buffering rule gives us some safety net.

Still a work in progress, as I'm seeing some fluctuations.

Related Issue(s)

#6926

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

@shlomi-noach added the Type: Enhancement and Component: Query Serving labels on Oct 2, 2022
@vitess-bot
Contributor

vitess-bot bot commented Oct 2, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

@shlomi-noach
Contributor Author

No good results so far. When I hammer the branch against onlineddl_vrepl_stress with an increased number of tests, there is always a failing test. I've tried many variations thus far.
I'm going to step back and validate that main itself passes the increased volume of tests.

@shlomi-noach
Contributor Author

Superseded by #11460, which takes a stricter and safer approach.

@shlomi-noach deleted the vrepl-online-ddl-cut-over-atomic-without-sentry branch on October 11, 2022 08:33