Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loqrecovery: support mixed version recovery #96811

Merged
merged 1 commit into from
Mar 16, 2023

Conversation

aliher1911
Copy link
Contributor

@aliher1911 aliher1911 commented Feb 8, 2023

This commit adds mixed version support for half-online loss of quorum recovery service and cli tools.
This change would allow user to use loq recovery in partially upgraded clusters by tracking version that generated data and produce recovery plans which will have identical version so that versions could be verified on all steps of recovery.
General rule is you can use data from the cluster that is not newer than a binary version to avoid new information being dropped. This rule applies to planning process where planner should understand replica info and also to cockroach node that applies the plan, which should be created by equal or lower version. Additional restriction is on planner to preserve version in the plan and don't use any new features if processed info is older than the binary version. This is no different on what version gates do in cockroach.

Release note: None

Fixes #95344

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@aliher1911 aliher1911 force-pushed the loq_05online_stage_binary branch 6 times, most recently from 6331b93 to c328ce2 Compare February 9, 2023 14:07
@aliher1911 aliher1911 self-assigned this Feb 9, 2023
@aliher1911 aliher1911 force-pushed the loq_05online_stage_binary branch 5 times, most recently from e5a0087 to c8de12d Compare February 15, 2023 15:19
@aliher1911 aliher1911 marked this pull request as ready for review February 15, 2023 17:29
@aliher1911 aliher1911 requested review from a team as code owners February 15, 2023 17:29
@aliher1911 aliher1911 requested review from a team February 15, 2023 17:29
@aliher1911 aliher1911 requested review from a team as code owners February 15, 2023 17:29
@aliher1911 aliher1911 requested review from erikgrinaker and a team and removed request for a team February 15, 2023 17:29
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to clarify some of the semantics here before I do a complete review. See comments.

Beside the comments, currently a 23.1 binary in a mixed 22.2/23.1 cluster will generate a plan that's incompatible with a 22.2 cluster right, because 22.2 used a slightly different structure for the collected info? We should make sure we generate something compatible with the released 22.2 versions in these cases.

We also need to take care with the JSON (un)marshalling. There are several pitfalls here:

  • If I'm understanding it correctly, the JSON marshaller will currently choke on unknown fields. How do we ensure that a future binary won't emit fields that an older binary will choke on? Is EmitDefaults: false sufficient?

  • Unlike Protobuf, we must guarantee stability of the field names -- it's easy for someone to miss this and think that changing a field name is ok. At the very least, we need tests that ensure JSON encoding stability, comments on relevant structs.

  • Similarly, we use EnumsAsInts: false, which means that unlike the Protobuf protocol we can't change enum names without breaking backwards compatibility.

This only matters for replica info collection and plan generation locally on the client, right? Let's make sure we're certain what the implications are for these and their included messages.

pkg/kv/kvserver/loqrecovery/collect.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/loqrecovery/collect.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/loqrecovery/collect.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/loqrecovery/server.go Outdated Show resolved Hide resolved
@aliher1911 aliher1911 force-pushed the loq_05online_stage_binary branch 4 times, most recently from d163406 to ecdbbd7 Compare February 24, 2023 10:22
@aliher1911
Copy link
Contributor Author

I put comments about our versioning approach to version.go where version validation functions are as it seems better place than scattering across the places where they are used.

So we will maintain old json format for clusters that are prior to 23.1.

There are some workarounds to make things works on master which is 22.2. until it is released. Otherwise master will work like a "legacy" and only use old format and features. Those bits could go away after branch is cut.

@aliher1911 aliher1911 force-pushed the loq_05online_stage_binary branch 2 times, most recently from ce64aac to 0abce85 Compare February 24, 2023 13:18
@aliher1911 aliher1911 requested a review from a team February 24, 2023 17:51
@aliher1911 aliher1911 force-pushed the loq_05online_stage_binary branch 3 times, most recently from 8d8acea to 6dbbb02 Compare March 3, 2023 12:06
@aliher1911
Copy link
Contributor Author

Changed to used standard version gate throughout.

One thing that I think is different with CLI vs intra cluster rpc's is that cluster won't version gate data if CLI has lower version. The RPC will still allow cli to talk to cluster if cli is above min version. This is not enough for us when collecting data, so I kept explicit version check that data returned by cluster is compatible with CLI.

@aliher1911 aliher1911 force-pushed the loq_05online_stage_binary branch 2 times, most recently from 5d31760 to 1da29fa Compare March 10, 2023 09:43
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this ended up being pretty clean.

pkg/kv/kvserver/loqrecovery/version.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/loqrecovery/version.go Show resolved Hide resolved
pkg/kv/kvserver/loqrecovery/version.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/loqrecovery/marshalling.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/loqrecovery/collect.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/loqrecovery/apply.go Outdated Show resolved Hide resolved
@aliher1911 aliher1911 force-pushed the loq_05online_stage_binary branch 8 times, most recently from 2e205f3 to 42132a9 Compare March 16, 2023 12:33
This commit adds mixed version support for half-online loss of quorum
recovery service and cli tools.
This change would allow user to use loq recovery in partially
upgraded clusters by tracking version that generated data and
produce recovery plans which will have identical version so
that versions could be verified on all steps of recovery.

Release note: None
@aliher1911
Copy link
Contributor Author

bors r=erikgrinaker

@craig
Copy link
Contributor

craig bot commented Mar 16, 2023

Build succeeded:

@craig craig bot merged commit 898a32a into cockroachdb:master Mar 16, 2023
@aliher1911 aliher1911 deleted the loq_05online_stage_binary branch March 17, 2023 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

loqrecovery: support mixed version clusters in loss of quorum recovery
3 participants