
fix: enable upsert protobuf combination #17624

Merged 4 commits into main on Jul 15, 2024

Conversation

@tabVersion (Contributor) commented Jul 9, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

As the title says, we disabled the combination by mistake; this PR re-enables it.

Related docs PR: https://github.com/risingwavelabs/risingwave-docs/pull/2354

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

Allow CREATE TABLE with FORMAT UPSERT ENCODE PROTOBUF (sketched below), but note it as NOT RECOMMENDED.

According to docs.confluent.io/platform/7.6/control-center/topics/schema.html:

> Do not use Protobuf messages and JSON objects for key values. Avro does not guarantee deterministic serialization for maps or arrays, and Protobuf and JSON schema formats do not guarantee deterministic serialization for any object. Using these formats for key values will break topic partitioning.

Users may see upserts applied in the wrong order.

For example, suppose upsert_1 and upsert_2 are produced in that order but land in different partitions because of the non-deterministic key serialization: upsert_2 may be ingested before upsert_1, so the result in RisingWave can be wrong.
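
For reference, here is a minimal sketch of the DDL this combination enables. It follows the usual upsert pattern of including the Kafka message key as an extra column and using it as the primary key; every table, topic, broker, message, and registry name below is a placeholder, and the exact option syntax should be checked against the linked docs PR.

```sql
-- Hypothetical example: all names and connection properties are placeholders.
CREATE TABLE user_events (
    PRIMARY KEY (kafka_key)
)
INCLUDE KEY AS kafka_key   -- the raw Kafka message key, kept as a bytea column
WITH (
    connector = 'kafka',
    topic = 'user_events',
    properties.bootstrap.server = 'broker:9092'
)
FORMAT UPSERT ENCODE PROTOBUF (
    message = 'example.UserEvent',
    schema.registry = 'http://schema-registry:8081'
);
```

As discussed in the review below, the included key column must be the one and only primary key for all upsert formats.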

Signed-off-by: tabVersion <[email protected]>
@tabVersion tabVersion requested review from xiangjinwu and xxchan July 9, 2024 06:19
@github-actions github-actions bot added the type/fix Bug fix label Jul 9, 2024
@xiangjinwu (Contributor)

> we disabled the combination by mistake

It was not by mistake. We never implemented parsing the Kafka key as Protobuf; eventually we decided not to parse the key at all and to always include it as an additional bytea column.

@xiangjinwu (Contributor)

Is it still a draft? Any e2e test to make sure it works?

@tabVersion (Contributor, Author)

> we disabled the combination by mistake
>
> It was not by mistake. We never implemented parsing the Kafka key as Protobuf; eventually we decided not to parse the key at all and to always include it as an additional bytea column.

IIRC, the previous behavior was fetching the schema from both key and value, comparing them, and inferring the pk (aligning with upsert Avro). After migrating to the additional-columns approach, upsert Protobuf was disabled.

@tabVersion (Contributor, Author)

> Is it still a draft? Any e2e test to make sure it works?

I am working on the e2e test.

@xiangjinwu (Contributor)

> After migrating to the additional-columns approach, upsert Protobuf was disabled.

Before the migration, upsert Protobuf did not exist either 😂 The previous behavior of fetching the schema from both key and value was only implemented for Avro.

@xxchan (Member) left a comment

Note that Flink also doesn't support Upsert Protobuf: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/table/formats/overview/

So this might be non-trivial.

@xiangjinwu (Contributor) commented Jul 9, 2024

> Note that Flink also doesn't support Upsert Protobuf: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/table/formats/overview/
>
> So this might be non-trivial.

Just my guess: https://docs.confluent.io/platform/7.6/control-center/topics/schema.html#c3-schemas-best-practices-key-value-pairs

> Do not use Protobuf messages and JSON objects for key values. Avro does not guarantee deterministic serialization for maps or arrays, and Protobuf and JSON schema formats do not guarantee deterministic serialization for any object. Using these formats for key values will break topic partitioning.

@tabVersion tabVersion marked this pull request as ready for review July 14, 2024 07:22
@tabVersion (Contributor, Author)

> Just my guess: docs.confluent.io/platform/7.6/control-center/topics/schema.html
>
> Do not use Protobuf messages and JSON objects for key values. Avro does not guarantee deterministic serialization for maps or arrays, and Protobuf and JSON schema formats do not guarantee deterministic serialization for any object. Using these formats for key values will break topic partitioning.

Agreed, but the risk is not on our side. Maybe we can mark it as NOT RECOMMENDED in the docs.

e2e_test/source_inline/kafka/protobuf/basic.slt (review thread on this file; outdated, resolved)
Comment on lines 799 to +803

    // For all Upsert formats, we only accept one and only key column as primary key.
    // Additional KEY columns must be set in this case and must be primary key.
-   (Format::Upsert, encode @ Encode::Json | encode @ Encode::Avro) => {
+   (
+       Format::Upsert,
+       encode @ Encode::Json | encode @ Encode::Avro | encode @ Encode::Protobuf,
Contributor:

Given the comment above says all Upsert formats, shall we match (Format::Upsert, _)?

Contributor Author:

It makes sense for all encodes other than Encode::Native, but we should proceed with caution. Let's add more encodes here as they are requested.

@tabVersion tabVersion enabled auto-merge July 15, 2024 07:49
@tabVersion tabVersion added this pull request to the merge queue Jul 15, 2024
Merged via the queue into main with commit c186a8f Jul 15, 2024
31 of 32 checks passed
@tabVersion tabVersion deleted the tab/enable-upert-protobuf branch July 15, 2024 08:33
@tabVersion tabVersion added the need-cherry-pick-release-1.10 Open a cherry-pick PR to branch release-1.10 after the current PR is merged label Aug 27, 2024
tabVersion added a commit that referenced this pull request Aug 27, 2024
github-merge-queue bot pushed a commit that referenced this pull request Aug 28, 2024
zwang28 pushed a commit that referenced this pull request Aug 28, 2024