Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cdc): support sql server cdc #17429

Merged
merged 14 commits into from
Aug 7, 2024
Merged

feat(cdc): support sql server cdc #17429

merged 14 commits into from
Aug 7, 2024

Conversation

KeXiangWang
Copy link
Contributor

@KeXiangWang KeXiangWang commented Jun 24, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Support sql server cdc.

There're several potential issues in this implementation:

  1. The time precision is not sufficient (only 3 digit numbers after decimal point). It's Debezium's bug, and will be fixed in their 2.7.0+ release. We can wait for their fix.
  2. Have not check the timezone related issues.
  3. Currently, the conversion between sql server types and rw types are anonymous without a type specification. As we have multiple sql server types mapped into one rw type, it may have problems when trying to map it back to sql server types. This may need a refactorization of the tiberius library.
  4. tiberius, which is library that provides the client and type conversion functions, is now out of maintenance. The good news is that the library is not complicated.
  5. Both mysql and postgres support transactional CDC. Due to the design of debezium, for Sql Server CDC, it would come at the expense of the freshness of cdc sourcec. So in this implementation, Sql Server CDC is not transactional.
  6. There're still some types not supported.
  7. Sql server allows user to create multiple capture instances for a single table (similar to slot in postgres), but debezium does not allow user to specify which capture instance to track. We may need to add some constraints for this.
  8. More tests are needed:
  • Create source xxx (*)
  • Tests with larger scale data.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@KeXiangWang KeXiangWang requested a review from a team as a code owner June 24, 2024 19:50
@graphite-app graphite-app bot requested a review from a team June 24, 2024 19:50
@KeXiangWang KeXiangWang marked this pull request as draft June 25, 2024 18:46
@KeXiangWang KeXiangWang marked this pull request as ready for review June 25, 2024 19:39
@xxchan xxchan mentioned this pull request Jun 26, 2024
10 tasks
@KeXiangWang KeXiangWang force-pushed the wkx/sql-server-cdc branch 2 times, most recently from 7567d83 to 0c371b7 Compare July 2, 2024 22:32
xxchan
xxchan previously requested changes Jul 3, 2024
ci/scripts/e2e-paid-source-test.sh Outdated Show resolved Hide resolved
ci/scripts/e2e-paid-source-test.sh Outdated Show resolved Hide resolved
Comment on lines 34 to 38
echo "--- Install sql server client"
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl https://packages.microsoft.com/config/ubuntu/20.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list
apt-get update -y
ACCEPT_EULA=Y DEBIAN_FRONTEND=noninteractive apt-get install -y mssql-tools unixodbc-dev
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider doing this in ci/Dockerfile.

ci/scripts/e2e-paid-source-test.sh Outdated Show resolved Hide resolved
@xxchan xxchan mentioned this pull request Jul 3, 2024
9 tasks
e2e_test/source/cdc_paid/cdc.load.slt Outdated Show resolved Hide resolved
e2e_test/source/cdc_paid/cdc.share_stream.slt Outdated Show resolved Hide resolved
e2e_test/source/cdc_paid/cdc.share_stream.slt Outdated Show resolved Hide resolved
e2e_test/source/cdc_paid/cdc.share_stream.slt Outdated Show resolved Hide resolved
src/connector/src/source/cdc/external/sql_server.rs Outdated Show resolved Hide resolved
src/connector/src/source/cdc/external/sql_server.rs Outdated Show resolved Hide resolved
@KeXiangWang
Copy link
Contributor Author

A short summary of required following up:

  1. Refactoring the test cases, following the inline test cases.
  2. Fulfill task id in connector metrics.
  3. Make sure the impl works well with uppercase table/schema/dbname

match item {
QueryItem::Metadata(meta) => {
for col in meta.columns() {
column_descs.push(ColumnDesc::named(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get these column meta from the INFORMATION_SCHEMA (in the following query)? Because our pg and mysql source all rely on the information in INFORMATION_SCHEMA to support the auto schema mapping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do it later.

@graphite-app graphite-app bot requested a review from xxchan July 26, 2024 02:51
Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, we are going to made this paid feature, right? But it seems this is not gated yet (like #17683, #17795). This should be a blocker before this feature goes into 1.11.

@RickLeite
Copy link

Awesome 🚀

@KeXiangWang
Copy link
Contributor Author

Wait, we are going to made this paid feature, right? But it seems this is not gated yet (like #17683, #17795). This should be a blocker before this feature goes into 1.11.

Added

Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Others generally LGTM. Thank you for the hard work! (CDC business logic not reviewed)

ci/scripts/e2e-source-test.sh Outdated Show resolved Hide resolved
ci/scripts/e2e-source-test.sh Show resolved Hide resolved
Comment on lines 192 to 196
# ------------ kill cluster ------------
# system ok
# risedev kill

sleep 30s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the intention here? 👀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I forget to uncomment them. It's for testing sql server cdc with recovery.

Copy link
Member

@xxchan xxchan Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we do lack recovery tests, but perhaps need to come up with a design.
related issue: #16356

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we cannot handle recovery with current cdc_inline design. The sqllogictest gets query failed: connection closed after the RW cluster's recovery. I'll comment them for now.

src/frontend/src/handler/create_table.rs Show resolved Hide resolved
src/common/src/types/datetime.rs Outdated Show resolved Hide resolved
src/connector/Cargo.toml Show resolved Hide resolved
Copy link
Contributor

@StrikeW StrikeW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM!

src/connector/src/sink/sqlserver.rs Outdated Show resolved Hide resolved
src/connector/src/source/cdc/external/sql_server.rs Outdated Show resolved Hide resolved
src/connector/src/source/reader/reader.rs Outdated Show resolved Hide resolved
@KeXiangWang KeXiangWang requested a review from a team August 7, 2024 02:24
Cargo.lock Show resolved Hide resolved
@xxchan xxchan enabled auto-merge August 7, 2024 02:43
@xxchan xxchan added this pull request to the merge queue Aug 7, 2024
@KeXiangWang KeXiangWang removed this pull request from the merge queue due to a manual request Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants