-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read API Source v2 #25392
Read API Source v2 #25392
Conversation
…o explain how OffsetBasedSource + RangeTracker is used.
R: @johnjcasey, @lukecwik, @kmjung |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
...a/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
Outdated
Show resolved
Hide resolved
...google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java
Show resolved
Hide resolved
...ud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageSourceBase.java
Outdated
Show resolved
Hide resolved
...orm/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamBundleSource.java
Outdated
Show resolved
Hide resolved
R: @ahmedabu98 |
Retest this please. |
1 similar comment
Retest this please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments. You can ignore my logging suggestions if you like another alternative, but I think we should improve some of those.
Still haven't looked at test class, but submitting this review now to give you some time before next release cut.
...ud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageSourceBase.java
Outdated
Show resolved
Hide resolved
...ud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageSourceBase.java
Outdated
Show resolved
Hide resolved
...ud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageSourceBase.java
Outdated
Show resolved
Hide resolved
...a/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
Outdated
Show resolved
Hide resolved
...ud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageSourceBase.java
Outdated
Show resolved
Hide resolved
...ud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageSourceBase.java
Show resolved
Hide resolved
...orm/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamBundleSource.java
Outdated
Show resolved
Hide resolved
// of BigQueryStorageSourceBase, all splits here will be handled by `splitAtFraction()`. As a | ||
// result, this is a no-op. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI because it was mentioned previously as TODO: Implement dynamic work rebalancing
, splitAtFraction() is dynamic work rebalancing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Hope these changes unblock users with large read operations
Failing test |
retest this please |
…o explain how OffsetBasedSource + RangeTracker is used.
…it test coverage.
…eadapi-v2 Merging latest changes from `upstream/master` to ensure Test runs.
…to readapi-v2 Merging latest changes from upstream.
Test suites aren't statusing, but they've all completed successfully except for https://ci-beam.apache.org/job/beam_PreCommit_Java_GCP_IO_Direct_Commit/1778/ (Java_GCP_IO_Direct) which is still running and seems stuck at the start |
Run Java_GCP_IO_Direct PreCommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* ReadAPI Source v2 * Renamed the Source V2. Also added tests for the same. * v2 using OffsetBasedSource and OffsetBasedReader * Updating tests to have more sensible mock values. * Updated BqOption flag. * Simplifying `fractionConsumed` calculation. * Better variable names. * Minor refactoring. * Added a synchronized block in readNextRecord(). Also added comments to explain how OffsetBasedSource + RangeTracker is used. * Removed unnecessary synchronized block, added Javadoc and improved unit test coverage. * Consolidated code paths in `BigQueryIO` for Bundled and regular ReadAPI sources. * Lint fixes. * Minor Javadoc fix. * Fix StreamBundle creation logic and some minor code comment updates. * Updated logging. * Lint fixes. * ReadAPI Source v2 * Renamed the Source V2. Also added tests for the same. * v2 using OffsetBasedSource and OffsetBasedReader * Updating tests to have more sensible mock values. * Updated BqOption flag. * Simplifying `fractionConsumed` calculation. * Better variable names. * Minor refactoring. * Added a synchronized block in readNextRecord(). Also added comments to explain how OffsetBasedSource + RangeTracker is used. * Removed unnecessary synchronized block, added Javadoc and improved unit test coverage. * Consolidated code paths in `BigQueryIO` for Bundled and regular ReadAPI sources. * Lint fixes. * Minor Javadoc fix. * Fix StreamBundle creation logic and some minor code comment updates. * Updated logging. * Lint fixes.
This reverts commit 4ce8eed.
Adding the new Read API Source which supports bundling of Streams within Read Session. Customers can enable/disable this feature using the
setEnableBundling
BigQuery pipeline option (the option flag currently defaults to false).The design doc for the changes can be found here. This PR addresses #24260.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.