[Improve][Connector-V2][HBase] Support configurable range scan boundary inclusion policies #10011
base: dev
…ry inclusion policies
davidzollo
left a comment
+1
.withDescription(
        "Whether to include the start row in the scan. Default is true (inclusive).");

public static final Option<Boolean> END_ROW_INCLUSIVE =
To stay consistent with the previous logic, I think the default value of end_row_inclusive should also be true.
Thanks for the suggestion, @xiaochen-zhou. To ensure that data is read normally without duplication, only one side should be inclusive.
Hi, will the data be duplicated when both END_ROW_INCLUSIVE and START_ROW_INCLUSIVE are true?
I think the intent of this configuration is to control whether the boundaries are included. However, if the flags are applied to every split, setting both to false produces splits like (start, 1), (1, 3), (3, end), which loses the rows at the interior boundaries. Conversely, setting both to true reads those boundary rows twice. Instead of adjusting the boundaries of each split, we should only change the inclusion settings of the overall start and end points.
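The concern in the comment above can be illustrated with a small, self-contained sketch. It is not the connector's code: the class `SplitBoundarySketch`, its `Range` type, both helper methods, and the integer row keys with bounds 0, 1, 3, 5 are all hypothetical stand-ins for the (start, 1), (1, 3), (3, end) example. It compares applying the two flags to every split with applying them only at the outer start and end while keeping interior splits half-open.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitBoundarySketch {

    /** Hypothetical scan range over integer row keys; flags say whether each end is read. */
    static final class Range {
        final int start;
        final boolean startInclusive;
        final int end;
        final boolean endInclusive;

        Range(int start, boolean startInclusive, int end, boolean endInclusive) {
            this.start = start;
            this.startInclusive = startInclusive;
            this.end = end;
            this.endInclusive = endInclusive;
        }

        boolean contains(int key) {
            boolean afterStart = startInclusive ? key >= start : key > start;
            boolean beforeEnd = endInclusive ? key <= end : key < end;
            return afterStart && beforeEnd;
        }
    }

    /** Applies the user's flags to every split (the behavior the comment warns about). */
    static List<Range> flagsOnEverySplit(int[] bounds, boolean startInc, boolean endInc) {
        List<Range> splits = new ArrayList<>();
        for (int i = 0; i + 1 < bounds.length; i++) {
            splits.add(new Range(bounds[i], startInc, bounds[i + 1], endInc));
        }
        return splits;
    }

    /** Keeps interior splits half-open [a, b); the flags affect only the outer ends. */
    static List<Range> flagsOnOuterEndsOnly(int[] bounds, boolean startInc, boolean endInc) {
        List<Range> splits = new ArrayList<>();
        int last = bounds.length - 2;
        for (int i = 0; i <= last; i++) {
            boolean sInc = (i == 0) ? startInc : true;
            boolean eInc = (i == last) ? endInc : false;
            splits.add(new Range(bounds[i], sInc, bounds[i + 1], eInc));
        }
        return splits;
    }

    /** Counts how many splits would read the given row key. */
    static int timesRead(List<Range> splits, int key) {
        int count = 0;
        for (Range r : splits) {
            if (r.contains(key)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] bounds = {0, 1, 3, 5}; // assumed stand-in for start, 1, 3, end
        // Both flags false on every split: interior boundary row 1 is never read.
        System.out.println(timesRead(flagsOnEverySplit(bounds, false, false), 1)); // 0
        // Both flags true on every split: interior boundary row 1 is read twice.
        System.out.println(timesRead(flagsOnEverySplit(bounds, true, true), 1)); // 2
        // Flags only on the outer ends: row 1 is read exactly once.
        System.out.println(timesRead(flagsOnOuterEndsOnly(bounds, true, true), 1)); // 1
    }
}
```

With the second approach, the flags decide only whether the overall start row (0 here) and the overall end row (5 here) are read, so any combination of the two settings leaves every interior row covered exactly once.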
Purpose of this pull request
Does this PR introduce any user-facing change?
How was this patch tested?
Check list