[HUDI-3539] Flink bucket index bucketID bootstrap optimization. #5093
|
@garyli1019 An improvement for HUDI-3315, please take a look. |
garyli1019
left a comment
LGTM, left a minor comment.
if (bucketIndex.containsKey(partitionBucketId)) {
location = new HoodieRecordLocation("U", bucketIndex.get(partitionBucketId));
if (incBucketIndex.contains(partitionBucketId)) {
nice catch, a bug fixed here
...link-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/BucketStreamWriteFunction.java
1571926 to
48807f4
Compare
|
@hudi-bot run azure |
|
@minihippo could you resolve the conflict? |
84fbc2e to
7d10408
Compare
|
@hudi-bot run azure |
bootstrapIndexIfNeed(partition);
Map<Integer, String> bucketToFileIdMap = bucketIndex.get(partition);
final int bucketNum = BucketIdentifier.getBucketId(hoodieKey, indexKeyFields, this.bucketNum);
.get(partition) -> computeIfAbsent(partition, p -> new HashMap<>())
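To illustrate the suggestion: a plain `get(partition)` returns `null` for a partition that has not been seen yet, so a later `put` on the result would throw, while `computeIfAbsent` creates the inner map on first use. This is only a minimal sketch using `java.util.Map`; the class `ComputeIfAbsentSketch` and the method `register` are hypothetical names, not code from this PR.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of Hudi: shows the get -> computeIfAbsent change. */
public class ComputeIfAbsentSketch {

  /**
   * Records the file id for a bucket in the given partition.
   * The partition's inner map is created lazily, so the caller never sees null.
   */
  static void register(Map<String, Map<Integer, String>> bucketIndex,
                       String partition, int bucketId, String fileId) {
    bucketIndex
        .computeIfAbsent(partition, p -> new HashMap<>())
        .put(bucketId, fileId);
  }
}
```

With an empty outer map, `bucketIndex.get("some/partition")` is `null` before the first `register` call; afterwards the lookup returns the inner map that holds the registered bucket.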
|
Thanks for the fix. I have submitted a minor fix patch; could you apply it? Thanks ~ |
…he#5093) * [HUDI-3539] Flink bucket index bucketID bootstrap optimization. Co-authored-by: gengxiaoyu <[email protected]>
What is the purpose of the pull request
Optimizes bootstrap when using the Flink bucket index. Instead of loading all partitions up front, the writer loads and caches the file group info of a partition only when it first processes records that belong to that partition.
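The idea can be sketched roughly as below. This is not the code in `BucketStreamWriteFunction`; it is a simplified, self-contained illustration in which `LazyBucketIndex`, `partitionLoader`, and `getLoadCount` are hypothetical names, and the loader function stands in for scanning a partition's file groups (assumed to return a non-null map).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: a bucket index bootstrapped lazily, one partition at a time. */
public class LazyBucketIndex {
  // partition path -> (bucket id -> file id)
  private final Map<String, Map<Integer, String>> bucketIndex = new HashMap<>();
  // Stand-in for listing the file groups of a partition; an assumption, not Hudi's API.
  private final Function<String, Map<Integer, String>> partitionLoader;
  // Counts how many partitions were actually loaded, for illustration only.
  private int loadCount = 0;

  public LazyBucketIndex(Function<String, Map<Integer, String>> partitionLoader) {
    this.partitionLoader = partitionLoader;
  }

  /** Loads and caches the partition's mapping the first time the partition is seen. */
  private Map<Integer, String> bootstrapIfNeeded(String partition) {
    Map<Integer, String> cached = bucketIndex.get(partition);
    if (cached == null) {
      loadCount++;
      cached = new HashMap<>(partitionLoader.apply(partition));
      bucketIndex.put(partition, cached);
    }
    return cached;
  }

  /** Returns the file id for the bucket, or null if the bucket has no file group yet. */
  public String fileIdFor(String partition, int bucketId) {
    return bootstrapIfNeeded(partition).get(bucketId);
  }

  public int getLoadCount() {
    return loadCount;
  }
}
```

In this sketch, a job that only ever writes to a handful of partitions calls the loader only for those partitions, and repeated lookups in the same partition are served from the cache.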
Brief change log
Verify this pull request
This pull request is already covered by existing tests.
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.