Support data skipping for Hudi connector #24784
Open
codope wants to merge 2 commits into prestodb:master from
Contributor, Author:
@tdcmeehan @xiarixiaoyao @pratyakshsharma I have rewritten the data skipping support from #18606 against the upgraded Hudi version. The design remains the same as described in the original PR. However, in Hudi 0.15.0 we introduced HoodieStorage and HoodieStorageConfiguration, and this patch works with those APIs. Please take a look.
Contributor:
I suggest rebasing; it could help the tests that failed earlier pass.
Co-authored-by: xiarixiaoyao <mengtao0326@qq.com>
Contributor:
Thanks for the release note! Suggested changes:
Contributor:
Please resolve the file conflict.
Contributor:
Hi @codope, could you look into the test failures? Then we can give this another round of review.
Description
HoodieStorage and HoodieStorageConfiguration APIs.
Motivation and Context
Hudi has a metadata table that supports efficient file listing, column stats, and other indexes. Until now, only the files index was integrated in the Hudi connector. This PR adds support for the column_stats index as well.
Impact
Queries become more efficient because data skipping is applied on top of partition pruning.
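To illustrate how data skipping composes with partition pruning, here is a minimal Java sketch. It is not the connector's implementation: DataFile and filesToScan are hypothetical names, partition pruning is reduced to a set of matching partition names, and a column_stats entry is modeled as a per-file min/max pair for a single long column.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Illustrative sketch only: partition pruning followed by min/max data skipping.
 * DataFile and filesToScan are hypothetical names, not Presto or Hudi APIs.
 */
public class DataSkippingSketch {

    /** One data file plus the min/max of one data column, as a column stats entry might record. */
    static final class DataFile {
        final String name;
        final long min;
        final long max;

        DataFile(String name, long min, long max) {
            this.name = name;
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Returns "partition/file" paths that may hold rows with lower <= col <= upper.
     * Stage 1 (partition pruning): skip partitions not in matchingPartitions.
     * Stage 2 (data skipping): skip files whose [min, max] cannot overlap [lower, upper].
     * A file is dropped only when none of its rows can match, so results stay correct.
     */
    static List<String> filesToScan(Map<String, List<DataFile>> filesByPartition,
                                    Set<String> matchingPartitions,
                                    long lower, long upper) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<DataFile>> entry : filesByPartition.entrySet()) {
            if (!matchingPartitions.contains(entry.getKey())) {
                continue; // pruned by the partition predicate
            }
            for (DataFile file : entry.getValue()) {
                boolean mayMatch = file.max >= lower && file.min <= upper;
                if (mayMatch) {
                    result.add(entry.getKey() + "/" + file.name);
                }
            }
        }
        Collections.sort(result); // deterministic order for display and tests
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<DataFile>> files = new TreeMap<>();
        files.put("dt=2024-01-01", List.of(new DataFile("a.parquet", 0, 99), new DataFile("b.parquet", 100, 199)));
        files.put("dt=2024-01-02", List.of(new DataFile("c.parquet", 0, 99), new DataFile("d.parquet", 100, 199)));

        // Models: WHERE dt = '2024-01-01' AND col BETWEEN 120 AND 150
        System.out.println(filesToScan(files, Set.of("dt=2024-01-01"), 120, 150));
        // prints [dt=2024-01-01/b.parquet]
    }
}
```

With partition pruning alone, both files in dt=2024-01-01 would be scanned; the min/max check additionally skips a.parquet because its values end at 99, below the lower bound of 120.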
Test Plan
Added tests that validate data skipping using Hudi tables checked in as test artifacts. We previously ran this through the SSB benchmark on a cluster; see the results in #18606.
Contributor checklist
Release Notes
Please follow the release notes guidelines and fill in the release notes below.