-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-3010] Unbundle parquet-avro and shade hbase in presto-bundle #4551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-3010] Unbundle parquet-avro and shade hbase in presto-bundle #4551
Conversation
vinothchandar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@codope can you please help me understand how Presto would work w/o parquet-avro (which RecordReader's are relying on to read using ParquetFileReaders) and hbase-shaded-server (which carries HFile, and poised to fail if Metadata Table is enabled) |
parquet-avro and avro are part of hive-apache artifact in presto, so we are going to use that. I removed hbase-shaded-server as hbase-server (part of hudi-common) is already included. |
|
Oh, i missed that you removed the exclusions. |
|
@codope i had to revert these changes in my PR, since Presto queries are failing after rebase: |
@alexeykudinkin Let's not revert this. Instead, we should upgrade the presto version in hudi integ test. Currently, it is 0.217, over 3 years old which did not package avro.message. We want our bundles to be as lightweight as possible and so rely on deps provided by presto as much as possible. Moreover, 0.217 is far removed from the reality. It does not contain the hudi-specific changes that we did in Presto. Also, most Hudi users that I have interacted with are on 0.246 or later. |
|
That makes total sense to me. But for that we have to update the Docker images we're using in ITs, right? If'd revert those changes my PR would have ITs failing b/c of missing classes. Let me know when you'll be able to update Docker images and i'll revert the POM changes. EDIT Please keep in mind that Hive flows on the current master don't involve Metadata table (which uses HFile), and therefore we'd need to validate that it works either triggering that flow manually or basing it on top of #4556 which does trigger Metadata table usage in Hive flows. |
What is the purpose of the pull request
We introduced hudi-presto-bundle in the presto-hive module which caused some build issue in presto installation. See prestodb/presto#17164 for more details. This PR fixes the hudi-presto-bundle to solve that issue. Specifically:
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.