[WIP][DO NOT MERGE][CI Test Only] Remove hbase-server dependency, pull in HFile related classes, with deps resolution #4695
…pendency of hbase libs in hudi-common
… in hudi-io module
My approach is to pull the HFile-format-related classes from the HBase repo at release 2.4.9 into the Hudi repo. A few things to finalize:
@yihua thanks for taking a stab at this.
The real issue with HFile usage in Hudi has been the bundling (shading it and making the bundle smaller). On HBase 2.x vs 1.x, it's more about getting on a version that is not 5 years old :). I don't think we saw any large perf improvements between 1.x and 2.x. I think even with HBase 1.x we are on version 3 of HFile? (http://www.devdoc.net/bigdata/hbase-0.98.7-hadoop1/book/hfilev3.html; the HFile has its own version, like the Hudi table version.) @codope can chime in here as well. The urgency stems from wanting to finalize this before all the indexing work lands.
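To illustrate that the HFile format version is independent of the HBase release, here is a minimal sketch that reads the version an HFile reports about itself. It assumes the trailer encoding used by HBase's `FixedFileTrailer` as we understand it: the file ends with a 4-byte big-endian int that packs the minor version into the top byte and the major version into the low 3 bytes. `HFileVersionProbe` and its methods are hypothetical helpers, not HBase or Hudi APIs.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;

// Hypothetical helper; the encoding below is an assumption modeled on
// HBase's FixedFileTrailer, where the serialized version is
// (minorVersion << 24) | majorVersion, stored in the file's last 4 bytes.
final class HFileVersionProbe {

  // Major version lives in the low 3 bytes.
  static int majorVersion(int serialized) {
    return serialized & 0x00ffffff;
  }

  // Minor version lives in the top byte.
  static int minorVersion(int serialized) {
    return serialized >>> 24;
  }

  // Inverse of the two extractors above.
  static int materialize(int major, int minor) {
    return (major & 0x00ffffff) | (minor << 24);
  }

  // Reads the big-endian int from the last 4 bytes of an in-memory file.
  static int readSerializedVersion(byte[] fileBytes) {
    if (fileBytes.length < Integer.BYTES) {
      throw new IllegalArgumentException("file too short to contain a trailer");
    }
    return ByteBuffer.wrap(fileBytes, fileBytes.length - Integer.BYTES, Integer.BYTES).getInt();
  }

  // Same as above, but for a file on local disk (RandomAccessFile is big-endian).
  static int readSerializedVersion(String path) throws IOException {
    try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
      f.seek(f.length() - Integer.BYTES);
      return f.readInt();
    }
  }

  public static void main(String[] args) throws IOException {
    // With no argument, fall back to a synthetic value for demonstration.
    int serialized = args.length > 0 ? readSerializedVersion(args[0]) : materialize(3, 3);
    System.out.println("HFile major=" + majorVersion(serialized)
        + " minor=" + minorVersion(serialized));
  }
}
```

Run against a local HFile path to see which major version the writer produced; if the assumed trailer layout is right, files written by HBase 1.x and 2.x alike should report major version 3 when configured for HFile v3.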
Need to take a closer look. If proto is used to define the storage format, maybe we should keep it in? How big is that?
Right. The desired way for us is to trim HFile down to a much, much smaller amount of code. We should not bring in any new dependencies that Hudi has gotten rid of, such as commons-lang and guava. Otherwise it defeats the purpose a little bit.
At 66K lines, this is currently still too much code to maintain for the return. I'm wondering if it's easier to think about how we can support different base files within the same table/partition and punt on this. We could just write our own format, which can be a lot thinner.
How much more of the code do we think we can trim (not just the deps)?
@yihua: Can we close this if it's not valid anymore?
Per discussion, if we want to pull HFile-related code into Hudi, there is more work to do beyond pulling in the necessary classes: trimming irrelevant code and rewriting code inside classes to bring the LoC much lower than 66K. This direction is much more involved. For now, we'll go with upgrading HBase to 2.x and properly shading the dependencies, before we write our own file format for the same purpose. #5004 is ready for review for the HBase upgrade to 2.x along this line. Closing this WIP PR.
Tips
What is the purpose of the pull request
(For example: This pull request adds a quick-start document.)
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.