-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[Review] refactor hudi-client #1727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@wangxianghu @leesf lets discuss on this PR.. its easy comment and iterate |
vinothchandar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Midway through this.. But I did pull the code down ..
IIUC,
- HoodieEngineContext is very thin.. any place we really need a spark context, that code is now moved to a subclass under hudi-client-spark ..
- There are no API/functionality changes for Spark RDD client .. i.e old HoodieWriteClient..
This is a very good start... We can do it like this initially and later on, we try to abstract more and move more functionality back into the abstract classes.. As you will encounter when building hudi-client-flink out more full fledged, you will encounter code reuse for all teh table.action package code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't use sl4j currently.. there was a separate effort for this..
@leesf would know better.. for now, let's keep things in log4j
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @vinothchandar, Thanks for feedback!
yes, HoodieEngineContext is thin, It holds only common things, while spark related goes to HoodieSparkEngineContext, flink related goes to HoodieFlinkEngineContext... which both extends HoodieEngineContext .
As it is already huge, We don't want to make too many changes. So we made no API/functionality changes for Spark RDD client, just abstracted it. BTW, I have verified it in flink engine before(replace RDD with List), it is doable.
I'll roll back the log with log4j.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please document what I,K,O,P stand for>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please document what I,K,O,P stand for>
will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can probably keep IndexType in HoodieIndex itself.. as a interface? anyways, I may have some detailed comments like these.. but we can defer them to final review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pull this into a helper like HoodieSparkEngineContext.getSparkContext(engineCtx) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pull this into a helper like
HoodieSparkEngineContext.getSparkContext(engineCtx)?
will do
|
@leesf @wangxianghu Direction is definitely promising.. and very clean.. Let me know if you want a detailed line-by-line review also ccing @yanghua @bvaradar @n3nash to be in the know.. This is huge! |
|
We may have to coordinate with the bootstrap pr bit more on conflicts/rebasng, so that either of your life is not hell :) |
Ack, will review this weekend. |
Sorry, recently, I am busy with other things. Will try to catch your thoughts and review later. |
@vinothchandar thanks for your affirmation. I'll try to finish the abstraction this weekend, then implement it with spark engine. I'll ping you when it is ready. |
will keep an eye on the bootstrap pr, thanks for reminding :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to protected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the section please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please avoid using *
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please avoid using *
sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are methods from HoodieWriteClient, and users would use HoodieWriteClient to upsert/insert records directly using the APIs, right now the HoodieWriteClient has been removed, so it breaks the compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are methods from HoodieWriteClient, and users would use HoodieWriteClient to upsert/insert records directly using the APIs, right now the HoodieWriteClient has been removed, so it breaks the compatibility.
@leesf Yes, it is not finished yet. I have noticed that HoodieWriteClient has been referenced in many places(eg hudi-cli,hudi-utilities...). When hudi-client module is ready, the other modules which rely on hudi-client should make appropriate changes to adapt to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditoo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove empty line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wangxianghu another minor thing.. we generally don't do @author and other headers in source (we have git blame already) .. so may be revert that in all the files as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wangxianghu another minor thing.. we generally don't do
@authorand other headers in source (we have git blame already) .. so may be revert that in all the files as well?
@vinothchandar, yes, it was generated by idea automatically and I have deleted it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the import reordered by idea? I found some files just change the import order, would we keep the same as before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the import reordered by idea? I found some files just change the import order, would we keep the same as before?
yes, idea ordered it. I will rollback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @leesf, Thanks for your detailed review. this branch is not ready for review line-by-line yet :), it aims to show you the structure of the new abstraction, I will take care of all the details you mentioned.
fafd556 to
21b92d9
Compare
7b85e0e to
025ca63
Compare
7b6c8bb to
488525f
Compare
|
@vinothchandar @smarthi @vinothchandar This PR is ready for review, please take a look when free. |
08651a5 to
5f96e83
Compare
26611c4 to
55e1af2
Compare
|
refactor is finished, review goes to #1827 |
|
all goes to #1827 Closing this one. |

Tips
What is the purpose of the pull request
(For example: This pull request adds quick-start document.)
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.