@vinothchandar
Member

Tips

What is the purpose of the pull request

(For example: This pull request adds quick-start document.)

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@vinothchandar
Member Author

@wangxianghu @leesf let's discuss on this PR.. it's easy to comment and iterate

Member Author

@vinothchandar vinothchandar left a comment

Midway through this.. But I did pull the code down ..

IIUC,

  • HoodieEngineContext is very thin.. any place we really need a spark context, that code is now moved to a subclass under hudi-client-spark ..
  • There are no API/functionality changes for Spark RDD client .. i.e old HoodieWriteClient..

This is a very good start... We can do it like this initially and, later on, try to abstract more and move more functionality back into the abstract classes. As you build out hudi-client-flink more fully, you will run into code reuse across all the table.action package code.

Member Author

we don't use slf4j currently.. there was a separate effort for this..
@leesf would know better.. for now, let's keep things in log4j

Contributor

@wangxianghu wangxianghu Jun 12, 2020

Hi @vinothchandar, thanks for the feedback!
Yes, HoodieEngineContext is thin. It holds only the common things, while Spark-related code goes to HoodieSparkEngineContext and Flink-related code goes to HoodieFlinkEngineContext, both of which extend HoodieEngineContext.

As this PR is already huge, we don't want to make too many changes, so we made no API/functionality changes to the Spark RDD client and just abstracted it. BTW, I have verified it on the Flink engine before (replacing RDD with List), and it is doable.

I'll roll the logging back to log4j.
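
For readers following along, the thin-base-class split described in this comment might look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's code: `FakeSparkContext` and `FakeFlinkEnvironment` are hypothetical stand-ins for the real engine handles (so the example compiles without Spark or Flink), and the `basePath` field is only an assumed example of the "common things" the base class holds.

```java
import java.util.Objects;

// Hypothetical stand-ins for the real engine handles, so the sketch
// compiles without Spark or Flink on the classpath.
final class FakeSparkContext {
    final String appName;
    FakeSparkContext(String appName) { this.appName = appName; }
}

final class FakeFlinkEnvironment {
    final int parallelism;
    FakeFlinkEnvironment(int parallelism) { this.parallelism = parallelism; }
}

// Thin base class: only engine-agnostic state. The basePath field is an
// assumed example of "common things", not taken from the PR.
abstract class HoodieEngineContext {
    private final String basePath;

    protected HoodieEngineContext(String basePath) {
        this.basePath = Objects.requireNonNull(basePath, "basePath");
    }

    String getBasePath() { return basePath; }

    // Each engine-specific subclass identifies itself.
    abstract String engineName();
}

// Spark-related state lives in the Spark subclass.
final class HoodieSparkEngineContext extends HoodieEngineContext {
    private final FakeSparkContext sparkContext;

    HoodieSparkEngineContext(String basePath, FakeSparkContext sparkContext) {
        super(basePath);
        this.sparkContext = Objects.requireNonNull(sparkContext, "sparkContext");
    }

    FakeSparkContext getSparkContext() { return sparkContext; }

    @Override
    String engineName() { return "spark"; }
}

// Flink-related state lives in the Flink subclass.
final class HoodieFlinkEngineContext extends HoodieEngineContext {
    private final FakeFlinkEnvironment flinkEnvironment;

    HoodieFlinkEngineContext(String basePath, FakeFlinkEnvironment flinkEnvironment) {
        super(basePath);
        this.flinkEnvironment = Objects.requireNonNull(flinkEnvironment, "flinkEnvironment");
    }

    FakeFlinkEnvironment getFlinkEnvironment() { return flinkEnvironment; }

    @Override
    String engineName() { return "flink"; }
}
```

Code written against `HoodieEngineContext` stays engine-agnostic, and only code that truly needs an engine handle has to know about a subclass.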

Member Author

please document what I, K, O, P stand for

Contributor

please document what I, K, O, P stand for

will do
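
One way to satisfy this request is Javadoc `@param` tags on the generic class. The sketch below is hypothetical throughout: the class names are invented, and the meanings given to `I`, `K`, `O`, and `P` are guesses for illustration only, since the thread never states what they stand for. The `List`-based binding echoes the "replace RDD with List" experiment mentioned earlier.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Engine-agnostic write client sketch (hypothetical class).
 *
 * NOTE: the meanings below are assumed for illustration; the PR does not state them.
 *
 * @param <I> type of the input records container (e.g. an RDD or a List)
 * @param <K> type of the record keys container
 * @param <O> type of the output (write results) container
 * @param <P> type of the partition paths container
 */
abstract class WriteClientSketch<I, K, O, P> {
    abstract O upsert(I records);
    abstract O delete(K keys);
    abstract P partitionsOf(I records);
}

// List-based binding: every container type is a plain java.util.List.
final class ListWriteClientSketch
        extends WriteClientSketch<List<String>, List<String>, List<String>, List<String>> {

    @Override
    List<String> upsert(List<String> records) {
        List<String> results = new ArrayList<>();
        for (String record : records) {
            results.add("upserted:" + record);
        }
        return results;
    }

    @Override
    List<String> delete(List<String> keys) {
        List<String> results = new ArrayList<>();
        for (String key : keys) {
            results.add("deleted:" + key);
        }
        return results;
    }

    @Override
    List<String> partitionsOf(List<String> records) {
        // Assumes records look like "partition/key"; purely illustrative.
        Set<String> partitions = new LinkedHashSet<>();
        for (String record : records) {
            int slash = record.indexOf('/');
            partitions.add(slash < 0 ? "" : record.substring(0, slash));
        }
        return new ArrayList<>(partitions);
    }
}
```

With the type parameters documented at the class level, each engine binding only has to say which concrete containers it plugs in.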

Member Author

we can probably keep IndexType in HoodieIndex itself.. as an interface? anyways, I may have some detailed comments like these.. but we can defer them to final review
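
A minimal sketch of what keeping the type nested could look like, assuming the suggestion means an `IndexType` enum declared inside the index abstraction. The `HoodieIndexSketch` and `InMemoryIndexSketch` names are hypothetical, and the enum constants are illustrative rather than copied from the PR.

```java
// Hypothetical index abstraction with its type enum kept nested inside it.
interface HoodieIndexSketch {

    // A member enum of an interface is implicitly public and static, so callers
    // refer to it as HoodieIndexSketch.IndexType. Constants are illustrative.
    enum IndexType { BLOOM, GLOBAL_BLOOM, INMEMORY }

    IndexType getIndexType();
}

// Implementations inherit the nested type name and can use it unqualified.
final class InMemoryIndexSketch implements HoodieIndexSketch {
    @Override
    public IndexType getIndexType() {
        return IndexType.INMEMORY;
    }
}
```

Nesting keeps the enum discoverable next to the abstraction it describes, at the cost of a longer qualified name at call sites outside the hierarchy.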

Member Author

pull this into a helper like HoodieSparkEngineContext.getSparkContext(engineCtx) ?

Contributor

pull this into a helper like HoodieSparkEngineContext.getSparkContext(engineCtx) ?

will do
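
The suggested helper could be sketched as below. Only the name `HoodieSparkEngineContext.getSparkContext(engineCtx)` comes from the review; the body, the error handling, and the `FakeSparkContext` stand-in for the real Spark context are assumptions made so the example is self-contained.

```java
// Hypothetical stand-in for the real Spark context type.
final class FakeSparkContext {
    final String appName;
    FakeSparkContext(String appName) { this.appName = appName; }
}

// Minimal base type; the real class would carry engine-agnostic state.
abstract class HoodieEngineContext {
}

final class HoodieSparkEngineContext extends HoodieEngineContext {
    private final FakeSparkContext sparkContext;

    HoodieSparkEngineContext(FakeSparkContext sparkContext) {
        this.sparkContext = sparkContext;
    }

    // Static helper in the shape suggested in the review: callers holding only
    // the base type get the Spark handle without repeating the cast inline.
    static FakeSparkContext getSparkContext(HoodieEngineContext engineCtx) {
        if (!(engineCtx instanceof HoodieSparkEngineContext)) {
            String actual = engineCtx == null ? "null" : engineCtx.getClass().getName();
            throw new IllegalArgumentException(
                "Expected a HoodieSparkEngineContext but got: " + actual);
        }
        return ((HoodieSparkEngineContext) engineCtx).sparkContext;
    }
}
```

Centralizing the cast means a wrong engine context fails in one place with a clear message instead of a `ClassCastException` scattered across call sites.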

@vinothchandar
Member Author

@leesf @wangxianghu Direction is definitely promising.. and very clean.. Let me know if you want a detailed line-by-line review

also ccing @yanghua @bvaradar @n3nash to be in the know.. This is huge!

@vinothchandar
Member Author

We may have to coordinate with the bootstrap PR a bit more on conflicts/rebasing, so that neither of your lives is hell :)

@vinothchandar vinothchandar self-assigned this Jun 11, 2020
@leesf
Contributor

leesf commented Jun 12, 2020

@leesf @wangxianghu Direction is definitely promising.. and very clean.. Let me know if you want a detailed line-by-line review

also ccing @yanghua @bvaradar @n3nash to be in the know.. This is huge!

Ack, will review this weekend.

@yanghua
Contributor

yanghua commented Jun 12, 2020

@leesf @wangxianghu Direction is definitely promising.. and very clean.. Let me know if you want a detailed line-by-line review

also ccing @yanghua @bvaradar @n3nash to be in the know.. This is huge!

Sorry, recently, I am busy with other things. Will try to catch your thoughts and review later.

@wangxianghu
Contributor

@leesf @wangxianghu Direction is definitely promising.. and very clean.. Let me know if you want a detailed line-by-line review

also ccing @yanghua @bvaradar @n3nash to be in the know.. This is huge!

@vinothchandar thanks for your affirmation. I'll try to finish the abstraction this weekend, then implement it with spark engine. I'll ping you when it is ready.

@wangxianghu
Contributor

We may have to coordinate with the bootstrap PR a bit more on conflicts/rebasing, so that neither of your lives is hell :)

will keep an eye on the bootstrap pr, thanks for reminding :)

Contributor

change to protected?

Comment on lines 37 to 39
Contributor

remove the section please.

Contributor

please avoid using *

Contributor

please avoid using *

sure

Comment on lines 86 to 167
Contributor

These are methods from HoodieWriteClient, and users would use HoodieWriteClient to upsert/insert records directly through these APIs. Right now HoodieWriteClient has been removed, which breaks compatibility.

Contributor

These are methods from HoodieWriteClient, and users would use HoodieWriteClient to upsert/insert records directly through these APIs. Right now HoodieWriteClient has been removed, which breaks compatibility.

@leesf Yes, it is not finished yet. I have noticed that HoodieWriteClient is referenced in many places (e.g. hudi-cli, hudi-utilities...). Once the hudi-client module is ready, the other modules that rely on hudi-client will need appropriate changes to adapt.

Comment on lines 6 to 26
Contributor

ditto

Contributor

remove empty line

Comment on lines 11 to 31
Contributor

ditto

Comment on lines 35 to 55
Contributor

ditto

Member Author

@wangxianghu another minor thing.. we generally don't do @author and other headers in source (we have git blame already) .. so maybe revert that in all the files as well?

Contributor

@wangxianghu another minor thing.. we generally don't do @author and other headers in source (we have git blame already) .. so maybe revert that in all the files as well?

@vinothchandar, yes, it was generated automatically by IDEA and I have deleted it.

Comment on lines 25 to 27
Contributor

@leesf leesf Jun 13, 2020

Was the import reordered by IDEA? I found that some files only change the import order; could we keep it the same as before?

Contributor

Was the import reordered by IDEA? I found that some files only change the import order; could we keep it the same as before?

Yes, IDEA reordered it. I will roll it back.

Contributor

ditto

Contributor

protected

Contributor

ditto

Comment on lines 7 to 29
Contributor

ditto

Comment on lines 41 to 63
Contributor

ditto

Comment on lines 16 to 37
Contributor

ditto

Comment on lines 55 to 77
Contributor

ditto

Comment on lines 1 to 19
Contributor

remove this class?

Contributor

Hi @leesf, thanks for your detailed review. This branch is not ready for line-by-line review yet :) It aims to show you the structure of the new abstraction. I will take care of all the details you mentioned.

@wangxianghu wangxianghu force-pushed the HUDI-xxx branch 2 times, most recently from fafd556 to 21b92d9 Compare June 14, 2020 02:57
@wangxianghu wangxianghu force-pushed the HUDI-xxx branch 4 times, most recently from 7b85e0e to 025ca63 Compare July 5, 2020 11:17
@wangxianghu wangxianghu force-pushed the HUDI-xxx branch 2 times, most recently from 7b6c8bb to 488525f Compare July 9, 2020 13:10
@leesf leesf marked this pull request as ready for review July 9, 2020 13:25
@leesf
Contributor

leesf commented Jul 9, 2020

@vinothchandar @smarthi This PR is ready for review, please take a look when free.

@leesf leesf changed the title from "[WIP] [Review] refactor hudi-client" to "[Review] refactor hudi-client" Jul 10, 2020
@wangxianghu wangxianghu force-pushed the HUDI-xxx branch 3 times, most recently from 08651a5 to 5f96e83 Compare July 11, 2020 07:29
@wangxianghu
Contributor

wangxianghu commented Jul 13, 2020

image
It is strange; both of these unit tests run correctly in my local environment.
@yanghua @leesf would you please take a look at this :)

@leesf leesf removed the status:in-progress Work in progress label Jul 13, 2020
@wangxianghu wangxianghu force-pushed the HUDI-xxx branch 4 times, most recently from 26611c4 to 55e1af2 Compare July 13, 2020 15:43
@wangxianghu
Contributor

wangxianghu commented Jul 14, 2020

refactor is finished, review goes to #1827

@leesf
Contributor

leesf commented Jul 17, 2020

All goes to #1827. Closing this one.

@leesf leesf closed this Jul 17, 2020
@wangxianghu wangxianghu deleted the HUDI-xxx branch August 28, 2020 10:14