-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-1550] Incorrect query result for MOR table when merge base data… #2497
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4a651d9 to
9bb0e27
Compare
|
@pengzhiwei2018 the build failure seems related. |
87d2bd2 to
7a86138
Compare
Yes, I have fix it now. |
|
cc @nsivabalan to also triage |
df3d85a to
66f0739
Compare
garyli1019
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pengzhiwei2018 thanks for your contribution, left some comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are creating a new Properties in every call, can we put this outside?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion! I will refactor this code later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a new method initTableType to handle all the null being added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this import should be in the next group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preCombineFieldFromTableConfig sounds better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the field does not exist, will this be an empty string or null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be null if it does not exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use the HoodieTableConfig instead? or somehow translate all precombine field options into one place and deprecate others. Using the write option while reading sounds a bit odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the PRECOMBINE_FIELD_OPT_KEY is used to compatible with the old table which has not store the preCombineField to hoodie.properties. Using the write option is a bit odd. So I have add a READ_PRECOMBINE_FIELD to the DataSourceReadOptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make this field option instead of using null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
324489d to
7b3d36e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we think of introducing a builder pattern since we have too many overloaded constructors/initTableTypes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@garyli1019 @vinothchandar : I am sure this would have been brought up earlier too. Curious as to why we haven't exposed a builder for MetaClient instantiation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
found a jira for this topic https://issues.apache.org/jira/browse/HUDI-1315
Codecov Report
@@ Coverage Diff @@
## master #2497 +/- ##
============================================
+ Coverage 50.28% 50.53% +0.24%
- Complexity 3120 3123 +3
============================================
Files 430 430
Lines 19565 19597 +32
Branches 2004 2008 +4
============================================
+ Hits 9838 9903 +65
+ Misses 8924 8886 -38
- Partials 803 808 +5
Flags with carried forward coverage won't be shown. Click here to find out more.
|
b62f1c3 to
143d036
Compare
garyli1019
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @pengzhiwei2018 , LGTM! @nsivabalan I agree we should use a builder pattern to init the table. Looks like @lw309637554 is working on the refactoring https://issues.apache.org/jira/browse/HUDI-1315.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added!
4891a86 to
2f23040
Compare
… with log remove unused import fix test case fix compile error fix compile error refactor the code format code format code add READ_PRE_COMBINE_FIELD config fix crash when PRECOMBINE_FIELD is not set add default method for initTableTypeWithBootstrap fix check style
thnx Gary. I have asked liwei. If he is not working on it as of now, probably I will take it up and put up a diff. |
hi @nsivabalan , do you think we can merge this PR and do the refactoring in a separate PR or on hold for now? WDYT? |
|
Yes, sounds good @garyli1019 . LGTM. Feel free to land it if you are good. |
|
@garyli1019 : I am good w/ the PR. Can you land it. Please fill in the right commit message while you squash & merge. |
… with log
Tips
What is the purpose of the pull request
Currently HoodieMergeOnReadRDD use the payload to combine the log data with the base data, However it does not pass the PRE_COMBINE_FILED to the payload class, which will result to incorrect query result.
This PR fix this by passthe PRE_COMBINE_FILED to the DefaultHoodieRecordPayload in HoodieMergeOnReadRDD#mergeRowWithLog.
Brief change log
preCombineFieldto thehoodie.properties. So we can get thepreCombineFieldfield when read the table. This add apreCombineFieldfield to theHoodieTableMetaClient#initTableTypemethod.preCombineFieldto theDefaultHoodieRecordPayloadinHoodieMergeOnReadRDD#mergeRowWithLogVerify this pull request
This change added tests and can be verified as follows:
TestMORDataSource#testPreCombineFiledForReadMORto verify the change.Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.