@pengzhiwei2018 pengzhiwei2018 commented Jan 27, 2021

… with log

What is the purpose of the pull request

Currently, HoodieMergeOnReadRDD uses the payload class to combine log data with base data. However, it does not pass the PRE_COMBINE_FIELD to the payload class, which leads to incorrect query results.
This PR fixes that by passing the PRE_COMBINE_FIELD to the DefaultHoodieRecordPayload in HoodieMergeOnReadRDD#mergeRowWithLog.

Brief change log

  • Store the preCombineField in hoodie.properties so that it can be retrieved when reading the table. This adds a preCombineField parameter to the HoodieTableMetaClient#initTableType method.
  • Pass the preCombineField to the DefaultHoodieRecordPayload in HoodieMergeOnReadRDD#mergeRowWithLog.
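
To illustrate why the ordering field matters on the read path, here is a minimal, self-contained sketch. It does not use Hudi's payload classes; `PreCombineMergeSketch`, `merge`, and the `ts` column are hypothetical names chosen for illustration. It assumes that a reader without an ordering field simply lets the log record win, while a reader that knows the field keeps whichever record has the larger ordering value.

```java
import java.util.Map;

/** Hypothetical sketch; this is not Hudi's actual payload implementation. */
final class PreCombineMergeSketch {

    /**
     * Returns the record that should win the merge. When orderingField is null,
     * the log record always wins, mimicking a reader that was never told the field.
     */
    static Map<String, Object> merge(Map<String, Object> base,
                                     Map<String, Object> log,
                                     String orderingField) {
        if (orderingField == null) {
            return log; // no ordering information: the log record overwrites the base record
        }
        long baseOrder = ((Number) base.get(orderingField)).longValue();
        long logOrder = ((Number) log.get(orderingField)).longValue();
        // Keep the base record only if it is strictly newer than the log record.
        return baseOrder > logOrder ? base : log;
    }

    public static void main(String[] args) {
        Map<String, Object> base = Map.of("id", 1, "name", "new", "ts", 20L);
        Map<String, Object> log = Map.of("id", 1, "name", "stale", "ts", 10L);
        System.out.println(merge(base, log, null).get("name")); // prints "stale"
        System.out.println(merge(base, log, "ts").get("name")); // prints "new"
    }
}
```

In this sketch, the first call returns the older log record because no ordering field is available, which is the kind of incorrect result the description refers to.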

Verify this pull request

This change added tests and can be verified as follows:

  • Added TestMORDataSource#testPreCombineFiledForReadMOR to verify the change.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@garyli1019
Member

@pengzhiwei2018 the build failure seems related.

@pengzhiwei2018
Author

> @pengzhiwei2018 the build failure seems related.

Yes, I have fixed it now.

@vinothchandar
Member

cc @nsivabalan to also triage

@pengzhiwei2018 pengzhiwei2018 force-pushed the dev_mor_payload branch 2 times, most recently from df3d85a to 66f0739 Compare January 29, 2021 02:32
Member

@garyli1019 garyli1019 left a comment

@pengzhiwei2018 thanks for your contribution, left some comments.

Member

we are creating a new Properties in every call, can we put this outside?

Author

Good suggestion! I will refactor this code later.
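
A minimal sketch of the suggested refactoring, assuming the payload properties depend only on the preCombine field and can therefore be built once and reused for every record. `PayloadPropsSketch` and the key `example.payload.ordering.field` are hypothetical names, not identifiers from the PR.

```java
import java.util.Properties;

/** Hypothetical sketch: build the payload Properties once instead of once per record. */
final class PayloadPropsSketch {
    // Assumed key name for illustration only; the real key is not shown in this thread.
    static final String ORDERING_FIELD_KEY = "example.payload.ordering.field";

    private final Properties payloadProps; // created once in the constructor

    PayloadPropsSketch(String preCombineField) {
        Properties props = new Properties();
        if (preCombineField != null) {
            props.setProperty(ORDERING_FIELD_KEY, preCombineField);
        }
        this.payloadProps = props;
    }

    /** Called for each merged record; always returns the same shared instance. */
    Properties propsForMerge() {
        return payloadProps;
    }
}
```

Sharing one instance is only safe if nothing mutates it after construction, which this sketch assumes.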

Member

can we add a new method initTableType to handle all the null being added?

Member

nit: this import should be in the next group

Author

Fixed!

Member

preCombineFieldFromTableConfig sounds better?

Member

If the field does not exist, will this be an empty string or null?

Author

It will be null if it does not exist.

Member

can we use the HoodieTableConfig instead? or somehow translate all precombine field options into one place and deprecate others. Using the write option while reading sounds a bit odd.

Author

Yes, PRECOMBINE_FIELD_OPT_KEY is used for compatibility with old tables that have not stored the preCombineField in hoodie.properties. I agree that using the write option is a bit odd, so I have added a READ_PRECOMBINE_FIELD to DataSourceReadOptions.
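
The lookup order described here can be sketched as follows: prefer the value stored with the table, fall back to a read option for older tables, and represent "not set" as an empty `Optional` rather than null. This is an illustration under stated assumptions; `PreCombineFieldResolver` and both property keys are hypothetical and do not reflect Hudi's real configuration names.

```java
import java.util.Optional;
import java.util.Properties;

/** Hypothetical sketch of resolving the preCombine field with a fallback. */
final class PreCombineFieldResolver {
    // Both keys are assumed names for illustration only.
    static final String TABLE_KEY = "example.table.precombine.field";
    static final String READ_KEY = "example.read.precombine.field";

    /**
     * Looks in the table config first (new tables), then in the read options
     * (old tables that never stored the field). Empty if neither is set.
     */
    static Optional<String> resolve(Properties tableConfig, Properties readOptions) {
        return Optional.ofNullable(tableConfig.getProperty(TABLE_KEY))
                .or(() -> Optional.ofNullable(readOptions.getProperty(READ_KEY)));
    }
}
```

Callers can then branch on `isPresent()` instead of performing null checks, which also avoids a crash when no preCombine field is configured at all.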

Member

can we make this field an Option instead of using null?

Author

done!

@pengzhiwei2018 pengzhiwei2018 force-pushed the dev_mor_payload branch 4 times, most recently from 324489d to 7b3d36e Compare January 29, 2021 15:47
Contributor

should we think of introducing a builder pattern since we have too many overloaded constructors/initTableTypes?

Contributor

@garyli1019 @vinothchandar : I am sure this would have been brought up earlier too. Curious as to why we haven't exposed a builder for MetaClient instantiation.
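
For context, a builder along the lines discussed in this thread might look like the sketch below. It is not the API that exists in Hudi or the one later proposed; `TableInitSketch`, its fields, and the `COPY_ON_WRITE` default are assumptions made for illustration. The point is that optional settings such as the preCombine field become named setter calls instead of extra overloads that pass null.

```java
/** Hypothetical sketch of a builder for table initialization; not Hudi's actual API. */
final class TableInitSketch {
    final String basePath;
    final String tableType;
    final String preCombineField; // stays null when the table has no preCombine field

    private TableInitSketch(Builder b) {
        this.basePath = b.basePath;
        this.tableType = b.tableType;
        this.preCombineField = b.preCombineField;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private String basePath;
        private String tableType = "COPY_ON_WRITE"; // assumed default for this sketch
        private String preCombineField;

        Builder basePath(String value) { this.basePath = value; return this; }
        Builder tableType(String value) { this.tableType = value; return this; }
        Builder preCombineField(String value) { this.preCombineField = value; return this; }

        TableInitSketch build() {
            if (basePath == null) {
                throw new IllegalStateException("basePath is required");
            }
            return new TableInitSketch(this);
        }
    }
}
```

A call site would then read `TableInitSketch.builder().basePath("/tmp/tbl").tableType("MERGE_ON_READ").preCombineField("ts").build()`, and adding another optional setting would not require a new overload.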

codecov-io commented Jan 29, 2021

Codecov Report

Merging #2497 (2f23040) into master (23f2ef3) will increase coverage by 0.24%.
The diff coverage is 58.33%.

@@             Coverage Diff              @@
##             master    #2497      +/-   ##
============================================
+ Coverage     50.28%   50.53%   +0.24%     
- Complexity     3120     3123       +3     
============================================
  Files           430      430              
  Lines         19565    19597      +32     
  Branches       2004     2008       +4     
============================================
+ Hits           9838     9903      +65     
+ Misses         8924     8886      -38     
- Partials        803      808       +5     
| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| hudicli | 37.21% <ø> (ø) | 0.00 <ø> (ø) |
| hudiclient | 100.00% <ø> (ø) | 0.00 <ø> (ø) |
| hudicommon | 51.45% <0.00%> (-0.07%) | 0.00 <0.00> (ø) |
| hudiflink | 33.03% <ø> (ø) | 0.00 <ø> (ø) |
| hudihadoopmr | 33.16% <ø> (ø) | 0.00 <ø> (ø) |
| hudisparkdatasource | 69.46% <72.41%> (+3.60%) | 0.00 <4.00> (ø) |
| hudisync | 48.61% <ø> (ø) | 0.00 <ø> (ø) |
| huditimelineservice | 66.49% <ø> (ø) | 0.00 <ø> (ø) |
| hudiutilities | 69.48% <ø> (ø) | 0.00 <ø> (ø) |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...rg/apache/hudi/common/table/HoodieTableConfig.java | 45.45% <0.00%> (-0.60%) | 17.00 <0.00> (ø) |
| ...pache/hudi/common/table/HoodieTableMetaClient.java | 68.33% <0.00%> (-2.36%) | 45.00 <0.00> (ø) |
| ...n/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala | 89.78% <60.00%> (-1.70%) | 14.00 <2.00> (+2.00) ⬇️ |
| ...g/apache/hudi/MergeOnReadIncrementalRelation.scala | 81.45% <71.42%> (-0.76%) | 22.00 <1.00> (+1.00) ⬇️ |
| .../org/apache/hudi/MergeOnReadSnapshotRelation.scala | 89.13% <77.77%> (-1.46%) | 17.00 <1.00> (+1.00) ⬇️ |
| ...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala | 48.76% <100.00%> (+0.18%) | 0.00 <0.00> (ø) |
| ...ache/hudi/common/fs/inline/InMemoryFileSystem.java | 79.31% <0.00%> (-10.35%) | 15.00% <0.00%> (-1.00%) |
| ...src/main/java/org/apache/hudi/QuickstartUtils.java | 60.46% <0.00%> (+60.46%) | 0.00% <0.00%> (ø%) |

Member

@garyli1019 garyli1019 left a comment

thanks @pengzhiwei2018 , LGTM! @nsivabalan I agree we should use a builder pattern to init the table. Looks like @lw309637554 is working on the refactoring https://issues.apache.org/jira/browse/HUDI-1315.

Member

missed this one?

Author

added!

@pengzhiwei2018 pengzhiwei2018 force-pushed the dev_mor_payload branch 2 times, most recently from 4891a86 to 2f23040 Compare January 31, 2021 08:19
… with log

remove unused import

fix test case

fix compile error

fix compile error

refactor the code

format code

format code

add READ_PRE_COMBINE_FIELD config

fix crash when PRECOMBINE_FIELD is not set

add default method for initTableTypeWithBootstrap

fix check style
@nsivabalan
Contributor

> thanks @pengzhiwei2018 , LGTM! @nsivabalan I agree we should use a builder pattern to init the table. Looks like @lw309637554 is working on the refactoring https://issues.apache.org/jira/browse/HUDI-1315.

thnx Gary. I have asked liwei. If he is not working on it as of now, probably I will take it up and put up a diff.

@garyli1019
Member

> thnx Gary. I have asked liwei. If he is not working on it as of now, probably I will take it up and put up a diff.

hi @nsivabalan , do you think we can merge this PR and do the refactoring in a separate PR or on hold for now? WDYT?

@nsivabalan
Contributor

Yes, sounds good @garyli1019 . LGTM. Feel free to land it if you are good.

@nsivabalan
Contributor

@garyli1019 : I am good w/ the PR. Can you land it? Please fill in the right commit message while you squash & merge.

5 participants