@lw309637554
Contributor

What is the purpose of the pull request

Support Spark incremental reads on tables that contain replace commits.

@lw309637554
Contributor Author

lw309637554 commented Oct 22, 2020

Before merging this, #2196 needs to be merged first.

@lw309637554 lw309637554 changed the title spark incremental read support with replace [HUDI-1264] spark incremental read support with replace Oct 22, 2020
@lw309637554 lw309637554 force-pushed the HUDI-1264-now branch 2 times, most recently from 5d25135 to b0b0ac3 Compare October 23, 2020 10:04
@lw309637554
Contributor Author

@satishkotha @n3nash @bvaradar
Hi, the solution in this pull request just filters the commits between the latest replace commit and the end commit.
By comparison, HoodieParquetRealtimeInputFormat uses fsView.getLatestMergedFileSlicesBeforeOrOn to keep only the file slices that have not been replaced. Should we change the Spark incremental relation to use fsView.getLatestMergedFileSlicesBeforeOrOn as well?
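
For readers following the thread, the commit-filtering idea described in this comment can be sketched roughly as below. This is an illustration only and not the code in this PR: `Instant`, `instants_after_latest_replace`, the `"replacecommit"` action string, and the timestamps are hypothetical stand-ins rather than Hudi APIs, and treating the window as `(begin, end]` with the replace commit itself included is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instant:
    """Hypothetical stand-in for a timeline entry (not a Hudi class)."""
    timestamp: str  # sortable commit time
    action: str     # "commit" or "replacecommit" (assumed labels)


def instants_after_latest_replace(
    timeline: List[Instant], begin: str, end: str
) -> List[Instant]:
    """Keep instants in (begin, end], starting at the latest replace commit.

    If the window contains no replace commit, the whole window is returned.
    """
    window = [
        i for i in sorted(timeline, key=lambda i: i.timestamp)
        if begin < i.timestamp <= end
    ]
    replace_positions = [
        idx for idx, i in enumerate(window) if i.action == "replacecommit"
    ]
    if not replace_positions:
        return window
    # Assumption: the replace commit itself is part of the result.
    return window[replace_positions[-1]:]


timeline = [
    Instant("001", "commit"),
    Instant("002", "replacecommit"),
    Instant("003", "commit"),
    Instant("004", "replacecommit"),
    Instant("005", "commit"),
    Instant("006", "commit"),
]
selected = instants_after_latest_replace(timeline, "000", "005")
print([i.timestamp for i in selected])
```

A filter of this kind works only at commit granularity. The alternative raised in the comment works at file-slice granularity, which is why the two approaches can return different data when a replace commit touches only some file groups.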

@vinothchandar
Member

@satishkotha @n3nash could you take this one as well?

@n3nash
Contributor

n3nash commented Oct 28, 2020

@satishkotha Can you please review this? Once done, let me know and I'll take a final pass and merge it.

@satishkotha
Member

Will review this after concerns in #2196 are addressed

@lw309637554 lw309637554 changed the title [HUDI-1264] spark incremental read support with replace [HUDI-1264][WIP] spark incremental read support with replace Dec 9, 2020
@codecov-io

Codecov Report

Merging #2199 (c39aaac) into master (de2fbea) will decrease coverage by 43.13%.
The diff coverage is n/a.

@@              Coverage Diff              @@
##             master    #2199       +/-   ##
=============================================
- Coverage     53.49%   10.35%   -43.14%     
+ Complexity     2788       48     -2740     
=============================================
  Files           355       51      -304     
  Lines         16169     1786    -14383     
  Branches       1650      213     -1437     
=============================================
- Hits           8649      185     -8464     
+ Misses         6819     1588     -5231     
+ Partials        701       13      -688     
Flag                 Coverage Δ            Complexity Δ
hudicli              ?                     ?
hudiclient           ?                     ?
hudicommon           ?                     ?
hudihadoopmr         ?                     ?
hudispark            ?                     ?
huditimelineservice  ?                     ?
hudiutilities        10.35% <ø> (-59.75%)  0.00 <ø> (ø)

Impacted Files                                         Coverage Δ                Complexity Δ
...va/org/apache/hudi/utilities/IdentitySplitter.java  0.00% <0.00%> (-100.00%)  0.00% <0.00%> (-2.00%)
...va/org/apache/hudi/utilities/schema/SchemaSet.java  0.00% <0.00%> (-100.00%)  0.00% <0.00%> (-3.00%)
...a/org/apache/hudi/utilities/sources/RowSource.java  0.00% <0.00%> (-100.00%)  0.00% <0.00%> (-4.00%)
.../org/apache/hudi/utilities/sources/AvroSource.java  0.00% <0.00%> (-100.00%)  0.00% <0.00%> (-1.00%)
.../org/apache/hudi/utilities/sources/JsonSource.java  0.00% <0.00%> (-100.00%)  0.00% <0.00%> (-1.00%)
...rg/apache/hudi/utilities/sources/CsvDFSSource.java  0.00% <0.00%> (-100.00%)  0.00% <0.00%> (-10.00%)
...g/apache/hudi/utilities/sources/JsonDFSSource.java  0.00% <0.00%> (-100.00%)  0.00% <0.00%> (-4.00%)
...apache/hudi/utilities/sources/JsonKafkaSource.java  0.00% <0.00%> (-100.00%)  0.00% <0.00%> (-6.00%)
...pache/hudi/utilities/sources/ParquetDFSSource.java  0.00% <0.00%> (-100.00%)  0.00% <0.00%> (-5.00%)
...lities/schema/SchemaProviderWithPostProcessor.java  0.00% <0.00%> (-100.00%)  0.00% <0.00%> (-4.00%)
... and 331 more

@n3nash
Contributor

n3nash commented Mar 30, 2021

@satishkotha Is this PR still valid? @lw309637554 Can you please rebase this PR so we can get this landed?

@lw309637554
Contributor Author

> @satishkotha Is this PR still valid? @lw309637554 Can you please rebase this PR so we can get this landed?

@n3nash @satishkotha
I think the solution in this PR is not very good.

> Hi, the solution in this pull request just filters the commits between the latest replace commit and the end commit.
> By comparison, HoodieParquetRealtimeInputFormat uses fsView.getLatestMergedFileSlicesBeforeOrOn to keep only the file slices that have not been replaced. Should we change the Spark incremental relation to use fsView.getLatestMergedFileSlicesBeforeOrOn as well?

@codope
Member

codope commented Jul 5, 2021

@lw309637554 Is this PR still valid given that #3139 is merged now?

@lw309637554
Contributor Author

> @lw309637554 Is this PR still valid given that #3139 is merged now?

@codope Hello, I think I can close this one.
