merge from master #6

fengjian428 · 2022-04-10T15:08:52Z

Tips

Thank you very much for contributing to Apache Hudi.
Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.

What is the purpose of the pull request

(For example: This pull request adds quick-start document.)

Brief change log

(for example:)

Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

Added integration tests for end-to-end.
Added HoodieClientWriteTest to verify the change.
Manually verified the change by running a job locally.

Committer checklist

Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


)
- Adding capability to fetch Metadata Records by key prefix so that Data Skipping can fetch only the Column Stats Index records pertaining to the columns being queried, instead of reading out the whole Index.
- Fixed usages of HFileScanner in HFileReader. A few code paths use the cached scanner if available; other code paths use their own HFileScanner with positional reads.

Brief change log
- Rebasing ColumnStatsIndexSupport to rely on HoodieBackedTableMetadata in lieu of reading through the Spark DS
- Adding methods enabling key-prefix lookups to HoodieFileReader, HoodieHFileReader
- Wiring key-prefix lookup through LogRecordScanner impls
- Cleaning up HoodieHFileReader impl

Co-authored-by: sivabalan <[email protected]>
Co-authored-by: Sagar Sumit <[email protected]>
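The key-prefix lookup idea in the commit message above can be pictured with a small sketch. This is not Hudi's implementation: the `lookup_by_prefix` function, the `column/file` key layout, and the sample records are all hypothetical, chosen only to illustrate why a sorted key space lets a reader fetch just the entries for the queried columns rather than scanning the whole index.

```python
from bisect import bisect_left

# Hypothetical index: keys are "<column>/<file>" strings kept in sorted order,
# each paired with a (min, max) stats record. The layout is an assumption made
# for illustration and does not mirror Hudi's actual key encoding.
KEYS = ["price/f1", "price/f2", "qty/f1", "qty/f2"]
STATS = [(1, 9), (3, 7), (10, 20), (15, 40)]


def lookup_by_prefix(sorted_keys, records, prefix):
    """Return the records whose key starts with `prefix`.

    Binary search finds the first candidate key, then the scan stops at the
    first key that no longer matches, so unrelated entries are never read.
    """
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for i in range(start, len(sorted_keys)):
        if not sorted_keys[i].startswith(prefix):
            break
        matches.append(records[i])
    return matches


# Fetch only the stats for the hypothetical "qty" column.
qty_stats = lookup_by_prefix(KEYS, STATS, "qty/")
```

Because the keys are sorted, all entries sharing a prefix are contiguous, which is what makes a seek followed by a short scan sufficient.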


…e in MT (#5259)
* Filter out empty string (for non-partitioned table) being added to "__all_partitions__" record
* Instead of filtering, transform empty partition-id to `NON_PARTITIONED_NAME`
* Cleaned up `HoodieBackedTableMetadataWriter`
* Make sure REPLACE_COMMITS are handled as well


codope and others added 30 commits March 23, 2022 12:13

[HUDI-3642] Handle NPE due to empty requested replacecommit metadata (#…

f96ba7a

…5090)

Fixing non partitioned all files record in MDT (#5108)

52f0498

[minor] Checks the data block type for archived timeline (#5106)

a1c42fc

[HUDI-3689] Fix glob path and hive sync in deltastreamer tests (#5117)

fe2c398

* Remove glob pattern basePath from the deltastreamer tests.
* [HUDI-3689] Fix file scheme config for CI failure in TestHoodieRealTimeRecordReader

Co-authored-by: Raymond Xu <[email protected]>

[HUDI-3684] Fixing NPE in ParquetUtils (#5102)

ccc3728

* Make sure nulls are properly handled in `HoodieColumnRangeMetadata`

[HUDI-3689] Remove Azure CI cache (#5121)

b147065

[HUDI-3689] Fix UT failures in TestHoodieDeltaStreamer (#5120)

686da41

[HUDI-3706] Downgrade maven surefire and failsafe version (#5123)

44ab3b7

[HUDI-3689] Fix delta streamer tests (#5124)

ff13665

[HUDI-3689] Disable flaky tests in TestHoodieDeltaStreamer (#5127)

4ddd094

[HUDI-3624] Check all instants before starting a commit in metadata t…

9b3dd2e

…able (#5098)

[HUDI-3638] Make ZookeeperBasedLockProvider serializable (#5112)

608d4bf

[HUDI-3701] Flink bulk_insert support bucket hash index (#5118)

5e86cdd

[HUDI-1180] Upgrade HBase to 2.4.9 (#5004)

eaa4c4f

Co-authored-by: Sagar Sumit <[email protected]>

[HUDI-3703] Reset taskID in restoreWriteMetadata (#5122)

483ee84

[HUDI-3580] Claim RFC number 48 for LogCompaction action RFC (#5128)

2fd9a4d

[HUDI-3678] Fix record rewrite of create handle when 'preserveMetadat…

8896864

…a' is true (#5088)

[HUDI-3594] Supporting Composite Expressions over Data Table Columns …

8b38dde

…in Data Skipping flow (#4996)

[HUDI-3711] Fix typo in MaxwellJsonKafkaSourcePostProcessor.Config#PR…

f20c986

…ECOMBINE_FIELD_TYPE_PROP (#5096)

[HUDI-3563] Make quickstart examples covered by CI tests (#5082)

e5c3f90

[MINOR] fix QuickstartUtils move (#5133)

12cc8e7

[HUDI-3396] Refactoring MergeOnReadRDD to avoid duplication, fetch …

51034fe

…only projected columns (#4888)

[HUDI-3435] Do not throw exception when instant to rollback does not …

0c09a97

…exist in metadata table active timeline (#4821)

[HUDI-3612] Clustering strategy should create new TypedProperties whe…

57b4f39

…n modifying it (#5027)

[HUDI-3709] Fixing ParquetWriter impls not respecting Parquet Max F…

189d529

…ile Size limit (#5129)

[HUDI-3716] OOM occurred when use bulk_insert cow table with flink BU…

4d940bb

…CKET index (#5135)

[HUDI-3604] Adjust the order of timeline changes in rollbacks (#5114)

484b340

[MINOR] Relaxing cleaner and archival configs (#5142)

85c4a6c

[HUDI-3719] High performance costs of AvroSerizlizer in DataSource wr… (

9da2dd4

#5137)
* [HUDI-3719] High performance costs of AvroSerizlizer in DataSource writing
* add benchmark framework which modify from spark add avroSerDerBenchmark

[HUDI-3724] Fixing closure of ParquetReader (#5141)

f2a93ea

Vinoth Govindarajan and others added 29 commits April 5, 2022 08:58

[HUDI-2319] dbt example models to demonstrate hudi dbt integration (#…

92ca426

…5220)
* dbt example models to demonstrate hudi dbt integration
* Fixed readme text

[HUDI-3782] Fixing table config when any of the index is disabled (#5222

898be61

)

[HUDI-3723] Fixed stack overflows in Record Iterators (#5235)

8baeb81

Moving to 0.12.0-SNAPSHOT on master branch.

e96f08f

[HUDI-3800] Fixed preserve commit metadata for compaction for untouch…

8683fb1

…ed records (#5232)

[MINOR] Fixing build failure when using flink-1.13 (#5214)

7612549

[HUDI-3340] Fix deploy_staging_jars for different profiles (#5240)

ca27327

[HUDI-3726] Switching from non-partitioned to partitioned key gen doe…

939b3d1

…s not throw any exception (#5205)

[HUDI-3340] Fix deploy_staging_jars command (#5243)

b2f09a1

[HUDI-3739] Fix handling of the isNotNull predicate in Data Skipping (

d43b4cd

#5224)
- Fix handling of the isNotNull predicate in Data Skipping

[HUDI-3808] Flink bulk_insert timestamp(3) can not be read by Spark (#…

e33149b

…5236)

[HUDI-3096] fixed the bug that the cow table(contains decimalType) wr…

531381f

…ite by flink cannot be read by spark. (#4421)

[HUDI-3805] Delete existing corrupted requested rollback plan during …

9d744bb

…rollback (#5245)

[HUDI-3643] Fix hive count exception when the table is empty and the …

6a83964

…path depth is less than 3 (#5051)

[HUDI-3571] Spark datasource continuous ingestion tool (#5156)

b3c834a

[HUDI-3637] Exclude uncommitted log files from metadata table validat…

cd2c346

…ion (#5234)

[HUDI-3810] Fixing lazy read for metadata log record readers (#5241)

ef06e4a

[HUDI-3823] Fix hudi-hive-sync-bundle to include HBase dependencies a…

672974c

…nd shading (#5257)

[HUDI-3454] Fix partition name in all code paths for LogRecordScanner (…

df87095

…#5252)
* Depend on FSUtils#getRelativePartitionPath(basePath, logFilePath.getParent) to get the partition.
* If the list of log file paths in the split is empty, then fall back to the usual behaviour.

[HUDI-3781] fix spark delete sql can not delete record (#5215)

7a6272f

[HUDI-3827] Promote the inetAddress picking strategy for NetworkUtils…

67215ab

…#getHostname (#5260)

[HUDI-3571] Spark datasource continuous checkpoint should have own fs…

26eb7b8

… variable (#5265)

[MINOR] Update README of docker build setup (#5256)

1cc7542

[HUDI-3825] Fixing Column Stats Index updating sequence (#5267)

81b25c5

[HUDI-3837] Fix license and rat check settings (#5273)

5e65aef

- add missing licenses
- fix CI setting to run rat plugin
- fix deploy script to include integ test modules

[HUDI-3807] Add a new config to control the use of metadata index in …

3e97c88

…HoodieBloomIndex (#5268)

[MINOR] Fix typos in the comments of HoodieMergeHandle (#5271)

15c2645

fengjian428 merged commit 179dc04 into fengjian428:master Apr 10, 2022

20 participants
