merge master by fengjian428 · Pull Request #7 · fengjian428/hudi

fengjian428 · 2022-05-05T14:55:04Z

Tips

Thank you very much for contributing to Apache Hudi.
Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.

What is the purpose of the pull request

(For example: This pull request adds quick-start document.)

Brief change log

(for example:)

Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

Added integration tests for end-to-end.
Added HoodieClientWriteTest to verify the change.
Manually verified the change by running a job locally.

Committer checklist

Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

[HUDI-3834] Fixing performance hits in reading Column Stats Index:
- Avro 1.10's default generated Builder classes carry a substantial performance penalty: by default they rely on SpecificData.getForSchema, which loads the corresponding model's class via reflection. On the hot path this was bringing the overall runtime to read the full Column Stats Index of 800k records to 60s; it now takes a mere 3s.
- Addressing memory churn caused by overuse of Hadoop's Path creation: the Path ctor is not a lightweight operation and produces quite a bit of memory churn, adding pressure on GC. Such avoidable allocations are cleaned up so that no unnecessary pressure is added on GC.
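The description attributes the slowdown to a reflective class load performed on every builder creation. A general way to keep reflection off a hot path is to resolve the lookup once and memoize it. The sketch below shows that pattern with plain JDK APIs only (no Avro); `ModelClassCache`, `resolve`, and `REFLECTIVE_LOADS` are hypothetical names, and this is not necessarily how HUDI-3834 itself resolved the regression.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper: memoizes Class.forName lookups so the reflective
 * load happens once per class name instead of once per hot-path call.
 */
final class ModelClassCache {
  private static final Map<String, Class<?>> CACHE = new ConcurrentHashMap<>();
  // Counts real reflective loads so the effect of caching is observable.
  static final AtomicInteger REFLECTIVE_LOADS = new AtomicInteger();

  private ModelClassCache() {}

  static Class<?> resolve(String className) {
    return CACHE.computeIfAbsent(className, name -> {
      REFLECTIVE_LOADS.incrementAndGet();
      try {
        return Class.forName(name);
      } catch (ClassNotFoundException e) {
        // Unchecked, so nothing is cached for an unknown name.
        throw new IllegalArgumentException("Unknown model class: " + name, e);
      }
    });
  }

  public static void main(String[] args) {
    // Simulate a hot loop that needs the same model class many times.
    for (int i = 0; i < 800_000; i++) {
      resolve("java.lang.String");
    }
    System.out.println(REFLECTIVE_LOADS.get()); // prints 1
  }
}
```

`ConcurrentHashMap.computeIfAbsent` applies the mapping function atomically, at most once per key, so the cache stays thread-safe without explicit locking.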

…ble configs (#5244) Addressing the problem of Data Skipping not respecting Metadata Table configs, which might differ between the write and read paths. More details can be found in HUDI-3812.
- Fixing Data Skipping configuration to respect MT configs (on the read path)
- Tightening up DS handling of cases when no top-level columns are in the target query
- Enhancing tests to cover all possible cases

…5275) Currently, Data Skipping does not correctly handle the case when column stats are not aligned, e.g. when some of the (column, file) combinations are missing from the CSI. This can occur in several scenarios (schema evolution, CSI config changes) and has to be handled properly when composing the CSI projection for Data Skipping. This PR addresses that.
- Added appropriate aligning for the transposed CSI projection
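To illustrate what aligning a transposed column-stats projection involves, here is a minimal sketch, assuming a simplified index of `long` min/max values keyed by (file, column). Every file gets an entry for every target column, with `null` where the index has no stats, and a missing entry never allows a file to be pruned. `ColumnStatsAlignment`, `transpose`, and `candidateFiles` are hypothetical names; this is not Hudi's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of aligning a transposed column-stats projection. */
final class ColumnStatsAlignment {
  /** One index row: min/max of one column in one file. */
  record ColumnStat(String file, String column, long min, long max) {}

  /** Min/max pair; a null MinMax in a row means "no stats available". */
  record MinMax(long min, long max) {}

  /** Transposes stats into file -> (column -> MinMax), filling gaps with null. */
  static Map<String, Map<String, MinMax>> transpose(
      List<String> allFiles, List<ColumnStat> stats, List<String> targetColumns) {
    Map<String, Map<String, MinMax>> byFile = new TreeMap<>();
    for (String file : allFiles) {
      Map<String, MinMax> row = new LinkedHashMap<>();
      for (String column : targetColumns) {
        row.put(column, null); // placeholder until stats are found
      }
      byFile.put(file, row);
    }
    for (ColumnStat s : stats) {
      Map<String, MinMax> row = byFile.get(s.file());
      // Ignore stats for files or columns outside the target projection.
      if (row != null && row.containsKey(s.column())) {
        row.put(s.column(), new MinMax(s.min(), s.max()));
      }
    }
    return byFile;
  }

  /** Keeps a file unless its stats prove that `column == value` cannot match. */
  static List<String> candidateFiles(
      Map<String, Map<String, MinMax>> aligned, String column, long value) {
    List<String> keep = new ArrayList<>();
    for (Map.Entry<String, Map<String, MinMax>> e : aligned.entrySet()) {
      MinMax mm = e.getValue().get(column);
      // Missing stats must never cause a file to be skipped.
      if (mm == null || (mm.min() <= value && value <= mm.max())) {
        keep.add(e.getKey());
      }
    }
    return keep;
  }

  public static void main(String[] args) {
    List<ColumnStat> stats = List.of(
        new ColumnStat("f1", "a", 0, 10),
        new ColumnStat("f1", "b", 5, 6),
        new ColumnStat("f2", "a", 20, 30)); // no stats for (b, f2)
    Map<String, Map<String, MinMax>> aligned =
        transpose(List.of("f1", "f2"), stats, List.of("a", "b"));
    System.out.println(candidateFiles(aligned, "b", 100)); // prints [f2]
  }
}
```

Treating a missing (column, file) entry as "cannot prune" keeps Data Skipping conservative: it may read extra files, but it never drops a file that could contain matching rows.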

…mer code path (#5294)
* [HUDI-3838] Implemented drop partition column feature for delta streamer code path
* Ensure drop partition table config is updated in hoodie.props
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
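Conceptually, dropping partition columns means computing the partition path from the partition fields first and only then removing those fields from the payload that gets written. A minimal sketch, assuming a record modeled as a `Map` and a plain `/`-joined partition path (no Hive-style `field=value` encoding); `DropPartitionColumns`, `prepare`, and `Prepared` are hypothetical names, not the delta streamer's API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: derive the partition path, then drop partition fields. */
final class DropPartitionColumns {
  record Prepared(String partitionPath, Map<String, Object> payload) {}

  static Prepared prepare(Map<String, Object> record, List<String> partitionFields,
                          boolean dropPartitionColumns) {
    StringJoiner path = new StringJoiner("/");
    for (String field : partitionFields) {
      // The partition path is computed before any column is dropped.
      path.add(String.valueOf(record.get(field)));
    }
    // Copy so the caller's record is left untouched.
    Map<String, Object> payload = new LinkedHashMap<>(record);
    if (dropPartitionColumns) {
      partitionFields.forEach(payload::remove);
    }
    return new Prepared(path.toString(), payload);
  }

  public static void main(String[] args) {
    Map<String, Object> record = new LinkedHashMap<>();
    record.put("id", 1);
    record.put("dt", "2022-05-05");
    Prepared p = prepare(record, List.of("dt"), true);
    System.out.println(p.partitionPath() + " " + p.payload()); // prints 2022-05-05 {id=1}
  }
}
```

Because the dropped values only survive in the partition path, the flag also has to be persisted with the table (the commit above records it in hoodie.props) so readers know to reconstruct those columns.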

…ted in `HoodieMergeHandle` (#5296) Fixing FILENAME_METADATA_FIELD not being correctly updated in HoodieMergeHandle when an old record is carried over as-is from an existing file.
- Revisited HoodieFileWriter API to accept HoodieKey instead of HoodieRecord
- Fixed FILENAME_METADATA_FIELD not being overridden when the old record is simply carried over
- Exposing standard JVM debugger ports in Docker setup
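The essence of this fix is that a record carried over unchanged into a new base file must be re-stamped with the name of the file it now lives in. A minimal sketch, modeling a record as a `Map`; `CarryOverRecord` and `carryOver` are hypothetical names, and `_hoodie_file_name` is assumed here to be the column behind `FILENAME_METADATA_FIELD`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of carrying an unchanged record into a new base file. */
final class CarryOverRecord {
  // Assumed name of the file-name meta column.
  static final String FILE_NAME_FIELD = "_hoodie_file_name";

  static Map<String, Object> carryOver(Map<String, Object> oldRecord, String newFileName) {
    Map<String, Object> copy = new LinkedHashMap<>(oldRecord);
    // Without this overwrite, the record would still point at the old file.
    copy.put(FILE_NAME_FIELD, newFileName);
    return copy;
  }

  public static void main(String[] args) {
    Map<String, Object> old = new LinkedHashMap<>();
    old.put(FILE_NAME_FIELD, "old-file.parquet");
    old.put("id", 7);
    // prints {_hoodie_file_name=new-file.parquet, id=7}
    System.out.println(carryOver(old, "new-file.parquet"));
  }
}
```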

… w/ Spark 3.2.0 (#5378)
- Because Spark 3.2.1 is not backward-compatible with 3.2.0, all these incompatibilities have to be handled in Spark32HoodieParquetFileFormat. This PR addresses that.
Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>

Alexey Kudinkin and others added 30 commits April 10, 2022 13:42

[HUDI-3842] Integ tests for non partitioned datasets (#5276)

12731f5

- Adding non-partitioned support to integ tests
- Fixing some of the test yamls and properties

[HUDI-3847] Fix NPE due to null schema in HoodieMetadataTableValidator (

63a099c

#5284)

[HUDI-3798] Fixing ending of a transaction by different owner and rem…

2245a95

…oving some extraneous methods in trxn manager (#5255)

[HUDI-3817] shade parquet dependency for hudi-hadoop-mr-bundle (#5250)

5c41e30

Co-authored-by: lvshuang.xjs <lvshuang.xjs@alibaba-inc.com>

[MINOR] fixing timeline server for integ tests (#5289)

52ea1e4

[HUDI-3844] Update props in indexer based on table config (#5293)

3d8fc78

[HUDI-3799] Fixing not deleting empty instants w/o archiving (#5261)

f91e9e6

[HUDI-3839] Fixing incorrect selection of MT partitions to be updated (…

101b82a

…#5274)
* Fixing incorrect selection of MT partitions to be updated
* Ensure that metadata partitions table config is inherited correctly
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

[HUDI-3843] Make flink profiles build with scala-2.11 (#5279)

84783b9

[MINOR] Integ Test Reducing partitions for long running multi partitio…

25dce94

…n yaml (#5300)

[HUDI-3838] Moved the getPartitionColumns logic to driver. (#5303)

2d46d52

[HUDI-3859] Fix spark profiles and utilities-slim dep (#5297)

2e6e302

[HUDI-3867] Disable Data Skipping by default (#5306)

434e782

[HUDI-3868] Disable the sort input for flink streaming append mode (#…

43de2b4

…5309)

[MINOR] Inline the partition path logic into the builder (#5310)

0281725

[HUDI-3870] Add timeout rollback for flink online compaction (#5314)

6f9b02d

[HUDI-3869] Improve error handling of loading Hudi conf (#5311)

c7f41f9

[HUDI-3686] Fix inline and async table service check in HoodieWriteCo…

bab6916

…nfig (#5307)

[MINOR] Code cleanup in test utils (#5312)

571cbe4

[HUDI-3876] Fixing fetching partitions in GlueSyncClient (#5318)

a081c2b

[HUDI-3826] Make truncate partition use delete_partition operation (#…

44b3630

…5272) Make truncate partition and drop partition behave as drop partition with purge: all records are deleted via Hudi DELETE_PARTITION and the partition is removed from the metastore.

[HUDI-3845] Fix delete mor table's partition with urlencode's error (#…

6621f3c

…5282)

[HUDI-3652] Make ObjectSizeCalculator threadlocal to reduce memory fo…

f0ab4a6

…otprint (#5060) Co-authored-by: zhouhuidong <zhouhuidong@bilibili.co>

Revert "[HUDI-3652] Make ObjectSizeCalculator threadlocal to reduce m…

d6a64f7

…emory footprint (#5060)" (#5323) This reverts commit f0ab4a6.

[HOTFIX] add missing license (#5322) (#5324)

9e8664f

Alexey Kudinkin and others added 29 commits April 21, 2022 21:00

[DOCS] Add commit activity, Twitter badges, and Hudi logo in README (#…

20781a5

…5336)

[HUDI-3947] Fixing Hive conf usage in HoodieSparkSqlWriter (#5401)

7523542

[HUDI-3950] add parquet-avro to gcp-bundle (#5399)

505ee67

[HUDI-3948] Fix presto bundle missing HBase classes (#5398)

8633bd6

[HUDI-3923] Fix cast exception while reading boolean type of partitio…

5e5c177

…ned field (#5373)

support general parameter 'sink.parallelism' for flink-hudi (#5405)

bda3db0

Co-authored-by: hehuiyuan1 <hehuiyuan@jd.com>

[HUDI-3946] Validate option path in flink hudi sink (#5397)

d994c58

Revert "[HUDI-3951]support general parameter 'sink.parallelism' for f…

9054b85

…link-hudi (#5405)" (#5421) This reverts commit bda3db0.

[HUDI-3085] Improve bulk insert partitioner abstraction (#4441)

f2ba0fe

[HUDI-3972] Fixing hoodie.properties/tableConfig for no preCombine fi…

762623a

…eld with writes (#5424) Fixed instantiation of a new table to set preCombine to null if it is not explicitly set by the user.

[HUDI-3478] Claim RFC 51 For CDC (#5437)

77e3332

[MINOR] Update alter rename command class type for pattern matching (#…

6ec039b

…5381)

[HUDI-3977] Flink hudi table with date type partition path throws Hoo…

e1ccf2e

…dieNotSupportedException (#5432)

Claim RFC 52 for Introduce Secondary Index to Improve HUDI Query Perf…

924e2e9

…ormance (#5441)

[HUDI-3945] After the async compaction operation is complete, the tas…

cacbd98

…k should exit. (#5391) Co-authored-by: y00617041 <yangxuan42@huawei.com>

[HUDI-3815] Fix docs description of metadata.compaction.delta_commits…

52953c8

… default value error (#5368) Co-authored-by: pusheng.li01 <pusheng.li01@liulishuo.com>

[HUDI-3943] Some description fixes for 0.10.1 docs (#5447)

4e928a6

[MINOR] support different cleaning policy for flink (#5459)

b27e8b5

[HUDI-3758] Fix duplicate fileId error in MOR table type with flink b…

e421d53

…ucket hash Index (#5185)
* fix duplicate fileId with bucket Index
* switch to loading FileGroups from FileSystemView
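The fix points at reusing file groups that already exist instead of minting new fileIds. The sketch below illustrates that idea for a bucket index: each bucket maps to at most one fileId, taken from existing file groups when present and generated only for an empty bucket. The hash function, the 8-digit bucket prefix in the fileId, and the names `BucketFileIds` and `fileIdFor` are illustrative assumptions, not Hudi's exact scheme; the constructor's list stands in for what a FileSystemView would return.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch: one stable fileId per bucket, reusing existing file groups. */
final class BucketFileIds {
  private final int numBuckets;
  private final Map<Integer, String> bucketToFileId = new HashMap<>();

  BucketFileIds(int numBuckets, List<String> existingFileIds) {
    this.numBuckets = numBuckets;
    // Stand-in for loading existing file groups from a file-system view.
    for (String fileId : existingFileIds) {
      bucketToFileId.putIfAbsent(parseBucketId(fileId), fileId);
    }
  }

  /** Assumed fileId layout: 8-digit zero-padded bucket id, then '-', then a suffix. */
  static int parseBucketId(String fileId) {
    return Integer.parseInt(fileId.substring(0, 8));
  }

  int bucketOf(String recordKey) {
    // Mask the sign bit so the modulo is never negative.
    return (recordKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
  }

  /** Returns the existing fileId for the key's bucket, creating one only once. */
  String fileIdFor(String recordKey) {
    int bucket = bucketOf(recordKey);
    return bucketToFileId.computeIfAbsent(
        bucket, b -> String.format("%08d-%s", b, UUID.randomUUID()));
  }

  public static void main(String[] args) {
    BucketFileIds ids = new BucketFileIds(4, List.of("00000001-existing"));
    // "a".hashCode() is 97, and 97 % 4 == 1, so the existing group is reused.
    System.out.println(ids.fileIdFor("a")); // prints 00000001-existing
  }
}
```

Generating a fileId only when the bucket has no file group is what prevents two writers (or two write batches) from creating different fileIds for the same bucket.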

[MINOR] Fix CI by ignoring SparkContext error (#5468)

a1d82b4

Sets spark.driver.allowMultipleContexts = true when constructing Spark conf in UtilHelpers

[HUDI-3862] Fix default configurations of HoodieHBaseIndexConfig (#5308)

f492c52

Co-authored-by: xicm <xicm@asiainfo.com>

[HUDI-3978] Fix use of partition path field as hive partition field i…

33ff475

…n flink (#5434) * Fix partition path fields as hive sync partition fields error

[MINOR] Update DOAP for release 0.11.0 (#5467)

6af1ff7

[HUDI-3211][RFC-44] Add RFC for Hudi Connector for Presto (#4563)

9732ba1

* Add RFC doc
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
* Add note regarding catalog naming
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

[MINOR] Update RFC status (#5486)

3343cbb

[HUDI-4005] Update release scripts to help validation (#5479)

8c9209d

[HUDI-4031] Avoid clustering update handling when no pending replacec…

1562bb6

…ommit (#5487)

[HUDI-3667] Run unit tests of hudi-integ-tests in CI (#5078)

f66e83d

fengjian428 merged commit 8a533e8 into fengjian428:master May 5, 2022

fengjian428 merged 84 commits into fengjian428:master from apache:master

fengjian428 commented May 5, 2022

20 participants
