Conversation

@fengjian428
Owner

Tips

What is the purpose of the pull request

(For example: This pull request adds quick-start document.)

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

manojpec and others added 30 commits March 3, 2022 15:56
…4941)

* Fixing populateMeta fields update to metadata table

* Fix checkstyle violations

Co-authored-by: Sagar Sumit <[email protected]>
… interfaces first (#4942)

* In some complex network environments, the current code returns the wildcard address 0.0.0.0, which is not desired.
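
The idea of preferring a concrete interface address over the wildcard can be sketched as below. This is a minimal illustration in Python, not the code from this commit: the helper name `pick_bind_address`, the candidate list, and the loopback fallback are all assumptions made for the example.

```python
import ipaddress


def pick_bind_address(candidates):
    """Return the first candidate that is neither the wildcard nor loopback.

    Hypothetical helper for illustration only. Falls back to the first
    loopback address seen, and returns None if nothing usable is found.
    """
    loopback = None
    for raw in candidates:
        addr = ipaddress.ip_address(raw)
        if addr.is_unspecified:  # 0.0.0.0 or ::
            continue
        if addr.is_loopback:  # 127.0.0.0/8 or ::1
            loopback = loopback or raw
            continue
        return raw
    return loopback
```

With candidates such as `["0.0.0.0", "127.0.0.1", "10.1.2.3"]`, the sketch skips the wildcard and loopback entries and picks the routable address.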
…nc.enable" directly, async clustering not work (#4905)


Co-authored-by: Rex An <[email protected]>
…d for Spark SQL (#4901)

* [HUDI-3445] Clustering Command Based on Call Procedure Command for Spark SQL


Co-authored-by: shibei <[email protected]>
* Fixing timeline server for repeated refreshes
…partitions having different schemas (#4468)

* Fixing Hive getSchema for RT tables

* Addressing feedback

* temp diff

* fixing tests after spark datasource read support for metadata table is merged to master

* Adding multi-partition schema evolution tests to HoodieRealTimeRecordReader

Co-authored-by: Aditya Tiwari <[email protected]>
Co-authored-by: sivabalan <[email protected]>
…ssary hoodie records (#4932)

* log scanner optimization
* payload equals switches to `=`

Co-authored-by: 苏承祥 <[email protected]>
…4811)

* Making commit preserve metadata to true

* Fixing integ tests

* Fixing preserve commit metadata for metadata table

* fixed bootstrap tests

* temp diff

* Fixing merge handle

* renaming fallback record

* fixing build issue

* Fixing test failures
… object into the closure for Spark to serialize (#4954)

- Avoid including whole MultipleSparkJobExecutionStrategy object into the closure for Spark to serialize
…y on HDFS (#4739)

- This change makes sure MT records are updated appropriately on HDFS. Previously, after log file append operations, MT records were updated with just the size of the deltas being appended to the original files. This was found to cause issues with rollbacks, which instead updated MT with records bearing the full file size.

- To hedge against similar issues going forward, this PR removes this discrepancy and streamlines the flow so that the MT table always ingests records bearing full file sizes.
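
A toy model of the discrepancy described above, written in Python. It is only a sketch: the two merge rules (`merge_by_sum`, `merge_latest`) and the record layout are assumptions chosen to show why mixing delta-sized and full-sized records is fragile, and they are not Hudi's actual merge logic.

```python
def merge_by_sum(records):
    """Treat every record as a delta and add it to the running size."""
    sizes = {}
    for name, size in records:
        sizes[name] = sizes.get(name, 0) + size
    return sizes


def merge_latest(records):
    """Treat every record as the full file size; the latest record wins."""
    sizes = {}
    for name, size in records:
        sizes[name] = size
    return sizes


# A log file is created at 100 bytes and grows to 130 bytes after one append.
# Mixed stream: creation, a delta-sized append record, then a full-sized record.
mixed = [("log.1", 100), ("log.1", 30), ("log.1", 130)]
# Uniform stream: every record carries the full file size.
uniform = [("log.1", 100), ("log.1", 130), ("log.1", 130)]

mixed_result = merge_by_sum(mixed)        # over-counts the file
uniform_result = merge_latest(uniform)    # matches the real size
```

In this model the mixed stream yields a size that no longer matches the file, while the uniform full-size stream stays consistent regardless of how many records are replayed.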
For Flink insert overwrite operations, do the cleaning each time before the write.
…lter construction from index based on the type param (#4848)

Rework of #4761 
This diff introduces following changes:

- Write stats are converted to metadata index records during the commit. They now use the HoodieData type so that record generation scales with need.
- Metadata index init support for bloom filter and column stats partitions.
- When building the BloomFilter from the index records, the type param stored in the payload is used instead of a hardcoded type.
- Delta writes can change column ranges, and the column stats index needs to be updated with the new ranges to stay consistent with the table dataset. This fix adds column stats index update support for delta writes.
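
The last point, keeping column ranges consistent after a delta write, can be illustrated with a small Python sketch. The `ColumnRange` type and `merge_ranges` function are hypothetical names for this example and do not reflect the actual column stats payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRange:
    """Hypothetical per-file statistics for one column."""
    min_value: int
    max_value: int
    null_count: int


def merge_ranges(existing, delta):
    """Widen the existing range so it also covers the delta write."""
    return ColumnRange(
        min_value=min(existing.min_value, delta.min_value),
        max_value=max(existing.max_value, delta.max_value),
        null_count=existing.null_count + delta.null_count,
    )
```

If the index were not updated this way, a query filtering on a value that only the delta write introduced could wrongly skip the file.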

Co-authored-by: Manoj Govindassamy <[email protected]>
Desc: Add a Hive sync config (hoodie.datasource.hive_sync.sync_comment). This config defaults to false.
When syncing a data source to Hudi, if column comments are added to the source Avro schema and sync_comment is true, the column comments are synced to the Hive table.
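
As a rough illustration, the option could be carried in a key/value map like the one below. Only the config key and its default come from the description above; the variable name and the assumption that options are passed as string pairs are hypothetical.

```python
# Hypothetical option map; only the sync_comment key is taken from the commit description.
hive_sync_options = {
    # Defaults to false; set to "true" to sync column comments to the Hive table.
    "hoodie.datasource.hive_sync.sync_comment": "true",
}
```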
Alexey Kudinkin and others added 17 commits March 9, 2022 21:45
… columns (#4818)

NOTE: This change is the first part of a series to clean up Hudi's Spark DataSource related implementations, making sure there is minimal code duplication among them and that the implementations are consistent and performant.

This PR makes sure that BaseFileOnlyViewRelation reads only projected columns and avoids unnecessary serde from Row to InternalRow.

Brief change log
- Introduced HoodieBaseRDD as a base for all custom RDD impls
- Extracted common fields/methods to HoodieBaseRelation
- Cleaned up and streamlined HoodieBaseFileViewOnlyRelation
- Fixed all of the Relations to avoid superfluous Row <> InternalRow conversions
…olumns from schema (#4972)

* [HUDI-3522] Introduce DropColumnSchemaPostProcessor to support drop columns from schema

* Fix case sensitivity
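
Dropping columns from a schema can be sketched as below, using a plain dictionary shaped like an Avro record schema. This is not the DropColumnSchemaPostProcessor implementation: the function name `drop_columns` is hypothetical, and case-insensitive matching is an assumption made for the example.

```python
def drop_columns(schema, columns_to_drop):
    """Return a copy of an Avro-style record schema without the named fields.

    Field names are compared case-insensitively (an assumption for this
    sketch); the input schema is left unmodified.
    """
    drop = {name.lower() for name in columns_to_drop}
    kept = [field for field in schema["fields"] if field["name"].lower() not in drop]
    return {**schema, "fields": kept}


trip_schema = {
    "type": "record",
    "name": "Trip",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "Fare", "type": "double"},
    ],
}
trimmed = drop_columns(trip_schema, ["fare"])
```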
* [HUDI-2999] rfc for consistent hashing index

* [HUDI-2999] review: add metadata table & non-dual-write solution (virtual log file) for resizing

Co-authored-by: xiaoyuwei <[email protected]>
#5013)

Create new TypedProperties while performing clustering

Add OrderedProperties and minor refactoring

Add javadoc and remove getters from OrderedProperties
@fengjian428 fengjian428 merged commit 06ea24c into fengjian428:master Mar 13, 2022