forked from apache/hudi
merge master #4
Merged
…4941) * Fixing populateMeta fields update to metadata table * Fix checkstyle violations Co-authored-by: Sagar Sumit <[email protected]>
… interfaces first (#4942) * In some complex network environments, the current code returns the wildcard address 0.0.0.0, which is not desired.
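As a rough illustration of the concern in the entry above (this is not the code from #4942, which is not shown on this page), the sketch below filters wildcard and loopback addresses out of a candidate list. The class `AddressPicker` and its methods `pickUsable` and `ipv4` are hypothetical names introduced for this example; only standard `java.net.InetAddress` calls are used.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class AddressPicker {
  // Hypothetical helper: returns the first candidate that is neither the
  // wildcard address (such as 0.0.0.0) nor a loopback address.
  public static Optional<InetAddress> pickUsable(List<InetAddress> candidates) {
    return candidates.stream()
        .filter(a -> !a.isAnyLocalAddress())
        .filter(a -> !a.isLoopbackAddress())
        .findFirst();
  }

  // Hypothetical convenience: builds an IPv4 address from four octets
  // without a DNS lookup, wrapping the checked exception.
  public static InetAddress ipv4(int a, int b, int c, int d) {
    try {
      return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
    } catch (UnknownHostException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    List<InetAddress> candidates =
        Arrays.asList(ipv4(0, 0, 0, 0), ipv4(127, 0, 0, 1), ipv4(10, 0, 0, 5));
    // The wildcard and loopback entries are skipped; 10.0.0.5 is selected.
    System.out.println(pickUsable(candidates).isPresent());
  }
}
```

In a real deployment the candidate list would presumably come from enumerating the host's network interfaces; that step is left out here to keep the sketch deterministic.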
…nc.enable" directly, async clustering not work (#4905) Co-authored-by: Rex An <[email protected]>
…d for Spark SQL (#4901) * [HUDI-3445] Clustering Command Based on Call Procedure Command for Spark SQL Co-authored-by: shibei <[email protected]>
…partitions having different schemas (#4468) * Fixing Hive getSchema for RT tables * Addressing feedback * temp diff * fixing tests after spark datasource read support for metadata table is merged to master * Adding multi-partition schema evolution tests to HoodieRealTimeRecordReader Co-authored-by: Aditya Tiwari <[email protected]> Co-authored-by: sivabalan <[email protected]>
…ing _hoodie_is_deleted column to schema (#4921)
…cess before it is transformed to DataSet (#4930)
…ssary hoodie records (#4932) * log scanner optimization * payload equals switches to `=` Co-authored-by: 苏承祥 <[email protected]>
…4811) * Making commit preserve metadata to true * Fixing integ tests * Fixing preserve commit metadata for metadata table * fixed bootstrap tests * temp diff * Fixing merge handle * renaming fallback record * fixing build issue * Fixing test failures
… object into the closure for Spark to serialize (#4954) - Avoid including whole MultipleSparkJobExecutionStrategy object into the closure for Spark to serialize
…y on HDFS (#4739)
- This change makes sure metadata table (MT) records are updated appropriately on HDFS. Previously, after log file append operations, MT records were updated with only the size of the deltas appended to the original files. This was found to cause issues with rollbacks, which instead updated the MT with records bearing the full file size.
- To hedge against similar issues going forward, this PR removes the discrepancy and streamlines the flow so that the MT always ingests records bearing full file sizes.
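The mismatch described in #4739 can be pictured with a toy model. This is only a sketch under stated assumptions: the class `FileSizeRecords`, its methods, and the "sum the deltas" merge rule are hypothetical and are not taken from Hudi's code. It shows why mixing delta-sized and full-sized records gives a wrong total, while always recording the full size stays consistent.

```java
import java.util.HashMap;
import java.util.Map;

public class FileSizeRecords {
  // Toy index: file name -> recorded size in bytes.
  private final Map<String, Long> sizes = new HashMap<>();

  // Delta convention (assumed): each record carries only the appended bytes
  // and is summed into the existing value.
  public void applyDelta(String file, long appendedBytes) {
    sizes.merge(file, appendedBytes, Long::sum);
  }

  // Full-size convention: each record carries the total file size and
  // replaces the existing value.
  public void applyFullSize(String file, long totalBytes) {
    sizes.put(file, totalBytes);
  }

  public long sizeOf(String file) {
    return sizes.getOrDefault(file, 0L);
  }

  // Two appends recorded as deltas, then a rollback-style record that bears
  // the full size (150) but is merged as if it were a delta.
  public static long mixedConventions() {
    FileSizeRecords r = new FileSizeRecords();
    r.applyDelta("log.1", 100);
    r.applyDelta("log.1", 50);
    r.applyDelta("log.1", 150);
    return r.sizeOf("log.1");
  }

  // The same three events, all recorded with the full file size.
  public static long fullSizeOnly() {
    FileSizeRecords r = new FileSizeRecords();
    r.applyFullSize("log.1", 100);
    r.applyFullSize("log.1", 150);
    r.applyFullSize("log.1", 150);
    return r.sizeOf("log.1");
  }

  public static void main(String[] args) {
    System.out.println(mixedConventions()); // 300, double the real size
    System.out.println(fullSizeOnly());     // 150, the real size
  }
}
```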
Co-authored-by: yuezhang <[email protected]>
…lter construction from index based on the type param (#4848)
Rework of #4761. This diff introduces the following changes:
- Write stats are converted to metadata index records during the commit. They now use the HoodieData type so that record generation scales with need.
- Metadata index init support for bloom filter and column stats partitions.
- When building the BloomFilter from the index records, the type param stored in the payload is used instead of a hardcoded type.
- Delta writes can change column ranges, and the column stats index needs to be updated with the new ranges to stay consistent with the table dataset. This fix adds column stats index update support for delta writes.
Co-authored-by: Manoj Govindassamy <[email protected]>
…ulti processors at once (#4968)
Desc: Add a Hive sync config (hoodie.datasource.hive_sync.sync_comment), which defaults to false. When column comments have been added to the source Avro schema and sync_comment is true, the column comments are synced to the Hive table.
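A minimal sketch of how the new flag might be set alongside other writer options. The key `hoodie.datasource.hive_sync.sync_comment` comes from the description above; the class `HiveSyncCommentOptions`, the method `build`, and the pairing with `hoodie.datasource.hive_sync.enable` are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HiveSyncCommentOptions {
  // Config key named in the commit description; defaults to false.
  static final String SYNC_COMMENT = "hoodie.datasource.hive_sync.sync_comment";

  // Hypothetical helper: returns writer options with comment syncing enabled.
  public static Map<String, String> build() {
    Map<String, String> opts = new LinkedHashMap<>();
    // Assumption: Hive sync itself must be enabled for the flag to matter.
    opts.put("hoodie.datasource.hive_sync.enable", "true");
    // Opt in to syncing column comments to the Hive table.
    opts.put(SYNC_COMMENT, "true");
    return opts;
  }

  public static void main(String[] args) {
    build().forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```

A map like this could then be handed to a Spark writer through its options, alongside the table-specific settings a real job would need.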
… columns (#4818)
NOTE: This change is the first part of a series to clean up Hudi's Spark DataSource related implementations, making sure there is minimal code duplication among them and that the implementations are consistent and performant.
This PR makes sure that BaseFileOnlyViewRelation only reads projected columns, and avoids unnecessary serde from Row to InternalRow.
Brief change log
- Introduced HoodieBaseRDD as a base for all custom RDD impls
- Extracted common fields/methods to HoodieBaseRelation
- Cleaned up and streamlined HoodieBaseFileViewOnlyRelation
- Fixed all of the Relations to avoid superfluous Row <> InternalRow conversions
* [HUDI-2999] rfc for consistent hashing index * [HUDI-2999] review: add metadata table & non-dual-write solution (virtual log file) for resizing Co-authored-by: xiaoyuwei <[email protected]>
Co-authored-by: 苏承祥 <[email protected]>
… in TestSchemaPostProcessor (#5019)
…load previous Index Table state (#5015)
#4999) Co-authored-by: Rex An <[email protected]>
…etting multi processors at once (#4969)
…d compaction if rollback failed mid-way (#4971)
#5013)
* Create new TypedProperties while performing clustering
* Add OrderedProperties and minor refactoring
* Add javadoc and remove getters from OrderedProperties
…4984) Co-authored-by: Y Ethan Guo <[email protected]>
Tips
What is the purpose of the pull request
(For example: This pull request adds quick-start document.)
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.