… NullPointerException (#5755) SeekTo top cells to avoid NullPointerException
…rsion may cause the fileID of the task to not be loaded correctly (#5763) Co-authored-by: john.wick <john.wick@vipshop.com>
…mitMetadata` parsing (#5733) As outlined in HUDI-4176, we've hit a roadblock while testing Hudi on a large dataset (~1 TB) with pretty fat commits, where Hudi's commit metadata could reach hundreds of MB. Given the size of some of our commit metadata instances, Spark's parsing and resolution phase (when spark.sql(...) is involved, but before the returned Dataset is dereferenced) starts to dominate some of our queries' execution time. - Rebased onto new APIs to avoid excessive Hadoop Path allocations - Eliminated hasOperationField completely to avoid repetitive computations - Cleaned up duplication in HoodieActiveTimeline - Added caching for common instances of HoodieCommitMetadata - Made tableStructSchema lazy
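The caching fix above boils down to memoizing an expensive parse. A minimal sketch of that pattern (Python, purely illustrative; `parse_commit_metadata` and its return shape are hypothetical stand-ins, not Hudi's API):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def parse_commit_metadata(instant: str) -> dict:
    # Hypothetical stand-in for an expensive parse of a large commit
    # metadata file; caching avoids re-parsing the same instant on
    # every query that touches the timeline.
    return {"instant": instant, "has_operation_field": True}

# Repeated lookups for the same instant hit the cache and return the
# same object instead of re-parsing.
a = parse_commit_metadata("20220601120000")
b = parse_commit_metadata("20220601120000")
assert a is b
```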
… bulk insert row writer with SimpleKeyGen and virtual keys (#5664) The bulk insert row writer code path had a gap with respect to Hive-style partitioning and the default partition when virtual keys are enabled with SimpleKeyGen. This patch fixes the issue.
) - When the async indexer is invoked only with the "FILES" partition, it fails. This fixes it to work with the async indexer. Also, if the metadata table itself is not initialized and someone wants to build indexes via AsyncIndexer, they are expected to index the "FILES" partition first, followed by other partitions. In general, there is a limitation of building only one index at a time with AsyncIndexer; guards have been added to ensure these conditions are met.
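The guards described above might look like the following sketch (Python; the function name and signature are hypothetical, not Hudi's actual code):

```python
def validate_async_index_request(partitions, metadata_table_initialized):
    """Guards sketched from the description above (hypothetical names).

    AsyncIndexer can build only one index at a time, and the FILES
    partition must be indexed before any other partition when the
    metadata table has not been initialized yet.
    """
    if len(partitions) != 1:
        raise ValueError("AsyncIndexer supports building only one index at a time")
    if not metadata_table_initialized and partitions[0] != "FILES":
        raise ValueError("index the FILES partition first when the metadata table is uninitialized")
```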
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
- Keys fetched from the metadata table, especially from the base file reader, are not sorted, and hence may result in throwing an NPE (key prefix search) or unnecessary seeks to the start of the HFile (full key lookups). This patch fixes that. It is not an issue with log blocks, since sorting is taken care of within HoodieHFileDataBlock. - Commit where the sorting was mistakenly reverted: [HUDI-3760] Adding capability to fetch Metadata Records by prefix #5208
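A minimal sketch of why sorting matters for a prefix lookup (Python; illustrative only, not Hudi's HFile reader):

```python
from bisect import bisect_left

def prefix_search(keys, prefix):
    # Keys read from a base file may arrive unsorted; sort them first
    # so a binary search can jump straight to the prefix range instead
    # of failing or scanning from the start of the file.
    keys = sorted(keys)
    i = bisect_left(keys, prefix)
    out = []
    while i < len(keys) and keys[i].startswith(prefix):
        out.append(keys[i])
        i += 1
    return out
```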
…Integration (#5737) There are multiple issues with our current DataSource V2 integration: because we advertise Hudi tables as V2, Spark expects them to implement certain APIs which are not implemented at the moment; instead, we use a custom resolution rule (in HoodieSpark3Analysis) to manually fall back to V1 APIs. This commit fixes the issue by reverting the DSv2 APIs and making Spark use V1, except for the schema evolution logic.
* [MINOR] FlinkStateBackendConverter add more exception message
…es easily (#5744) Co-authored-by: yanenze <yanenze@keytop.com.cn>
…rite (#5619) Co-authored-by: xicm <xicm@asiainfo.com>
…ading metadata table (#5840) When explicitly specifying the metadata table path for reading in Spark, "hoodie.metadata.enable" is overridden to true for proper read behavior.
- Upgrade junit to 5.7.2 - Downgrade surefire and failsafe to 2.22.2 - Fix test failures that were previously not reported - Improve azure pipeline configs Co-authored-by: liujinhui1994 <965147871@qq.com> Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
Co-authored-by: superche <superche@tencent.com>
#5790) TestReaderFilterRowKeys needs to get the key from RECORD_KEY_METADATA_FIELD, but the writer in the current UT does not populate the meta field and the schema does not contain meta fields. This fix writes data with a schema that contains meta fields and calls writeAvroWithMetadata for writing. Co-authored-by: xicm <xicm@asiainfo.com>
Add a new config key hoodie.deltastreamer.source.kafka.enable.failOnDataLoss. When failOnDataLoss=false (the current behaviour and the default), log a warning instead of silently seeking to earliest; when failOnDataLoss is set, fail explicitly.
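The new flag could be supplied via DeltaStreamer's properties file, e.g. (illustrative snippet; only the key name comes from this change):

```properties
# Fail the pipeline explicitly instead of silently seeking to earliest
# when Kafka offsets have been aged out (default is false).
hoodie.deltastreamer.source.kafka.enable.failOnDataLoss=true
```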
#5788) Adding more logs to assist in debugging when HoodieFlinkWriteClient.getOrCreateWriteHandle throws an exception
…s default (#5174) Co-authored-by: yuezhang <yuezhang@freewheel.tv>
…tream if using HDFS (#5048) Add differentiated logic for creating an immutable file in HDFS by first creating file.tmp and then renaming it to the target file
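The create-then-rename pattern described above, sketched against a plain local filesystem (Python; HDFS itself would go through its own FileSystem create/rename calls, this only shows the shape of the idea):

```python
import os

def write_immutable(path: str, data: bytes) -> None:
    # Write to a side ".tmp" file first, then rename into place, so
    # readers never observe a partially written file.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename on POSIX filesystems
```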
… api (#5445) Co-authored-by: jerryyue <jerryyue@didiglobal.com>
… partitions through a standalone job. (#4459) Co-authored-by: yuezhang <yuezhang@freewheel.tv>
#5286) Co-authored-by: xicm <xicm@asiainfo.com>
…bine cause exception (#5828)
…ion plan at once (#5677) * [HUDI-4152] Flink offline compaction allows compacting multiple compaction plans at once * [HUDI-4152] Fix exception for duplicated uid when multiple compaction plans are compacted * [HUDI-4152] Provide UT & IT for compacting multiple compaction plans * [HUDI-4152] Put multiple compaction plans into one compaction plan source * [HUDI-4152] InstantCompactionPlanSelectStrategy allows multiple instants by using commas * [HUDI-4152] Add IT for InstantCompactionPlanSelectStrategy
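The comma-separated instant selection might work like this sketch (Python; names are hypothetical stand-ins for the Flink strategy class):

```python
def select_compaction_plans(pending_plans: dict, instants_option: str) -> list:
    # An option such as "20220601,20220602" selects several pending
    # compaction plans to execute in one offline compaction run.
    wanted = [s.strip() for s in instants_option.split(",") if s.strip()]
    return [pending_plans[i] for i in wanted if i in pending_plans]
```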
…on. (#5995) * Fix updateTableParameters, which was not excluding partition columns, and the updateTableProperties boolean check * Fix serde parameters getting overridden on table property update * Remove stale syncConfig