-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[MINOR] Added docs for gotchas when using PartialUpdateAvroPayload #8579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MINOR] Added docs for gotchas when using PartialUpdateAvroPayload #8579
Conversation
| class TestPartialUpdateAvroPayload extends HoodieSparkSqlTestBase { | ||
|
|
||
| val partialUpdatePayloadClass = "org.apache.hudi.common.model.PartialUpdateAvroPayload" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add UT instead of ITs in existing class TestPartialUpdateAvroPayload ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, can you please take a look at this PR again? Thank you!
What is exactly lost here? |
PreCombine + combineAndGetUpdateValueThis is so as the incoming batch at (2) will be combineAndGetUpdateValue ONLYResults will be different if SummaryIf |
This is expected right ? We always ignore the nulls while merging, shouldn't the |
NOTE: Let me provide an example again: preCombine + combineAndGetUpdateValuecombineAndGetUpdateValue onlyTLDR: A record's value that should be ignored is in fact, not being ignored. |
|
Okay, got the idea, you are addressing that the always merging behavior can be variable with different sequence of inputs, that's true, and it is not a bug I think. |
|
@danny0405 Yeap, this is not a bug, but will affect the query snapshot results. Also MOR tables are more susceptible of encountering such issues (when performing a query on the _rt table/compaction). The goal of this MR is not to fix any bugs, rather, it's to document and make known such behaviours so that users understand that this is by design. Or rather, a limitation of how Hudi performs merging. |
d817942 to
5ae7a15
Compare
5ae7a15 to
97fd8fb
Compare
…load (apache#8579)" This reverts commit e331141.
Change Logs
Updated java docs and added tests to demonstrate gotchas when using PartialUpdateAvroPayload.
Brief description
When
preCombineis called for records of of keys to be updated and the incoming data is a combination of outdated + new states, the end state of the record will be somewhat counter-intuitive due to Hudi's exsiting design.Example:
This is so as the incoming batch at (2) will be
preCombinebefore performing acombineAndUpdateValue.Results will be different if
combineAndUpdateValueis invoked in order without invokingpreCombine.Example:
Impact
None, changes made are docs and tests to clarify gotchas on how PartialUpdateAvroPayload handles records that are arriving late (unordered data)
Risk level (write none, low medium or high below)
None
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist