-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-6138] Handled empty option for Hoodie Avro Record #8573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-6138] Handled empty option for Hoodie Avro Record #8573
Conversation
| // Prepend meta-fields into the record | ||
| MetadataValues metadataValues = populateMetadataFields(finalRecord); | ||
| HoodieRecord populatedRecord = | ||
| Option<HoodieRecord> populatedRecord = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for records that are deleted, isn't finalRecordOpt.isPresent() will be false and hence the condition in L243 will be false and we will not hit this condition only.
| * NOTE: This operation is idempotent | ||
| */ | ||
| public abstract HoodieRecord prependMetaFields(Schema recordSchema, Schema targetSchema, MetadataValues metadataValues, Properties props); | ||
| public abstract Option<HoodieRecord> prependMetaFields(Schema recordSchema, Schema targetSchema, MetadataValues metadataValues, Properties props); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a public interface. lets add a old one back and mark it as deprecated.
| GenericRecord newAvroRecord = HoodieAvroUtils.rewriteRecordWithNewSchema(avroRecordOpt.get(), targetSchema); | ||
| updateMetadataValuesInternal(newAvroRecord, metadataValues); | ||
| return new HoodieAvroRecord<>(getKey(), new RewriteAvroPayload(newAvroRecord), getOperation(), this.currentLocation, this.newLocation); | ||
| if (avroRecordOpt.isPresent()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, this is the main fix in this patch is it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you help me understand which flow/use-case will hit this. From what I gleaned, most of the callers to this method already filter for deleted record and should not be calling this.
a7dd503 to
121ba39
Compare
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecord.java
Outdated
Show resolved
Hide resolved
yihua
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Azure CI has irrelevant failure. |
|
@nsivabalan do you still have any concern? |
nsivabalan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it not possible to write tests for this?
CTTY
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Working on a test now. |
nsivabalan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets not land w/o any tests
| @Override | ||
| public boolean isDelete(Schema recordSchema, Properties props) throws IOException { | ||
| if (this.data instanceof BaseAvroPayload) { | ||
| if (this.data instanceof BaseAvroPayload && !(this.data instanceof AWSDmsAvroPayload)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should avoid making base class dependent on subclass. Besides, the issue rooted at custom delete marker, which DefaultHoodieRecordPayload uses and gets affected by this too. So this won't resolve for DefaultHoodieRecordPayload and subclasses
|
After some investigation, I found that the custom payload implementation like AWS DMS payload and Debezium payload are not properly migrated to the new APIs introduced by RFC-46, causing the delete operation to fail. Our tests did not catch this. While this fix gets around the issue with a bandaid, we should fix the implementation of the payload. I'm going to put up a patch. |
|
The root cause is that the code assumes that delete records are marked by |
|
close in favor of #8690 |
Change Logs
Updated isDelete method in HoodieAvroRecord to handle deletes in DMS payload
Github Issue for Reference - #8278
JIRA - https://issues.apache.org/jira/browse/HUDI-6138
Impact
No Major impact. It will avoid the Null Options error for DMS payload when empty record is coming for deletes.
Risk level
low
Documentation Update
N/A
Contributor's checklist