Conversation

@ad1happy2go
Collaborator

@ad1happy2go ad1happy2go commented Apr 25, 2023

Change Logs

Updated isDelete method in HoodieAvroRecord to handle deletes in DMS payload

Github Issue for Reference - #8278
JIRA - https://issues.apache.org/jira/browse/HUDI-6138

Impact

No major impact. This avoids the null Option error for the DMS payload when an empty record arrives for a delete.

Risk level

low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@vinothchandar vinothchandar added the priority:critical (Production degraded; pipelines stalled) and release-0.14.0 labels Apr 25, 2023
@ad1happy2go ad1happy2go changed the title from "Handled empty option for Hoodie Avro Record" to "[HUDI-6138] Handled empty option for Hoodie Avro Record" Apr 25, 2023
@yihua yihua linked an issue Apr 25, 2023 that may be closed by this pull request
@yihua yihua self-assigned this Apr 25, 2023
  // Prepend meta-fields into the record
  MetadataValues metadataValues = populateMetadataFields(finalRecord);
- HoodieRecord populatedRecord =
+ Option<HoodieRecord> populatedRecord =
Contributor

For records that are deleted, won't finalRecordOpt.isPresent() be false? In that case the condition at L243 is false and we never reach this code.

   * NOTE: This operation is idempotent
   */
- public abstract HoodieRecord prependMetaFields(Schema recordSchema, Schema targetSchema, MetadataValues metadataValues, Properties props);
+ public abstract Option<HoodieRecord> prependMetaFields(Schema recordSchema, Schema targetSchema, MetadataValues metadataValues, Properties props);
Contributor

This is a public interface. Let's add the old one back and mark it as deprecated.
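
A minimal sketch of keeping a deprecated method next to an Optional-returning replacement, using stand-in types rather than Hudi's real classes. Because Java cannot overload on return type alone, the sketch gives the new variant a hypothetical name (`prependMetaFieldsOpt`) and keeps the old signature as a deprecated wrapper. `Rec`, `BaseRecord`, `AvroLikeRecord`, and the single-argument signature are all assumptions made for illustration.

```java
import java.util.Optional;

// Stand-in types for illustration only; these are NOT Hudi's real classes.
class DeprecatedApiSketch {

  /** Minimal record stand-in: a null payload models a delete. */
  static final class Rec {
    final String key;
    final String payload;

    Rec(String key, String payload) {
      this.key = key;
      this.payload = payload;
    }
  }

  abstract static class BaseRecord {
    /** New variant (hypothetical name): empty when there is nothing to rewrite. */
    public abstract Optional<Rec> prependMetaFieldsOpt(Rec input);

    /**
     * Old variant, kept so existing callers still compile.
     *
     * @deprecated use {@link #prependMetaFieldsOpt(Rec)} and handle the empty case.
     */
    @Deprecated
    public Rec prependMetaFields(Rec input) {
      return prependMetaFieldsOpt(input)
          .orElseThrow(() -> new IllegalStateException("No payload for key " + input.key));
    }
  }

  static final class AvroLikeRecord extends BaseRecord {
    @Override
    public Optional<Rec> prependMetaFieldsOpt(Rec input) {
      if (input.payload == null) {
        return Optional.empty(); // delete: nothing to prepend meta fields to
      }
      return Optional.of(new Rec(input.key, "meta|" + input.payload));
    }
  }
}
```

With this shape, existing callers of the old method keep compiling (with a deprecation warning), while new callers can handle the empty case explicitly.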

- GenericRecord newAvroRecord = HoodieAvroUtils.rewriteRecordWithNewSchema(avroRecordOpt.get(), targetSchema);
- updateMetadataValuesInternal(newAvroRecord, metadataValues);
- return new HoodieAvroRecord<>(getKey(), new RewriteAvroPayload(newAvroRecord), getOperation(), this.currentLocation, this.newLocation);
+ if (avroRecordOpt.isPresent()) {
Contributor

So this is the main fix in this patch, is it?

Contributor

Can you help me understand which flow or use case will hit this? From what I gleaned, most callers of this method already filter out deleted records and should not be calling it.

@ad1happy2go ad1happy2go force-pushed the HUDI-6138-fix-dms-avro-payload branch from a7dd503 to 121ba39 May 2, 2023 06:47
@hudi-bot
Collaborator

hudi-bot commented May 8, 2023

CI report:

Contributor

@yihua yihua left a comment

LGTM

@yihua
Contributor

yihua commented May 8, 2023

Azure CI has an unrelated failure.

@yihua
Contributor

yihua commented May 8, 2023

@nsivabalan do you still have any concerns?

Contributor

@nsivabalan nsivabalan left a comment

is it not possible to write tests for this?

Contributor

@CTTY CTTY left a comment

LGTM

@yihua
Contributor

yihua commented May 9, 2023

> is it not possible to write tests for this?

Working on a test now.

Contributor

@nsivabalan nsivabalan left a comment

Let's not land this without any tests.

  @Override
  public boolean isDelete(Schema recordSchema, Properties props) throws IOException {
- if (this.data instanceof BaseAvroPayload) {
+ if (this.data instanceof BaseAvroPayload && !(this.data instanceof AWSDmsAvroPayload)) {
Member

We should avoid making the base class depend on a subclass. Besides, the issue is rooted in the custom delete marker, which DefaultHoodieRecordPayload also uses, so it is affected too. This change therefore won't resolve the problem for DefaultHoodieRecordPayload and its subclasses.
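
The design concern can be sketched as follows: rather than the record class checking `instanceof` against one specific payload subclass, each payload answers the delete question itself through an overridable hook. All types here and the `isDeleted` hook are simplified stand-ins, not Hudi's actual classes, and the `Op` field name with the value `D` is an assumed convention for the CDC-style payload.

```java
import java.util.Map;

// Simplified stand-ins for illustration; NOT Hudi's real payload hierarchy.
class PayloadHookSketch {

  /** Base payload: the default rule uses a boolean delete-marker field. */
  static class BasePayload {
    protected final Map<String, Object> fields;

    BasePayload(Map<String, Object> fields) {
      this.fields = fields;
    }

    /** Overridable hook, so callers never need to know the concrete subclass. */
    boolean isDeleted() {
      return Boolean.TRUE.equals(fields.get("_hoodie_is_deleted"));
    }
  }

  /** CDC-style payload: a delete is signalled by an operation field instead. */
  static class CdcPayload extends BasePayload {
    CdcPayload(Map<String, Object> fields) {
      super(fields);
    }

    @Override
    boolean isDeleted() {
      // Assumed convention: "Op" = "D" marks a delete.
      return "D".equals(fields.get("Op")) || super.isDeleted();
    }
  }

  /** Record-level check: delegates to the payload, no instanceof on subclasses. */
  static boolean isDelete(BasePayload payload) {
    return payload.isDeleted();
  }
}
```

Any further payload type with its own delete convention would override the hook, and the record-level check would stay unchanged.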

@yihua
Contributor

yihua commented May 10, 2023

After some investigation, I found that custom payload implementations such as the AWS DMS payload and the Debezium payload were not properly migrated to the new APIs introduced by RFC-46, which causes delete operations to fail. Our tests did not catch this. While this fix works around the issue with a band-aid, we should fix the payload implementations themselves. I'm going to put up a patch.

@yihua
Contributor

yihua commented May 10, 2023

The root cause is that the code assumes delete records are marked by _hoodie_is_deleted; however, custom CDC payloads use the op field to mark deletes, so CDC delete payloads are not properly identified.
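
A small sketch of how such a mismatch can surface as an empty-Option failure, under stated assumptions: records are modelled as plain maps, the CDC delete convention is assumed to be `Op` = `D`, and the `write` flow is hypothetical rather than Hudi's actual write path.

```java
import java.util.Map;
import java.util.Optional;

// Illustrative stand-ins only; field names and flow are assumptions, not Hudi code.
class DeleteMarkerMismatchSketch {

  /** Delete check that only understands the boolean marker field. */
  static boolean isDeleteByMarker(Map<String, Object> rec) {
    return Boolean.TRUE.equals(rec.get("_hoodie_is_deleted"));
  }

  /** CDC-style payload logic: a record whose op field says delete yields no value. */
  static Optional<Map<String, Object>> insertValue(Map<String, Object> rec) {
    return "D".equals(rec.get("Op")) ? Optional.empty() : Optional.of(rec);
  }

  /** Caller that trusts the marker check and then unwraps unconditionally. */
  static String write(Map<String, Object> rec) {
    if (isDeleteByMarker(rec)) {
      return "delete";
    }
    // Throws NoSuchElementException for a CDC delete that the marker check missed.
    insertValue(rec).get();
    return "upsert";
  }
}
```

In this sketch a record with `Op` = `D` passes the marker check as a non-delete, and the unconditional `get()` then fails on the empty `Optional`.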

@xushiyan
Member

Closing in favor of #8690.

@xushiyan xushiyan closed this May 18, 2023

Labels

priority:critical (Production degraded; pipelines stalled), release-0.14.0

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[SUPPORT] Deltastreamer Fails with AWSDmsAvroPayload

7 participants