
@yihua
Contributor

@yihua yihua commented May 11, 2023

Change Logs

Delete operations in custom payloads after RFC-46: while looking into a 0.13.1 release blocker, I found that custom payload implementations, such as the AWS DMS payload and the Debezium payload, were not properly migrated to the new APIs introduced by RFC-46. As a result, delete operations fail with these payloads. Our tests did not catch this.

It is currently assumed that delete records are marked by "_hoodie_is_deleted"; however, custom CDC payloads use the op field to mark deletes.
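To make the failure mode concrete, here is a minimal sketch of the assumption described above, not Hudi code. It assumes records are modeled as a Map<String, Object> instead of an Avro GenericRecord, keyed by a hypothetical "id" field, and that a CDC delete carries op = "D" (an assumed value). PreFixMergeSketch and its methods are hypothetical names. Because only "_hoodie_is_deleted" is consulted, an op-based delete slips past the check and is written as an upsert.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the pre-fix assumption, not Hudi's actual code.
// Map<String, Object> stands in for an Avro GenericRecord.
final class PreFixMergeSketch {
  // The only delete marker consulted is the "_hoodie_is_deleted" flag.
  static boolean isDeleteByHoodieFlag(Map<String, Object> record) {
    return Boolean.TRUE.equals(record.get("_hoodie_is_deleted"));
  }

  // Applies records in order to a table keyed by the assumed "id" field.
  static Map<Object, Map<String, Object>> apply(List<Map<String, Object>> incoming) {
    Map<Object, Map<String, Object>> table = new HashMap<>();
    for (Map<String, Object> record : incoming) {
      Object key = record.get("id");
      if (isDeleteByHoodieFlag(record)) {
        table.remove(key);
      } else {
        // A CDC delete marked only by op = "D" falls through to here and is upserted.
        table.put(key, record);
      }
    }
    return table;
  }
}
```

In this sketch, an insert followed by an op = "D" event for the same key leaves the row in place, now holding the delete event itself.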

This PR fixes the issue by adding a new API, isDeleteRecord(GenericRecord genericRecord), to BaseAvroPayload, which lets a payload implement custom logic to decide whether a record is a delete record.
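The idea of an overridable delete hook can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: SketchBasePayload, OpFieldSketchPayload, and HookedMergeSketch are hypothetical names, a Map<String, Object> stands in for an Avro GenericRecord, the key field "id" is assumed, and the CDC delete marker op = "D" is an assumed value. The base class keeps the "_hoodie_is_deleted" rule as its default, a CDC-style subclass overrides the hook, and the merge step asks the payload instead of hardcoding the marker.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base payload; Map<String, Object> stands in for an Avro GenericRecord.
class SketchBasePayload {
  // Default rule: a record is a delete when "_hoodie_is_deleted" is true.
  boolean isDeleteRecord(Map<String, Object> record) {
    return Boolean.TRUE.equals(record.get("_hoodie_is_deleted"));
  }
}

// Hypothetical CDC-style payload that overrides the hook.
// The field name "op" and the delete value "D" are assumptions for illustration.
class OpFieldSketchPayload extends SketchBasePayload {
  @Override
  boolean isDeleteRecord(Map<String, Object> record) {
    Object op = record.get("op");
    // Avro generic records often hold strings as Utf8, so compare via toString().
    boolean opDelete = op != null && "D".equals(op.toString());
    // Also honor the default marker so either signal counts as a delete.
    return opDelete || super.isDeleteRecord(record);
  }
}

// Hypothetical merge step that delegates the delete decision to the payload.
final class HookedMergeSketch {
  static Map<Object, Map<String, Object>> apply(SketchBasePayload payload,
                                                List<Map<String, Object>> incoming) {
    Map<Object, Map<String, Object>> table = new HashMap<>();
    for (Map<String, Object> record : incoming) {
      Object key = record.get("id"); // "id" is an assumed record key field
      if (payload.isDeleteRecord(record)) {
        table.remove(key);
      } else {
        table.put(key, record);
      }
    }
    return table;
  }
}
```

Calling the super implementation from the override is a choice made in this sketch so that both markers are honored; the source does not say whether the AWS DMS or Debezium payloads do the same.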

Impact

Fixes delete failures when a custom payload uses a field other than "_hoodie_is_deleted" to identify deletes.

Risk level

low

Documentation Update

The 0.13.0 release notes need to be updated to call out this regression.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@nsivabalan nsivabalan added release-0.13.1 priority:blocker Production down; release blocker labels May 12, 2023
@yihua yihua force-pushed the HUDI-6199-fix-payload-deletes branch from ad450cd to 042b47a May 15, 2023 05:48
@yihua yihua changed the title [WIP][HUDI-6199] Fix deletes with custom payload implementation [HUDI-6199] Fix deletes with custom payload implementation May 15, 2023
@yihua
Contributor Author

yihua commented May 15, 2023

CI is green.
[Screenshot: CI results, 2023-05-15 09:45:43]

@hudi-bot
Collaborator

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@yihua yihua merged commit 0996310 into apache:master May 15, 2023
yihua added a commit to yihua/hudi that referenced this pull request May 15, 2023
There was a bug where delete records were assumed to be marked by "_hoodie_is_deleted"; however, custom CDC payloads use the "op" field to mark deletes. As a result, the AWS DMS payload and the Debezium payload failed on deletes. This commit fixes the issue by adding a new API, isDeleteRecord(GenericRecord genericRecord), to BaseAvroPayload, which lets a payload implement custom logic to decide whether a record is a delete record.

Co-authored-by: Raymond Xu
yihua added a commit to yihua/hudi that referenced this pull request May 15, 2023
yihua added a commit to yihua/hudi that referenced this pull request May 17, 2023

Labels

priority:blocker Production down; release blocker

Projects

Archived in project


4 participants