fanaticjo commented Jun 5, 2021

What is the purpose of the pull request

Anyone who wants custom upsert logic currently has to override the latest Avro payload class, which is only possible in Java or Scala.

Python developers have no such option.

This PR introduces a new payload class and a new configuration key that work from Java, Scala, and Python.

The class implements the custom upsert logic, and the new key, hoodie.update.keys, accepts the columns that are the only ones to be updated.

"hoodie.update.keys": "admission_date,name", # comma-separated keys
"hoodie.datasource.write.payload.class": "com.hudiUpsert.hudiCustomUpsert" # custom upsert payload class

With these settings, only the admission_date and name columns are updated in the target table.
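
To make the described behaviour concrete, here is a minimal Python sketch of a column-restricted upsert on plain dictionaries. It does not call Hudi and is not the payload class from this PR: `parse_update_keys`, `merge_selected_columns`, and the sample records are hypothetical, and the `grade` column is invented for the example. Only the comma-separated `hoodie.update.keys` value and the two column names come from the description above.

```python
def parse_update_keys(raw):
    """Split a comma-separated key list such as "admission_date,name"."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def merge_selected_columns(existing, incoming, update_keys):
    """Return a copy of `existing` with only `update_keys` taken from `incoming`.

    Hypothetical helper: raises KeyError if a listed column is missing
    from the incoming record.
    """
    merged = dict(existing)
    for key in update_keys:
        if key not in incoming:
            raise KeyError(f"update key {key!r} not found in incoming record")
        merged[key] = incoming[key]
    return merged


options = {"hoodie.update.keys": "admission_date,name"}
existing = {"id": 1, "name": "Ann", "admission_date": "2020-01-01", "grade": "A"}
incoming = {"id": 1, "name": "Anna", "admission_date": "2021-06-05", "grade": "F"}

merged = merge_selected_columns(
    existing, incoming, parse_update_keys(options["hoodie.update.keys"])
)
print(merged)
# {'id': 1, 'name': 'Anna', 'admission_date': '2021-06-05', 'grade': 'A'}
```

In this sketch `grade` keeps its stored value because it is not listed in the update keys, even though the incoming record carries a different one.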

Brief change log

(for example:)

  • added hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

@wangxianghu changed the title from "HUDI-1936 Introduce a optional property for conditional upsert" to "[HUDI-1936] Introduce a optional property for conditional upsert" on Jun 5, 2021
codecov-commenter commented Jun 5, 2021

Codecov Report

Attention: Patch coverage is 8.57143% with 32 lines in your changes missing coverage. Please review.

Project coverage is 55.04%. Comparing base (974b476) to head (26dadb6).
Report is 4289 commits behind head on master.

Files with missing lines                               Patch %  Lines
...i/common/model/OverwriteWithCustomAvroPayload.java  10.34%   24 Missing and 2 partials ⚠️
...apache/hudi/exception/ColumnNotFoundException.java  0.00%    2 Missing ⚠️
...che/hudi/exception/UpdateKeyNotFoundException.java  0.00%    2 Missing ⚠️
...apache/hudi/exception/WriteOperationException.java  0.00%    2 Missing ⚠️
Additional details and impacted files

@@             Coverage Diff              @@
##             master    #3035      +/-   ##
============================================
+ Coverage     55.01%   55.04%   +0.02%     
- Complexity     3850     3865      +15     
============================================
  Files           485      491       +6     
  Lines         23467    23640     +173     
  Branches       2497     2535      +38     
============================================
+ Hits          12911    13012     +101     
- Misses         9405     9466      +61     
- Partials       1151     1162      +11     
Flag Coverage Δ
hudicli 39.55% <ø> (ø)
hudiclient ∅ <ø> (∅)
hudicommon 50.14% <8.57%> (-0.17%) ⬇️
hudiflink 63.25% <ø> (-0.38%) ⬇️
hudihadoopmr 51.43% <ø> (-0.11%) ⬇️
hudisparkdatasource 74.28% <ø> (+0.95%) ⬆️
hudisync 46.60% <ø> (+0.15%) ⬆️
huditimelineservice 64.36% <ø> (ø)
hudiutilities 70.83% <ø> (ø)

Files with missing lines Coverage Δ
...apache/hudi/exception/ColumnNotFoundException.java 0.00% <0.00%> (ø)
...che/hudi/exception/UpdateKeyNotFoundException.java 0.00% <0.00%> (ø)
...apache/hudi/exception/WriteOperationException.java 0.00% <0.00%> (ø)
...i/common/model/OverwriteWithCustomAvroPayload.java 10.34% <10.34%> (ø)

... and 24 files with indirect coverage changes

nsivabalan (Contributor) left a comment

Thanks for your contribution. This will be beneficial for the community.

A couple of high-level comments:

  • Have you thought through whether this works for both COW and MOR? If not, let's think about how we can make it work for MOR.
  • Can a user set different values for "hoodie.update.keys" for different batches of writes? From my understanding, COW should be fine, but I am not sure about MOR.

if (!recordOption.isPresent()) {
return Option.empty();
}

Contributor

We might have to check for delete records as well, something like:

isDeleteRecord((GenericRecord) indexedRecord)

Please check the OverwriteWithLatestAvroPayload implementation.
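
A rough Python sketch of what such a guard could look like, using dictionaries in place of Avro records. It assumes that a delete is signalled by a boolean `_hoodie_is_deleted` field, which should be confirmed against the OverwriteWithLatestAvroPayload source; `is_delete_record` and `combine` are hypothetical stand-ins here, not the Java methods.

```python
DELETE_MARKER = "_hoodie_is_deleted"  # assumed name of the delete marker field


def is_delete_record(record):
    """True only when the marker field is present and is the boolean True."""
    return record.get(DELETE_MARKER) is True


def combine(existing, incoming, update_keys):
    """Return the merged record, or None when the incoming record is a delete."""
    if incoming is None or is_delete_record(incoming):
        return None  # plays the role of Option.empty() in the Java snippet
    merged = dict(existing)
    merged.update({key: incoming[key] for key in update_keys})
    return merged
```

With this guard, a record flagged as deleted yields no merged value instead of being treated as a partial update.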

fanaticjo (Author), Jun 25, 2021

Yes, users can set different values for different batches. It works for COW; I will test MOR.

Author

This class is meant to be used only for upserts, so why is the delete check required?

Author

One blocker I see: if a new column is introduced, it is written as null. Any idea how to tackle this? One idea I have is to detect which columns differ between the schemas and add them to the properties, so that the new columns also get values. Or is there a Hudi way to do this?
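
One way to read this idea, sketched in Python on dictionaries: compare the field names of the stored and incoming records, and treat any column that exists only in the incoming record as an implicit update key. This only illustrates the suggestion and is not something Hudi is known to provide; `effective_update_keys`, `merge_with_new_columns`, and the `email` column are hypothetical.

```python
def effective_update_keys(existing, incoming, update_keys):
    """Configured keys plus any columns present only in the incoming record."""
    new_columns = [col for col in incoming if col not in existing]
    # dict.fromkeys preserves order and drops duplicates.
    return list(dict.fromkeys(list(update_keys) + new_columns))


def merge_with_new_columns(existing, incoming, update_keys):
    """Apply the configured keys and copy newly introduced columns as well."""
    merged = dict(existing)
    for key in effective_update_keys(existing, incoming, update_keys):
        merged[key] = incoming[key]
    return merged


existing = {"id": 1, "name": "Ann"}
incoming = {"id": 1, "name": "Anna", "email": "anna@example.com"}
print(merge_with_new_columns(existing, incoming, ["name"]))
# {'id': 1, 'name': 'Anna', 'email': 'anna@example.com'}
```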

Author

@nsivabalan any updates?

@vinothchandar (Member)

cc @vingov, do you mind reviewing this, given it's a change that benefits Python users?

@vinothchandar (Member)

In some sense, now that Spark SQL is supported, Python users can do custom merges. Does that satisfy your requirements?

hudi-bot (Collaborator) commented Nov 5, 2021

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@yihua added the priority:medium (Moderate impact; usability gaps) and writer-core labels on Sep 7, 2022
@nsivabalan (Contributor)

@fanaticjo: We landed partial payload support via #4676.
Let us know whether we can close this patch, whether #4676 could be enhanced instead, or whether this patch addresses something different.

github-actions bot added the size:L (PR with lines of changes in (300, 1000]) label on Feb 26, 2024
yihua (Contributor) commented Sep 10, 2024

Since there is no update on this PR for a while and Hudi already supports partial updates with a more general approach than the payload this PR proposes, closing this PR now. Feel free to reopen if needed.

Labels

priority:medium Moderate impact; usability gaps size:L PR with lines of changes in (300, 1000]

Projects

Status: 👤 User Action
Status: ✅ Done

6 participants