-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-5413] Add record count payload to support pv/uv #7499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
Outdated
Show resolved
Hide resolved
| * Payload clazz that is used for pv/uv. | ||
| * In order to use 'RecordCountAvroPayload', we need to add field [hoodie_record_count bigint] | ||
| * to the schema when creating the hudi table to record the result of pv/uv, field 'hoodie_record_count' | ||
| * does not need to be filled, and flink will automatically set it to "null", "null" represents 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"null" represents 1 is a little strange, can be set to 1 explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the comment, in fact the null is updated to 1 in #mergeOldRecord and #getInsertValue.
| return Option.empty(); | ||
| } else { | ||
| try { | ||
| // Flink automatically set 'hoodie_record_count' to 'null', here updated to 1, so that the query result is 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this payload is only used for flink but not for spark?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hechao-ustc no need to create a new PR, you would fix it in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this payload is only used for flink but not for spark?
both can use.
Change Logs
Add record count payload to support pv/uv.
Impact
No impact.
Risk level (write none, low medium or high below)
None
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist