
Conversation

@hechao-ustc (Contributor)

Change Logs

Add a record count payload (RecordCountAvroPayload) to support pv/uv (page view / unique visitor) statistics.

Impact

No impact.

Risk level (write none, low, medium, or high below)

None

Documentation Update

Describe any necessary documentation updates if there is a new feature, config, or user-facing change:

  • The config description must be updated if new configs are added or the default values of existing configs are changed.
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here, and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

* Payload class that is used for pv/uv.
* In order to use 'RecordCountAvroPayload', we need to add the field [hoodie_record_count bigint]
* to the schema when creating the Hudi table to record the pv/uv result. The field 'hoodie_record_count'
* does not need to be filled; Flink will automatically set it to "null", and "null" represents 1.
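For illustration, here is a minimal sketch of what such a table definition could look like through Flink's Java Table API. The table name, columns, path, and the org.apache.hudi.common.model package are assumptions for the example, and 'payload.class' is assumed to be the connector option that selects the payload; only RecordCountAvroPayload and the hoodie_record_count bigint field come from this PR.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CreatePvUvTableSketch {
  public static void main(String[] args) {
    TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());
    // 'hoodie_record_count' is declared in the schema but never populated by the writer;
    // Flink leaves it null and the payload interprets null as a count of 1.
    tEnv.executeSql(
        "CREATE TABLE hudi_pv_uv ("
            + "  uuid STRING PRIMARY KEY NOT ENFORCED,"
            + "  page STRING,"
            + "  hoodie_record_count BIGINT,"
            + "  ts TIMESTAMP(3)"
            + ") WITH ("
            + "  'connector' = 'hudi',"
            + "  'path' = 'file:///tmp/hudi_pv_uv',"
            + "  'payload.class' = 'org.apache.hudi.common.model.RecordCountAvroPayload'"
            + ")");
  }
}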
Contributor:

"null" represents 1 is a little strange, can be set to 1 explicitly?

Contributor Author:

I've updated the comment. In fact, the null is updated to 1 in #mergeOldRecord and #getInsertValue.

      return Option.empty();
    } else {
      try {
        // Flink automatically sets 'hoodie_record_count' to 'null'; it is updated to 1 here so that the query result is 1.
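For readers following the thread, here is a minimal standalone sketch of the null-as-1 counting idea being discussed, operating directly on Avro GenericRecords. The helper class and method names are hypothetical and this is not the PR's #mergeOldRecord/#getInsertValue code; only the hoodie_record_count field name and the null-means-1 convention come from the discussion above.

import org.apache.avro.generic.GenericRecord;

public final class RecordCountMergeSketch {

  private RecordCountMergeSketch() {
  }

  // A null 'hoodie_record_count' means the row represents a single event, i.e. a count of 1.
  static long countOf(GenericRecord record) {
    Object value = record.get("hoodie_record_count");
    return value == null ? 1L : ((Number) value).longValue();
  }

  // Combine an old and a new record by summing their counts into the newer record,
  // so repeated upserts of the same key accumulate the total record count.
  static GenericRecord mergeCounts(GenericRecord oldRecord, GenericRecord newRecord) {
    newRecord.put("hoodie_record_count", countOf(oldRecord) + countOf(newRecord));
    return newRecord;
  }
}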
Contributor:

So this payload is only used for Flink but not for Spark?

@hechao-ustc (Contributor Author), Dec 21, 2022:

Spark is also supported, but I noticed that #5627 and #7345 have combined CombineAndGetUpdateValue and Precombine into one API according to RFC-46. The current RecordCountAvroPayload and PartialUpdateAvroPayload need to be adapted. I will create a new PR later.

Contributor:

@hechao-ustc no need to create a new PR; you can fix it in this PR.

Contributor Author:

ok.

Contributor:

So this payload is only used for Flink but not for Spark?

Both Flink and Spark can use it.
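As a rough illustration of the Spark side, a write could select the payload through the standard Hudi Spark datasource options. The dataset, column names, paths, and the org.apache.hudi.common.model package are assumptions for the sketch; only the class name comes from this PR.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class SparkPvUvWriteSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("pv-uv-sketch").getOrCreate();
    // Hypothetical input: one row per raw event, with the hoodie_record_count column left unset (null).
    Dataset<Row> events = spark.read().format("parquet").load("/tmp/raw_events");

    events.write()
        .format("hudi")
        .option("hoodie.table.name", "hudi_pv_uv")
        .option("hoodie.datasource.write.recordkey.field", "uuid")
        .option("hoodie.datasource.write.precombine.field", "ts")
        // The package is assumed; the payload class name comes from this PR.
        .option("hoodie.datasource.write.payload.class",
            "org.apache.hudi.common.model.RecordCountAvroPayload")
        .mode(SaveMode.Append)
        .save("/tmp/hudi_pv_uv");
  }
}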

@hudi-bot (Collaborator)

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build
