[HUDI-4139] improvement for flink sink operators so we can easily identify the table which write to hudi #5661
…suite framework (#5557) * Add support to test async table services with integ test suite framework * Make await time for validation configurable
…or flink (#5660) * Remove the metadata cleaning strategy for flink, that means the multi-modal index may be affected * Improve the HoodieTable#clearMetadataTablePartitionsConfig to only update table config when necessary * Remove the modification of read code path in HoodieTableConfig
[HUDI-2207] Support independent flink hudi clustering function
Co-authored-by: wangzixuan.wzxuan <[email protected]>
…oodie Writing Efficiency (#5567) Co-authored-by: yuezhang <[email protected]>
Co-authored-by: y00617041 <[email protected]>
…ation=bulk_insert in sql mode (#5679) Co-authored-by: Rex An <[email protected]>
…ering (#5563) Co-authored-by: 苏承祥 <[email protected]>
…hing rollback plan during rollback (#5703) If the Avro file is corrupted, an InvalidAvroMagicException is thrown.
  dataStream = dataStream
-     .transform("partition_key_sorter",
+     .transform("partition_key_sorter" + ":" + conf.getString(FlinkOptions.TABLE_NAME),
          TypeInformation.of(RowData.class),
Maybe adding the table name only to the write operators is enough, e.g.:
- bucket_bulk_insert
- hoodie_bulk_insert_write
- hoodie_append_write
- bucket_write
- stream_write
We can also extract a common util method, something like
writeOpIdentifier(String, Configuration)
for generating the operator name with the table name as a suffix.
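A minimal sketch of what such a helper could look like. The class name `OperatorNames` is hypothetical, and to keep the example self-contained the method takes the table name directly rather than a Flink `Configuration`; in the pipeline the caller would presumably pass `conf.getString(FlinkOptions.TABLE_NAME)`, as the diff above does. The ":" separator is copied from the diff; the fallback for a missing table name is an assumption.

```java
// Hypothetical helper illustrating the suggestion; not Hudi's actual code.
final class OperatorNames {
  private OperatorNames() {
    // Utility class, not meant to be instantiated.
  }

  /**
   * Builds an operator name with the table name as a suffix,
   * e.g. "stream_write:orders".
   *
   * Assumption: when no table name is available, the plain
   * operator name is returned unchanged.
   */
  static String writeOpIdentifier(String operatorName, String tableName) {
    if (tableName == null || tableName.isEmpty()) {
      return operatorName;
    }
    return operatorName + ":" + tableName;
  }
}
```

Each write operator would then be named through this one method, so the naming format lives in a single place instead of being concatenated at every `transform` call.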
Right, I will try to amend it following your suggestion.
I have created a new PR: #5744
…leDeltaStreamer (#5597) * added --sync-tool-classes config option in multitable delta streamer * added a testcase to assert if syncClientToolClassNames is getting picked to the deltastreamer execution context
Co-authored-by: Raymond Xu <[email protected]>
#5716)
The timeline refresh on table initialization invokes the fs view #sync, which now performs two actions:
1. reload the timeline of the fs view, so that the next fs view request is based on this timeline metadata
2. if this is a local fs view, clear all the local states; if this is a remote fs view, send a request to sync the remote fs view

But consider how the table is constructed: the meta client is freshly instantiated, so the timeline is already the latest, and the table is also freshly constructed, so the fs view has no local states. That means the #sync is entirely unnecessary.

In this patch, the metadata lifecycle and the data set fs view are kept in sync: when the fs view is refreshed, the underlying metadata is refreshed synchronously as well. The freshness of the metadata follows the same rules as the data fs view:
1. if the fs view is local, the visibility is based on the client table metadata client's latest commit
2. if the fs view is remote, the timeline server would #sync the fs view and metadata together based on the lagging server local timeline

From the client's perspective, there is no need to care about the refresh action anymore, whether or not the metadata table is enabled. That makes the client logic clearer and less error-prone.

Removing the timeline refresh has another benefit: it avoids unnecessary #refresh of the remote fs view. If all the clients send requests to #sync the remote fs view, the server encounters conflicts and the clients receive a response error.
Tips
What is the purpose of the pull request
(For example: This pull request adds quick-start document.)
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.