Skip to content

Conversation

@yanenze
Copy link
Contributor

@yanenze yanenze commented May 23, 2022

Tips

What is the purpose of the pull request

(For example: This pull request adds quick-start document.)

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

codope and others added 26 commits May 23, 2022 18:10
…suite framework (#5557)

* Add support to test async table services with integ test suite framework

* Make await time for validation configurable
…or flink (#5660)

* Remove the metadata cleaning strategy for flink, that means the multi-modal index may be affected
* Improve the HoodieTable#clearMetadataTablePartitionsConfig to only update table config when necessary
* Remove the modification of read code path in HoodieTableConfig
[HUDI-2207] Support independent flink hudi clustering function
Move HoodieMetricsCloudWatchConfig to hudi-client-common
…#5502)

* Along the lines of RDDCustomColumnsSortPartitioner but for Row
dataStream = dataStream
.transform("partition_key_sorter",
.transform("partition_key_sorter" + ":" + conf.getString(FlinkOptions.TABLE_NAME),
TypeInformation.of(RowData.class),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe only the write op with table name is enough, for e.g

  • bucket_bulk_insert
  • hoodie_bulk_insert_write
  • hoodie_append_write
  • bucket_write
  • stream_write

And we can extract a common util method here like

writeOpIdentifier(String, Configuration)

for generating operator name with table suffix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right , i will try to amend it with your suggestion

Copy link
Contributor Author

@yanenze yanenze Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have created a new PR in #5744

kumudkumartirupati and others added 11 commits May 31, 2022 20:27
…leDeltaStreamer (#5597)

* added --sync-tool-classes config option in multitable delta streamer

* added a testcase to assert if syncClientToolClassNames is getting picked to the deltastreamer execution context
#5716)

The timeline refresh on table initialization invokes the fs view #sync, which has two actions now:

1. reload the timeline of the fs view, so that the next fs view request is based on this timeline metadata
2. if this is a local fs view, clear all the local states; if this is a remote fs view, send request to sync the remote fs view

But, let's see the construction, the meta client is instantiated freshly so the timeline is already the latest,
the table is also constructed freshly, so the fs view has no local states, that means, the #sync is unnecessary totally.

In this patch, the metadata lifecycle and data set fs view are kept in sync, when the fs view is refreshed, the underneath metadata
is also refreshed synchronouly. The freshness of the metadata follows the same rules as data fs view:

1. if the fs view is local, the visibility is based on the client table metadata client's latest commit
2. if the fs view is remote, the timeline server would #sync the fs view and metadata together based on the lagging server local timeline

From the perspective of client, no need to care about the refresh action anymore no matter whether the metadata table is enabled or not.
That make the client logic more clear and less error-prone.

Removes the timeline refresh has another benefit: if avoids unncecessary #refresh of the remote fs view, if all the clients send request to #sync the
remote fs view, the server would encounter conflicts and the client encounters a response error.
@yanenze yanenze closed this Jun 3, 2022
@yanenze yanenze reopened this Jun 3, 2022
@yanenze yanenze closed this Jun 3, 2022
@yanenze yanenze reopened this Jun 3, 2022
@yanenze yanenze closed this Jun 3, 2022
@yanenze yanenze changed the title [HUDI-4139] improvement for flink sink operators so we can easily identify the table which write to hudi [HUDI-4139-abort] improvement for flink sink operators so we can easily identify the table which write to hudi Jun 3, 2022
@yanenze yanenze changed the title [HUDI-4139-abort] improvement for flink sink operators so we can easily identify the table which write to hudi [HUDI-4139] improvement for flink sink operators so we can easily identify the table which write to hudi Jun 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.