@danny0405
Contributor

…dieFlinkWriteClient

Unlike other write clients, HoodieFlinkWriteClient invokes the dataset writing methods (#upsert or #insert) for each batch of new data in a long-running task. In the current implementation, an engine-specific hoodie table is created before each of these actions, and before the table is created, some table bootstrapping operations are performed (such as table upgrade/downgrade and the metadata table bootstrap). These bootstrapping operations are guarded by a transaction lock.

In Flink, these bootstrapping operations can be avoided because they are all performed only once on the coordinator.
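To make the cost concrete, here is a minimal, self-contained sketch that counts how often the lock is taken when bootstrapping runs for every batch versus once up front. It is not Hudi code: the class `BootstrapCostSketch`, its methods, and the plain `ReentrantLock` standing in for the transaction lock are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class BootstrapCostSketch {
    // Hypothetical stand-in for the transaction lock that guards bootstrapping.
    static final ReentrantLock txnLock = new ReentrantLock();
    static final AtomicInteger lockAcquisitions = new AtomicInteger();

    // Hypothetical stand-in for table upgrade/downgrade plus metadata table bootstrap.
    static void bootstrapUnderLock() {
        txnLock.lock();
        try {
            lockAcquisitions.incrementAndGet();
            // the bootstrapping work would run here
        } finally {
            txnLock.unlock();
        }
    }

    // Bootstrapping runs before every batch, so the lock is taken once per batch.
    static int writeBatchesWithPerBatchBootstrap(int batches) {
        lockAcquisitions.set(0);
        for (int i = 0; i < batches; i++) {
            bootstrapUnderLock();
            // create the engine-specific table, then upsert/insert the batch
        }
        return lockAcquisitions.get();
    }

    // Bootstrapping runs once up front (as on a coordinator); batches only create the table.
    static int writeBatchesWithOneTimeBootstrap(int batches) {
        lockAcquisitions.set(0);
        bootstrapUnderLock();
        for (int i = 0; i < batches; i++) {
            // create the engine-specific table, then upsert/insert the batch
        }
        return lockAcquisitions.get();
    }

    public static void main(String[] args) {
        System.out.println("per-batch bootstrap: " + writeBatchesWithPerBatchBootstrap(100));
        System.out.println("one-time bootstrap:  " + writeBatchesWithOneTimeBootstrap(100));
    }
}
```

With 100 batches, the first variant takes the lock 100 times and the second takes it once, which is the saving this change aims for in the Flink write path.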

Change Logs

  • Make BaseHoodieWriteClient#doInitTable non-abstract; it now performs only the bootstrapping operations
  • Add a default implementation, BaseHoodieWriteClient#initMetadataTable, specifically for the metadata table bootstrap
  • Add a new abstract method for creating the engine-specific hoodie table
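The shape of this refactoring can be sketched as a template method. The block below is an illustration under assumptions, not the actual Hudi classes: `BaseWriteClient`, `initTable`, `createTable`, `Table`, and the two subclasses are hypothetical names, the real method signatures are not shown in this PR description, and having the Flink-like client override `doInitTable` as a no-op is only one plausible way to skip the bootstrapping.

```java
import java.util.ArrayList;
import java.util.List;

public class InitTableSketch {
    // Hypothetical stand-in for an engine-specific hoodie table.
    static class Table {
        final String engine;
        Table(String engine) { this.engine = engine; }
    }

    abstract static class BaseWriteClient {
        // Records which bootstrapping steps ran, for illustration only.
        final List<String> steps = new ArrayList<>();

        // Hypothetical entry point: bootstrap first, then create the table.
        final Table initTable() {
            doInitTable();
            return createTable();
        }

        // Non-abstract: performs only the bootstrapping operations.
        protected void doInitTable() {
            steps.add("upgrade/downgrade");
            initMetadataTable();
        }

        // Default implementation dedicated to the metadata table bootstrap.
        protected void initMetadataTable() {
            steps.add("metadata table bootstrap");
        }

        // Abstract hook: each engine creates its own table type.
        protected abstract Table createTable();
    }

    // An engine that keeps the default bootstrapping.
    static class SparkLikeClient extends BaseWriteClient {
        @Override
        protected Table createTable() { return new Table("spark"); }
    }

    // Assumption: the Flink-like client skips bootstrapping because the
    // coordinator has already performed it once.
    static class FlinkLikeClient extends BaseWriteClient {
        @Override
        protected void doInitTable() { /* intentionally empty */ }

        @Override
        protected Table createTable() { return new Table("flink"); }
    }

    public static void main(String[] args) {
        BaseWriteClient spark = new SparkLikeClient();
        BaseWriteClient flink = new FlinkLikeClient();
        System.out.println(spark.initTable().engine + " steps: " + spark.steps);
        System.out.println(flink.initTable().engine + " steps: " + flink.steps);
    }
}
```

Separating table creation from bootstrapping lets an engine opt out of the lock-guarded steps without duplicating the table construction logic.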

Impact

No impact

Risk level (write none, low medium or high below)

none

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@danny0405
Contributor Author

@alexeykudinkin Can you take a look at this change? I see that doInitTable was introduced by your refactoring.

@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@danny0405
Contributor Author

The failing test case testUpsertsContinuousModeWithMultipleWritersForConflicts is flaky; I will merge this soon.

@danny0405 danny0405 merged commit fd62a14 into apache:master Dec 20, 2022
danny0405 added a commit to danny0405/hudi that referenced this pull request Dec 20, 2022
…dieFlinkWriteClient (apache#7509)

(cherry picked from commit fd62a14)
danny0405 added a commit that referenced this pull request Dec 21, 2022
…dieFlinkWriteClient (#7509) (#7522)

(cherry picked from commit fd62a14)
h1ap pushed a commit to h1ap/hudi that referenced this pull request Jan 11, 2023
…dieFlinkWriteClient (apache#7509)

# Conflicts:
#	hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/HoodieFlinkTable.java
#	hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/HoodieJavaTable.java
#	hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkTable.java
nsivabalan pushed a commit to nsivabalan/hudi that referenced this pull request Mar 22, 2023
…dieFlinkWriteClient (apache#7509)

danny0405 added a commit to danny0405/hudi that referenced this pull request Mar 23, 2023
…dieFlinkWriteClient (apache#7509)

(cherry picked from commit fd62a14)
nsivabalan pushed a commit to nsivabalan/hudi that referenced this pull request Mar 23, 2023
…dieFlinkWriteClient (apache#7509)

(cherry picked from commit fd62a14)
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
…dieFlinkWriteClient (apache#7509)

Different with other write clients, HoodieFlinkWriteClient invokes the dataset writing methods(#upsert or #insert)
for each batch of new data set in the long running task. In current impl, a engine-specific hoodie table would be created before performing
these actions, and before the table creation, some table bootstrapping operations are performed(such as table upgrade/downgrade, the metadata table
bootstrap). These bootstrapping operations are guarded by a trasanction lock.

In Flink, these bootstrapping operations can be avoided because they are all performed only once on the coordinator.

The changes:

- Make BaseHoodieWriteClient#doInitTable non abstract, it now only performs the bootstrapping operations
- Add a default impl BaseHoodieWriteClient#initMetadataTable for metadata table bootstrap specifically
- Add a new abstract method for creating engine-specific hoodie table
