[HUDI-5194] Fix schema files cleaning by FileBasedInternalSchemaStorageManager by xiarixiaoyao · Pull Request #7183 · apache/hudi

xiarixiaoyao · 2022-11-11T07:19:34Z

Change Logs

Fix the bug where history schema files cannot be cleaned by FileBasedInternalSchemaStorageManager.
Fix the bug where schema evolution does not work well in non-batch read mode under Spark 3.1.x.
Optimize the implementation for compaction.

Impact

none

Risk level (write none, low medium or high below)

medium

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

The config description must be updated if new configs are added or the default value of the configs are changed
Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instruction to make changes to the website.

Contributor's checklist

Read through contributor's guide
Change Logs and Impact were stated clearly
Adequate tests were added if applicable
CI passed

alexeykudinkin · 2022-11-11T18:23:47Z

Thanks for putting up a fix @xiarixiaoyao!

Let's sequence this one w/ #6358, which tackles some of our longstanding issues w/ schema handling (slightly overlapping w/ your PR)

xiarixiaoyao · 2022-11-13T01:08:25Z

OK, I will wait until #6358 is merged and then update this PR.

codope · 2022-12-07T12:49:33Z

@xiarixiaoyao Can you please rebase?

xiarixiaoyao · 2022-12-08T10:29:08Z

@codope I have already rebased the code.

xiarixiaoyao · 2022-12-08T10:31:12Z

.../hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java

      InternalSchema mergedSchema = new InternalSchemaMerger(writeInternalSchema, querySchema,
          true, false, false).mergeSchema();
      Schema newWriterSchema = AvroInternalSchemaConverter.convert(mergedSchema, writerSchema.getFullName());
-      Schema writeSchemaFromFile = AvroInternalSchemaConverter.convert(writeInternalSchema, newWriterSchema.getFullName());


Remove useless check

xiarixiaoyao · 2022-12-08T10:32:13Z

...t-common/src/main/java/org/apache/hudi/table/action/compact/RunCompactionActionExecutor.java

+      boolean schemaEvolutionEnable = new TableSchemaResolver(table.getMetaClient()).getTableInternalSchemaFromCommitMetadata().isPresent();
+      Pair<Option<String>, Option<String>> schemaPair = Pair.of(Option.empty(), Option.empty());
+      if (schemaEvolutionEnable) {
+        schemaPair = InternalSchemaCache.getInternalSchemaAndAvroSchemaForClusteringAndCompaction(table.getMetaClient(), instantTime);


Optimize the code: trigger the corresponding logic only when schema evolution is enabled.
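For readers without the Hudi sources at hand, the gating idea in this change can be sketched in isolation. This is a minimal sketch, not the PR's code: `SchemaGateSketch`, `SchemaPair`, `resolve`, and `costlyLookup` are hypothetical names, and an `Optional<String>` stands in for the result of checking whether the table has an internal schema in its commit metadata.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Illustrative stand-in only: these class and method names are hypothetical, not Hudi APIs.
public class SchemaGateSketch {

  // Holds the (internal schema, Avro schema) strings; both are absent by default.
  public static final class SchemaPair {
    public final Optional<String> internalSchema;
    public final Optional<String> avroSchema;

    public SchemaPair(Optional<String> internalSchema, Optional<String> avroSchema) {
      this.internalSchema = internalSchema;
      this.avroSchema = avroSchema;
    }

    public static SchemaPair empty() {
      return new SchemaPair(Optional.empty(), Optional.empty());
    }
  }

  // Invokes the costly lookup only when the table already has an internal schema recorded.
  public static SchemaPair resolve(Optional<String> tableInternalSchema,
                                   Supplier<SchemaPair> costlyLookup) {
    boolean schemaEvolutionEnabled = tableInternalSchema.isPresent();
    return schemaEvolutionEnabled ? costlyLookup.get() : SchemaPair.empty();
  }

  public static void main(String[] args) {
    Supplier<SchemaPair> lookup = () -> {
      System.out.println("costly lookup executed");
      return new SchemaPair(Optional.of("internal-schema-json"), Optional.of("avro-schema-json"));
    };
    // Table without schema evolution: the lookup is never invoked.
    System.out.println(resolve(Optional.empty(), lookup).internalSchema.isPresent());
    // Table with schema evolution: the lookup runs once.
    System.out.println(resolve(Optional.of("{}"), lookup).internalSchema.isPresent());
  }
}
```

Passing the lookup as a `Supplier` keeps the expensive call lazy, so tables that never used schema evolution pay nothing for it.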

xiarixiaoyao · 2022-12-08T10:38:58Z

hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java

-        hoodieTableMetaClient, false);
+    if (historicalSchemas == null) {
+      FileBasedInternalSchemaStorageManager schemaStorageManager = new FileBasedInternalSchemaStorageManager(hoodieTableMetaClient);
+      historicalSchemas = SerDeHelper.parseSchemas(schemaStorageManager.getHistorySchemaStr());


Cache the historical schemas to reduce the overhead of looking up the file schema.
In our environment, we have a log file with 1700+ Avro blocks.
With the original logic, performing 1700 file schema lookups is very time-consuming.

xiarixiaoyao · 2022-12-09T01:41:33Z

@hudi-bot run azure

hudi-bot · 2022-12-09T04:35:40Z

CI report:

3cb5b94 UNKNOWN
9045b71 UNKNOWN
0265ffa UNKNOWN
e45b36b Azure: FAILURE Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

bvaradar · 2023-10-30T21:19:21Z

@xiarixiaoyao: Is this still an issue? If so, can you please update this PR based on the changes from #6358 and rebase? I will review and land this diff.

yihua

@xiarixiaoyao @bvaradar is this PR still necessary based on the latest master?

xiarixiaoyao force-pushed the schemaFix branch 2 times, most recently from ac3457c to 8a74bb3 Compare November 11, 2022 07:38

xushiyan added the area:schema (Schema evolution and data types) and priority:critical (Production degraded; pipelines stalled) labels Nov 11, 2022

xushiyan assigned alexeykudinkin Nov 11, 2022

xiarixiaoyao mentioned this pull request Nov 15, 2022

[HUDI-3981] Flink engine support for comprehensive schema evolution #5830

Merged

4 tasks

xiarixiaoyao force-pushed the schemaFix branch 2 times, most recently from 113b079 to 91ffb11 Compare November 28, 2022 10:32

xiarixiaoyao requested a review from XuQianJin-Stars November 28, 2022 10:33

xiarixiaoyao force-pushed the schemaFix branch 3 times, most recently from e0299dc to 3f352e7 Compare November 28, 2022 10:57

nsivabalan added the release-0.12.2 (Patches targetted for 0.12.2) label Dec 6, 2022

xiarixiaoyao changed the title ~~[HUDI-5194][WIP]Fix problems found in schema evolution in the production enviroment~~ [HUDI-5194]Fix problems found in schema evolution in the production enviroment Dec 6, 2022

codope changed the title ~~[HUDI-5194]Fix problems found in schema evolution in the production enviroment~~ [HUDI-5194] Fix schema files cleaning by FileBasedInternalSchemaStorageManager Dec 7, 2022

[HUDI-5194][WIP]Fix problems found in schema evolution in the product…

6f66ed8

…ion enviroment

xiarixiaoyao force-pushed the schemaFix branch 2 times, most recently from 90c90ea to 3cb5b94 Compare December 8, 2022 10:28

xiarixiaoyao commented Dec 8, 2022

xiarixiaoyao force-pushed the schemaFix branch 2 times, most recently from 15bb7a5 to cee3746 Compare December 8, 2022 10:35

xiarixiaoyao commented Dec 8, 2022

xiarixiaoyao force-pushed the schemaFix branch 3 times, most recently from ee75297 to 0265ffa Compare December 8, 2022 10:49

rebase

e45b36b

xiarixiaoyao force-pushed the schemaFix branch from 0265ffa to e45b36b Compare December 8, 2022 10:56

xiarixiaoyao requested review from alexeykudinkin and nsivabalan December 9, 2022 07:12

xiarixiaoyao assigned XuQianJin-Stars Dec 9, 2022

bvaradar added the type:bug (Bug reports and fixes) label Oct 4, 2023

github-actions bot added the size:L (PR with lines of changes in (300, 1000]) label Feb 26, 2024

yihua reviewed Sep 13, 2024

hudi-bot mentioned this pull request Dec 9, 2025

fix schema evolution bugs #15557

Open

xiarixiaoyao wants to merge 2 commits into apache:master from xiarixiaoyao:schemaFix

9 participants
