[HUDI-5194] Fix schema files cleaning by FileBasedInternalSchemaStorageManager#7183
xiarixiaoyao wants to merge 2 commits into apache:master
Branch updated from ac3457c to 8a74bb3.
Thanks for putting up a fix, @xiarixiaoyao! Let's sequence this one with #6358, which tackles some of our longstanding issues with schema handling (slightly overlapping with your PR).
OK, I will wait until #6358 is merged and then update this PR.
Branch updated from 113b079 to 91ffb11.
Branch updated from e0299dc to 3f352e7.
@xiarixiaoyao Can you please rebase?
Branch updated from 90c90ea to 3cb5b94.
@codope I have already rebased the code.
```java
InternalSchema mergedSchema = new InternalSchemaMerger(writeInternalSchema, querySchema,
    true, false, false).mergeSchema();
Schema newWriterSchema = AvroInternalSchemaConverter.convert(mergedSchema, writerSchema.getFullName());
Schema writeSchemaFromFile = AvroInternalSchemaConverter.convert(writeInternalSchema, newWriterSchema.getFullName());
```
Remove the useless check.
```java
boolean schemaEvolutionEnable = new TableSchemaResolver(table.getMetaClient()).getTableInternalSchemaFromCommitMetadata().isPresent();
Pair<Option<String>, Option<String>> schemaPair = Pair.of(Option.empty(), Option.empty());
if (schemaEvolutionEnable) {
  schemaPair = InternalSchemaCache.getInternalSchemaAndAvroSchemaForClusteringAndCompaction(table.getMetaClient(), instantTime);
```
Optimization: trigger the schema lookup only when schema evolution is enabled.
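The guard pattern in the hunk above can be illustrated with a minimal, self-contained sketch. It assumes an expensive lookup that should run only when schema evolution is enabled. `expensiveSchemaLookup` is a hypothetical stand-in for the Hudi call in the hunk. The JDK's `Map.Entry`/`Optional` replace Hudi's `Pair`/`Option` so the example compiles without Hudi on the classpath.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.Optional;

final class EvolutionGuardSketch {

  // Counts how often the expensive lookup actually ran (for illustration only).
  static int lookups = 0;

  // Hypothetical stand-in for an expensive lookup such as reading
  // commit metadata or schema history files for a given instant.
  static Map.Entry<Optional<String>, Optional<String>> expensiveSchemaLookup(String instantTime) {
    lookups++;
    return new SimpleImmutableEntry<>(
        Optional.of("internal@" + instantTime), Optional.of("avro@" + instantTime));
  }

  // Resolves the schema pair only when schema evolution is enabled;
  // otherwise returns a pair of empty values, mirroring Pair.of(Option.empty(), Option.empty()).
  static Map.Entry<Optional<String>, Optional<String>> schemaPairFor(
      boolean schemaEvolutionEnabled, String instantTime) {
    Map.Entry<Optional<String>, Optional<String>> schemaPair =
        new SimpleImmutableEntry<>(Optional.empty(), Optional.empty());
    if (schemaEvolutionEnabled) {
      schemaPair = expensiveSchemaLookup(instantTime);
    }
    return schemaPair;
  }

  public static void main(String[] args) {
    System.out.println(schemaPairFor(false, "001").getKey().isPresent()); // false, no lookup
    System.out.println(schemaPairFor(true, "002").getKey().get());        // internal@002
    System.out.println("lookups=" + lookups);                             // lookups=1
  }
}
```

With this shape, tables that never enabled schema evolution pay only for the cheap enabled check and skip the lookup entirely.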
Branch updated from 15bb7a5 to cee3746.
```java
    hoodieTableMetaClient, false);
if (historicalSchemas == null) {
  FileBasedInternalSchemaStorageManager schemaStorageManager = new FileBasedInternalSchemaStorageManager(hoodieTableMetaClient);
  historicalSchemas = SerDeHelper.parseSchemas(schemaStorageManager.getHistorySchemaStr());
```
Cache the historical schemas to reduce the overhead of looking up the file schema. In our environment we have a log file with 1700+ Avro blocks. With the original logic, performing 1700 file-schema lookups is very time-consuming.
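The caching idea described above can be sketched as follows. The history is parsed once per table and then reused for every Avro block, instead of being re-read per block. This is a minimal sketch under stated assumptions:

- `HistorySchemaCacheSketch`, the loader function and the string schemas are all hypothetical.
- The "newest version ≤ requested version" lookup via `TreeMap.floorEntry` is an assumed resolution rule, not necessarily Hudi's.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class HistorySchemaCacheSketch {

  // Table path -> parsed schema history (version id -> schema); names are hypothetical.
  private final Map<String, TreeMap<Long, String>> cache = new ConcurrentHashMap<>();
  // Expensive loader, e.g. reading and parsing the history schema file(s).
  private final Function<String, TreeMap<Long, String>> loader;

  HistorySchemaCacheSketch(Function<String, TreeMap<Long, String>> loader) {
    this.loader = loader;
  }

  // Returns the newest schema whose version id is <= versionId, or null if none exists.
  String schemaFor(String tablePath, long versionId) {
    // The loader runs at most once per table path; later calls are served from the cache.
    TreeMap<Long, String> history = cache.computeIfAbsent(tablePath, loader);
    Map.Entry<Long, String> entry = history.floorEntry(versionId);
    return entry == null ? null : entry.getValue();
  }

  public static void main(String[] args) {
    AtomicInteger loads = new AtomicInteger();
    HistorySchemaCacheSketch schemas = new HistorySchemaCacheSketch(path -> {
      loads.incrementAndGet();
      TreeMap<Long, String> history = new TreeMap<>();
      history.put(100L, "schema-v1");
      history.put(200L, "schema-v2");
      return history;
    });
    // Simulate one schema lookup per Avro block in a log file with 1700 blocks.
    for (int block = 0; block < 1700; block++) {
      schemas.schemaFor("/tmp/hudi_table", 150L);
    }
    System.out.println(schemas.schemaFor("/tmp/hudi_table", 150L) + " loads=" + loads.get());
  }
}
```

Running `main` prints `schema-v1 loads=1`: 1700 block lookups trigger a single load. This cache is unbounded and never invalidated. A real implementation would also need to refresh or evict entries when new schema commits are written.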
Branch updated from ee75297 to 0265ffa.
Branch updated from 0265ffa to e45b36b.
@hudi-bot run azure
@xiarixiaoyao Is this still an issue? If so, can you please update this PR based on the changes from #6358 and rebase? I will review and land this diff.
yihua left a comment:
@xiarixiaoyao @bvaradar Is this PR still necessary given the latest master?
Change Logs
Fix a bug where historical schema files could not be cleaned by FileBasedInternalSchemaStorageManager.
Fix a bug where schema evolution did not work correctly in non-batch read mode under Spark 3.1.x.
Optimize the implementation for compaction.
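The first item in the list above concerns cleaning historical schema files. The following is a minimal sketch of that kind of cleaning decision; the PR's actual change is not shown here. It assumes:

- each schema history file is named after the commit instant that wrote it, plus a suffix;
- a file may be deleted once its instant is no longer among the retained commits.

The `.schemacommit` suffix, `SchemaFileCleanerSketch` and `filesToDelete` are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class SchemaFileCleanerSketch {

  // Hypothetical suffix for schema history files; the real naming may differ.
  static final String SUFFIX = ".schemacommit";

  // Returns the schema history files whose instant is not in the retained set.
  static List<String> filesToDelete(List<String> schemaFiles, Set<String> retainedInstants) {
    List<String> toDelete = new ArrayList<>();
    for (String file : schemaFiles) {
      if (!file.endsWith(SUFFIX)) {
        continue; // not a schema history file; leave it alone
      }
      // Strip the suffix so the bare instant time is compared, not the full file name.
      String instant = file.substring(0, file.length() - SUFFIX.length());
      if (!retainedInstants.contains(instant)) {
        toDelete.add(file);
      }
    }
    return toDelete;
  }

  public static void main(String[] args) {
    List<String> files = List.of("001.schemacommit", "002.schemacommit", "003.schemacommit", "README");
    System.out.println(filesToDelete(files, Set.of("002", "003"))); // [001.schemacommit]
  }
}
```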
Impact
none
Risk level (write none, low, medium or high below)
medium