@Zouxxyy
Contributor

@Zouxxyy Zouxxyy commented Aug 30, 2023

Change Logs

Currently, reading a schema-evolved MOR table from Hive fails, e.g.:

-- spark-sql
set hoodie.schema.on.read.enable=true;
create table if not exists hudi_mor_test_tbl (
  id   bigint,
  name string,
  num  int,
  ts   bigint,
  ds   string
) using hudi 
tblproperties (
  type = 'mor',
  primaryKey = 'id',
  preCombineField = 'ts'
 )
partitioned by (ds);

insert into hudi_mor_test_tbl partition(ds = '20211211') select 1, 'a1', 1000,100;
update hudi_mor_test_tbl set name = 'a2' where id = 1;
alter table hudi_mor_test_tbl rename column name to name_new; 

-- hive
select id,name_new from hudi_mor_test_tbl_rt;
Failed with exception java.io.IOException:java.lang.ArrayIndexOutOfBoundsException: 25

Impact

Fixes the error above.

Risk level (write none, low, medium or high below)

low

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

Contributor

@danny0405 danny0405 left a comment

+1

@xiarixiaoyao
Contributor

@Zouxxyy
Thanks for the fix.
Could you please point out the specific cause of the error?
I cannot reproduce this problem on Hive 3.1.1 (Hudi 0.11) on my cluster.

@Zouxxyy
Contributor Author

Zouxxyy commented Aug 30, 2023

> thanks for your fix.
> could you pls point out the specific reason for the error? Thank you
> i cannot reproduce this problem on hive 3.1.1 (hudi 0.11) with my cluster

The core change is internalSchemaOption = Option.of(prunedInternalSchema); you can remove that line and then run the unit test added in this patch to reproduce the error.

Hudi 0.11 may not include the patches #6989 and #6358.

  /**
   * Get final Read Schema for support evolution.
   * step1: find the fileSchema for current dataBlock.
   * step2: determine whether fileSchema is compatible with the final read internalSchema.
   * step3: merge fileSchema and read internalSchema to produce final read schema.
   *
   * @param dataBlock current processed block
   * @return final read schema.
   */
  private Option<Pair<Function<HoodieRecord, HoodieRecord>, Schema>> composeEvolvedSchemaTransformer(
      HoodieDataBlock dataBlock) {
    if (internalSchema.isEmptySchema()) {
      return Option.empty();
    }

    long currentInstantTime = Long.parseLong(dataBlock.getLogBlockHeader().get(INSTANT_TIME));
    InternalSchema fileSchema = InternalSchemaCache.searchSchemaAndCache(currentInstantTime,
        hoodieTableMetaClient, false);
    InternalSchema mergedInternalSchema = new InternalSchemaMerger(fileSchema, internalSchema,
        true, false).mergeSchema();
    Schema mergedAvroSchema = AvroInternalSchemaConverter.convert(mergedInternalSchema, readerSchema.getFullName());

    return Option.of(Pair.of((record) -> {
      return record.rewriteRecordWithNewSchema(
          dataBlock.getSchema(),
          this.hoodieTableMetaClient.getTableConfig().getProps(),
          mergedAvroSchema,
          Collections.emptyMap());
    }, mergedAvroSchema));
  }
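
For readers unfamiliar with the three steps in the Javadoc above, here is a rough, self-contained sketch of the idea: match columns by a stable id rather than by name or position, so a renamed column still finds its value in an older log block. This is not Hudi's implementation. The names SchemaEvolutionSketch, Field and rewrite, as well as the column ids, are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SchemaEvolutionSketch {

  /** Hypothetical stand-in for a column: a stable id plus its current name. */
  static final class Field {
    final int id;
    final String name;

    Field(int id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  /**
   * Rewrites a record written with fileSchema into the shape of querySchema.
   * Columns are matched by id, so a renamed column keeps its value; columns
   * that the file does not contain are filled with null.
   */
  static Map<String, Object> rewrite(Map<String, Object> record,
                                     List<Field> fileSchema,
                                     List<Field> querySchema) {
    // Index the file's column names by id.
    Map<Integer, String> fileNameById = new HashMap<>();
    for (Field f : fileSchema) {
      fileNameById.put(f.id, f.name);
    }
    // Emit columns in query-schema order, under their query-schema names.
    Map<String, Object> out = new LinkedHashMap<>();
    for (Field q : querySchema) {
      String fileName = fileNameById.get(q.id);
      out.put(q.name, fileName == null ? null : record.get(fileName));
    }
    return out;
  }

  public static void main(String[] args) {
    // Schema at the time the log block was written (ids are made up).
    List<Field> fileSchema = Arrays.asList(
        new Field(1, "id"), new Field(2, "name"), new Field(3, "num"));
    // Pruned read schema after "rename column name to name_new",
    // plus a column (id 4) that the old block never had.
    List<Field> querySchema = Arrays.asList(
        new Field(1, "id"), new Field(2, "name_new"), new Field(4, "note"));

    Map<String, Object> record = new HashMap<>();
    record.put("id", 1L);
    record.put("name", "a2");
    record.put("num", 1000);

    // prints {id=1, name_new=a2, note=null}
    System.out.println(rewrite(record, fileSchema, querySchema));
  }
}
```

The output only contains the columns of the (pruned) query schema, which is why the reader must be handed a schema that matches the projection the engine actually asked for.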

@danny0405 danny0405 added engine:hive Hive integration schema-evolution area:schema Schema evolution and data types labels Aug 31, 2023
@Zouxxyy Zouxxyy closed this Sep 1, 2023
@Zouxxyy Zouxxyy reopened this Sep 1, 2023
@danny0405 danny0405 merged commit 31bc565 into apache:master Sep 5, 2023
@danny0405 danny0405 added release-0.14.0 priority:blocker Production down; release blocker labels Sep 5, 2023
leosanqing pushed a commit to leosanqing/hudi that referenced this pull request Sep 13, 2023
TheR1sing3un pushed a commit to TheR1sing3un/hudi that referenced this pull request Feb 12, 2025

Labels

area:schema Schema evolution and data types engine:hive Hive integration priority:blocker Production down; release blocker release-0.14.0

Projects

Status: ✅ Done

4 participants