Skip to content

Conversation

@nsivabalan
Copy link
Contributor

@nsivabalan nsivabalan commented May 23, 2021

Redo of #2309

What is the purpose of the pull request

If schema contains nested records, then HoodieAvroUtils rewrite() function copies the record fields as-is, from the oldrecord to the newRecord. If fields of the nested record have evolved, it would result in SchemaCompatibilityException or ArrayIndexOutOfBoundsException.

Brief change log

Modify HoodieAvroUtils rewrite() to rewrite the evolved fields, with new/evolved fields initialized to null.

Verify this pull request

This pull request is already covered by existing tests, such as TestHoodieAvroUtils.
Added testRewriteToEvolvedNestedRecord() and testRewriteToShorterRecord()

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@codecov-commenter
Copy link

codecov-commenter commented May 23, 2021

Codecov Report

Merging #2982 (92ca2e9) into master (7d2971d) will increase coverage by 16.07%.
The diff coverage is n/a.

Impacted file tree graph

@@              Coverage Diff              @@
##             master    #2982       +/-   ##
=============================================
+ Coverage     54.80%   70.88%   +16.07%     
+ Complexity     3826      386     -3440     
=============================================
  Files           485       54      -431     
  Lines         23424     2016    -21408     
  Branches       2495      241     -2254     
=============================================
- Hits          12838     1429    -11409     
+ Misses         9431      454     -8977     
+ Partials       1155      133     -1022     
Flag Coverage Δ
hudicli ?
hudiclient ?
hudicommon ?
hudiflink ?
hudihadoopmr ?
hudisparkdatasource ?
hudisync ?
huditimelineservice ?
hudiutilities 70.88% <ø> (+1.28%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...udi/utilities/deltastreamer/BootstrapExecutor.java 82.35% <0.00%> (ø)
...g/apache/hudi/sink/partitioner/BucketAssigner.java
...adoop/realtime/HoodieHFileRealtimeInputFormat.java
...rg/apache/hudi/schema/FilebasedSchemaProvider.java
.../src/main/java/org/apache/hudi/dla/util/Utils.java
.../org/apache/hudi/streamer/FlinkStreamerConfig.java
...ecution/datasources/Spark3ParsePartitionUtil.scala
...udi/common/table/log/block/HoodieCommandBlock.java
...ache/hudi/sink/compact/CompactionPlanOperator.java
...java/org/apache/hudi/cli/HoodieCliSparkConfig.java
... and 424 more

@nsivabalan nsivabalan added the priority:critical Production degraded; pipelines stalled label May 24, 2021
@nsivabalan nsivabalan force-pushed the hoodieAvroUtils_rewriteToEvolvedSchema branch from 272b5a7 to 92ca2e9 Compare May 31, 2021 15:59
@nsivabalan nsivabalan requested a review from n3nash May 31, 2021 16:00
@nsivabalan nsivabalan added status:in-progress Work in progress and removed priority:critical Production degraded; pipelines stalled labels Jun 1, 2021
@vinothchandar vinothchandar self-assigned this Sep 7, 2021
@vinothchandar
Copy link
Member

cc @codope could you chime in here?

@hudi-bot
Copy link
Collaborator

hudi-bot commented Nov 5, 2021

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@nsivabalan nsivabalan assigned codope and unassigned vinothchandar Jan 20, 2022
@nsivabalan
Copy link
Contributor Author

@codope : can you review this patch please.

@nsivabalan nsivabalan removed the status:in-progress Work in progress label Jan 20, 2022
Copy link
Member

@codope codope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Left a few minor comments.

}

@Test
public void testRewriteToEvolvedNestedRecord() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove Exception here and below?

}
return datum;
case UNION:
Integer idx = (newSchema.getTypes().get(0).getType() == Schema.Type.NULL) ? 1 : 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of wrapper class, can we use the primitive type int here?

assertEquals("val2", newRecord.get("pii_col"));
assertEquals(null, ((GenericRecord)newRecord.get("color_rec")).get("color_name"));
} catch (Exception e) {
e.printStackTrace();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove printStackTrace here and below?

@yihua yihua added area:schema Schema evolution and data types priority:high Significant impact; potential bugs labels Sep 7, 2022
@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Feb 26, 2024
@yihua
Copy link
Contributor

yihua commented Sep 5, 2024

Is this still relevant? @jonvex could you check this too?

@nsivabalan nsivabalan closed this Sep 6, 2024
@yihua
Copy link
Contributor

yihua commented Sep 9, 2024

The revamp PR is #11893.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:schema Schema evolution and data types priority:high Significant impact; potential bugs size:M PR with lines of changes in (100, 300]

Projects

Status: 👤 User Action
Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

6 participants