@sbernauer
Contributor

What is the purpose of the pull request

This is a specific fix for the error reported in https://issues.apache.org/jira/browse/HUDI-1129
This is needed to partially fix #1845

Brief change log

  • Use Avro field names rather than field indices when converting from Avro to a Catalyst GenericRow in AvroConversionHelper
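
To illustrate the change described above, here is a minimal sketch of why name-based lookup is safer than index-based lookup once a schema evolves. It is a simplified model and not the AvroConversionHelper code: the class `FieldLookupSketch`, its two methods, and the field names are all hypothetical, and a record is modelled as a plain ordered map.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model, NOT the Hudi or Avro API: a "record" is an ordered map
// from field name to value, and the target schema is a list of field names.
class FieldLookupSketch {

    // Index-based conversion: position i of the target row is filled from
    // position i of the source record, whatever field happens to be there.
    static Object[] convertByIndex(Map<String, Object> record, List<String> targetFields) {
        Object[] values = record.values().toArray();
        Object[] row = new Object[targetFields.size()];
        for (int i = 0; i < row.length; i++) {
            row[i] = i < values.length ? values[i] : null;
        }
        return row;
    }

    // Name-based conversion: each target field is looked up by name, so
    // added or reordered fields cannot shift values into the wrong column.
    static Object[] convertByName(Map<String, Object> record, List<String> targetFields) {
        Object[] row = new Object[targetFields.size()];
        for (int i = 0; i < row.length; i++) {
            row[i] = record.get(targetFields.get(i)); // null if the field is absent
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 1);
        record.put("name", "a");
        // Hypothetical evolved target schema: a new field sits between the old ones.
        List<String> target = Arrays.asList("id", "new_field", "name");
        System.out.println(Arrays.toString(convertByIndex(record, target))); // [1, a, null]
        System.out.println(Arrays.toString(convertByName(record, target)));  // [1, null, a]
    }
}
```

In this model the index-based version puts the value of `name` into the `new_field` column, while the name-based version leaves `new_field` empty and keeps `name` where it belongs.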

Verify this pull request

Run the Maven test suite.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

Contributor

@bvaradar bvaradar left a comment

Thanks @sbernauer for raising the diff.

Can you add the schema evolution test from your other diff here so that we can ensure it passes?

val length = struct.fields.length
val converters = new Array[AnyRef => AnyRef](length)
val avroFieldIndexes = new Array[Int](length)
val avroFieldNames = new Array[String](length)
Contributor

Question: does this work for nested schemas where the same field name is used at different levels of the hierarchy? For example, "order.rec.rec" (I am just making this up). I want to make sure there is no chance of ambiguity in field resolution. Can you add some test cases to verify that this works?
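
As a rough illustration of the concern, the sketch below resolves a dotted path such as "order.rec.rec" one level at a time, so that each name is only ever looked up inside the record at the current level. This is a hypothetical model built on nested maps (the class `NestedLookupSketch` and the sample data are made up); whether the converters in this PR scope names the same way is what the requested tests would need to confirm.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, NOT the Hudi or Avro API: nested records are nested maps.
class NestedLookupSketch {

    // Walk a dotted path one level at a time. Each name is looked up only
    // in the record at the current level, so a repeated name such as "rec"
    // cannot be confused with a field of the same name elsewhere.
    static Object resolve(Map<String, Object> record, String path) {
        Object current = record;
        for (String name : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // the path goes deeper than the data
            }
            current = ((Map<?, ?>) current).get(name);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> innerRec = new HashMap<>();
        innerRec.put("rec", 42);          // order.rec.rec
        Map<String, Object> order = new HashMap<>();
        order.put("rec", innerRec);       // order.rec
        Map<String, Object> root = new HashMap<>();
        root.put("order", order);
        root.put("rec", "top-level rec"); // same name at the top level

        System.out.println(resolve(root, "order.rec.rec")); // 42
        System.out.println(resolve(root, "rec"));           // top-level rec
    }
}
```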

Contributor Author

Good point, I will try to write some tests.

@bvaradar bvaradar self-assigned this Jul 31, 2020
@sbernauer
Contributor Author

Hi @bvaradar, the test currently fails because of the EOFException from https://issues.apache.org/jira/browse/HUDI-1128

@vinothchandar vinothchandar changed the title from "HUDI-1129: AvroConversionUtils unable to handle avro to row transformation when passing evolved schema" to "[HUDI-1129]: AvroConversionUtils unable to handle avro to row transformation when passing evolved schema" Sep 9, 2020
@nsivabalan nsivabalan added the priority:blocker Production down; release blocker label Dec 26, 2020
@vinothchandar vinothchandar assigned nsivabalan and unassigned bvaradar and umehrot2 Dec 26, 2020
@nsivabalan
Contributor

I found the issue after some debugging, but I need your thoughts on whether it is a bug and how to go about fixing it.
@bvaradar @n3nash @vinothchandar

Per the linked test case that reproduces the issue, here is what we are doing, in order:

  • Generate records with SCHEMA_1 and ingest them into Hudi with SCHEMA_1.

  • Generate records with SCHEMA_2 and ingest them into Hudi with SCHEMA_2.

  • Generate records with SCHEMA_1 and ingest them into Hudi with SCHEMA_2 (as both source and target schema). This is where the exception is thrown.

Here is the gist of the issue.
Let's say we have an Avro record written with SCHEMA_1:
byte[] recordBytes = HoodieAvroUtils.avroToBytes(genericRecord);

Converting these bytes back to a GenericRecord with SCHEMA_1 succeeds: HoodieAvroUtils.bytesToAvro(recordBytes, SCHEMA_1)
But converting them back to a GenericRecord with SCHEMA_2 (which has one additional field compared to SCHEMA_1) fails.
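
The failure mode can be reproduced with a toy encoding, sketched below under loud assumptions: this is not Avro and not HoodieAvroUtils, the class `SchemaMismatchSketch` is hypothetical, and a "schema" is reduced to a count of int fields. What it shares with Avro's binary encoding is that values are stored back to back with no field names, so a reader that expects one more field than was written runs off the end of the bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Toy model, NOT Avro: a "schema" is just the number of int fields, and
// the encoding stores the values back to back without names or tags.
class SchemaMismatchSketch {

    static byte[] encode(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                out.writeInt(v);
            }
        }
        return bytes.toByteArray();
    }

    // Decode assuming the bytes were written with exactly fieldCount fields.
    // Throws EOFException (an IOException) if the data runs out early.
    static int[] decode(byte[] data, int fieldCount) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int[] values = new int[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            values[i] = in.readInt();
        }
        return values;
    }

    // Decode with both schemas known: read what the writer actually wrote,
    // then fill the fields that only the reader schema has with a default.
    static int[] decodeResolved(byte[] data, int writerFields, int readerFields,
                                int defaultValue) throws IOException {
        int[] written = decode(data, writerFields);
        int[] values = new int[readerFields];
        for (int i = 0; i < readerFields; i++) {
            values[i] = i < writerFields ? written[i] : defaultValue;
        }
        return values;
    }
}
```

In this model, `decode(encode(new int[] {1, 2}), 3)` throws an EOFException, while `decodeResolved(encode(new int[] {1, 2}), 2, 3, 0)` returns the two written values followed by the default. In Avro terms the second path corresponds to schema resolution, where the decoder is given both the writer schema and the reader schema and the added field carries a default value. Whether bytesToAvro should do that here is the open question.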

@vinothchandar vinothchandar removed the priority:blocker Production down; release blocker label Feb 6, 2021
@nsivabalan
Contributor

This is not required anymore. #2927 handles the schema evolution.

@nsivabalan nsivabalan closed this Aug 10, 2021

Development

Successfully merging this pull request may close these issues.

[SUPPORT] Support for Schema evolution. Facing an error

5 participants