[HUDI-4435] Fix Avro field not found issue introduced by Avro 1.10.2 #6155
```diff
@@ -18,6 +18,7 @@
 
 package org.apache.hudi.hadoop.utils;
 
+import org.apache.avro.AvroRuntimeException;
 import org.apache.avro.JsonProperties;
 import org.apache.avro.LogicalTypes;
 import org.apache.avro.Schema;
@@ -189,7 +190,14 @@ public static Writable avroToArrayWritable(Object value, Schema schema) {
         Writable[] recordValues = new Writable[schema.getFields().size()];
         int recordValueIndex = 0;
         for (Schema.Field field : schema.getFields()) {
-          recordValues[recordValueIndex++] = avroToArrayWritable(record.get(field.name()), field.schema());
+          // TODO Revisit Avro exception handling in future
+          Object fieldValue = null;
+          try {
+            fieldValue = record.get(field.name());
+          } catch (AvroRuntimeException e) {
+            LOG.debug("Field:" + field.name() + "not found in Schema:" + schema.toString());
```
Contributor: If the field is not found, should that fail the conversion instead of filling in null?

Collaborator (author): I believe the field not being found is expected here, given the use case described in the PR description.

Contributor: @yihua The way things are currently implemented, this function is supposed to return a record with the complete schema. We cannot fail if the field is not found, because that case is required for both bootstrap and schema evolution. In the bootstrap case, the metadata fields may not be present in the data file and need to be filled with nulls. Similarly, schema evolution can hit this scenario, and historically we return nulls for the new columns when the old record does not have them.

Contributor: It makes sense to me now. If the existing behavior is to return null, then this is OK. @rahil-c, could you create a ticket for revisiting the performance?
```diff
+          }
+          recordValues[recordValueIndex++] = avroToArrayWritable(fieldValue, field.schema());
         }
         return new ArrayWritable(Writable.class, recordValues);
       case ENUM:
```
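The patched loop can be sketched outside Hudi and Avro as a minimal, self-contained example. `StrictRecord` and `FieldNotFoundException` are hypothetical stand-ins (not Avro APIs) modeled on the behavior this PR works around: looking up a name that the record does not carry throws instead of returning null. `toRow` mirrors the loop in `avroToArrayWritable`, producing one slot per reader-schema field and filling `null` when the record lacks that field, which covers the bootstrap and schema-evolution cases described in the review thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for AvroRuntimeException.
class FieldNotFoundException extends RuntimeException {
    FieldNotFoundException(String message) {
        super(message);
    }
}

// Hypothetical record whose get() throws for names it does not carry,
// mimicking the lookup behavior the PR guards against.
final class StrictRecord {
    private final Map<String, Object> values = new LinkedHashMap<>();

    StrictRecord put(String name, Object value) {
        values.put(name, value);
        return this;
    }

    Object get(String name) {
        if (!values.containsKey(name)) {
            throw new FieldNotFoundException("Not a valid schema field: " + name);
        }
        return values.get(name);
    }
}

final class NullFillConverter {
    // One output slot per reader-schema field; null when the record
    // does not carry the field (same shape as the patched Hudi loop).
    static List<Object> toRow(StrictRecord record, List<String> readerFields) {
        List<Object> row = new ArrayList<>(readerFields.size());
        for (String field : readerFields) {
            Object fieldValue = null;
            try {
                fieldValue = record.get(field);
            } catch (FieldNotFoundException e) {
                // Expected for bootstrap metadata columns, or for columns
                // added by schema evolution after this record was written.
            }
            row.add(fieldValue);
        }
        return row;
    }

    public static void main(String[] args) {
        // Record written with an older schema: no "email" column yet.
        StrictRecord old = new StrictRecord().put("id", 1).put("name", "a");
        List<Object> row = toRow(old, Arrays.asList("id", "name", "email"));
        System.out.println(row); // [1, a, null]
    }
}
```

Swallowing the exception keeps the output aligned with the reader schema, at the cost of throwing once per missing field per record, which is the performance concern raised in the next thread.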
I remember we had performance concerns regarding this change. Catching exceptions is not efficient.

The `GenericRecord` API does not have the `hasField` method in Avro 1.8.2 (https://avro.apache.org/docs/1.8.2/api/java/org/apache/avro/generic/GenericRecord.html#hasField-java.lang.String-), so I believe that, in general, when this `get` is performed we have to do some exception catching.
It's only present in 1.10.2: https://avro.apache.org/docs/1.10.2/api/java/org/apache/avro/generic/GenericRecord.html#hasField-java.lang.String-
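One alternative worth considering, sketched under assumptions and not what the PR does: check the field against the record's own schema before reading it. In Avro, `Schema.getField(String)` returns `null` for an unknown name and exists in 1.8.2 as well as 1.10.2, so a check along the lines of `record.getSchema().getField(name) != null` would not depend on `hasField`. The plain-Java sketch below models that idea; `SchemaAwareRecord`, `fieldPosition`, `PresenceCheck`, and `readOrNull` are hypothetical names, not Hudi or Avro APIs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record that exposes its own (writer) field names separately
// from its values, loosely modeled on GenericRecord#getSchema().
final class SchemaAwareRecord {
    private final Map<String, Integer> positions = new LinkedHashMap<>();
    private final Object[] values;

    SchemaAwareRecord(List<String> writerFields, Object... values) {
        if (writerFields.size() != values.length) {
            throw new IllegalArgumentException("field/value count mismatch");
        }
        for (int i = 0; i < writerFields.size(); i++) {
            positions.put(writerFields.get(i), i);
        }
        this.values = values.clone();
    }

    // Analogous to Schema#getField(name): null when the name is unknown.
    Integer fieldPosition(String name) {
        return positions.get(name);
    }

    // Strict accessor: throws for unknown names.
    Object get(String name) {
        Integer pos = positions.get(name);
        if (pos == null) {
            throw new IllegalArgumentException("Not a valid schema field: " + name);
        }
        return values[pos];
    }
}

final class PresenceCheck {
    // Reads a field without using exceptions for control flow.
    static Object readOrNull(SchemaAwareRecord record, String name) {
        return record.fieldPosition(name) == null ? null : record.get(name);
    }

    public static void main(String[] args) {
        SchemaAwareRecord rec = new SchemaAwareRecord(Arrays.asList("id", "name"), 7, "b");
        System.out.println(readOrNull(rec, "name"));  // b
        System.out.println(readOrNull(rec, "email")); // null
    }
}
```

The lookup in `fieldPosition` is a map probe, so the missing-field path costs about the same as the present-field path instead of constructing and throwing an exception.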
I also think the performance concern is not with the `try`/`catch` block itself but with actually throwing and handling the exception: https://www.oreilly.com/library/view/programming-jakarta-struts/0596006519/ch10s02.html#:~:text=In%20general%2C%20wrapping%20your%20Java,proper%20handler%20for%20the%20exception.
In this case, though, I think we will need to catch the exception.
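To illustrate the point that the cost sits in throwing rather than in the `try` block, the rough sketch below times the same lookup loop when nothing is thrown, when every lookup throws a normal exception, and when every lookup throws an exception that skips stack-trace capture (via the protected four-argument `RuntimeException` constructor, available since Java 7). This is not a rigorous benchmark: timings depend on the JVM and JIT warm-up, so treat the printed numbers as indicative only (a harness such as JMH is the usual tool for real measurements). `ExceptionCostSketch`, `CheapMiss`, `FullMiss`, `lookup`, and `countMisses` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class ExceptionCostSketch {
    // Skips stack-trace capture (writableStackTrace = false), which is
    // usually the dominant cost of constructing a Throwable.
    static final class CheapMiss extends RuntimeException {
        CheapMiss(String message) {
            super(message, null, false, false);
        }
    }

    // Ordinary exception: captures the full stack trace on construction.
    static final class FullMiss extends RuntimeException {
        FullMiss(String message) {
            super(message);
        }
    }

    // Throws CheapMiss or FullMiss for unknown keys, depending on 'cheap'.
    static Object lookup(Map<String, Object> values, String key, boolean cheap) {
        Object v = values.get(key);
        if (v == null && !values.containsKey(key)) {
            throw cheap ? new CheapMiss(key) : new FullMiss(key);
        }
        return v;
    }

    // Runs 'n' lookups inside try/catch and counts how many hit the catch.
    static int countMisses(Map<String, Object> values, String key, boolean cheap, int n) {
        int misses = 0;
        for (int i = 0; i < n; i++) {
            try {
                lookup(values, key, cheap);
            } catch (RuntimeException e) {
                misses++;
            }
        }
        return misses;
    }

    static long timeMillis(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        Map<String, Object> values = new HashMap<>();
        values.put("id", 1);
        int n = 200_000;
        System.out.println("try, no throw:   " + timeMillis(() -> countMisses(values, "id", false, n)) + " ms");
        System.out.println("throw + stack:   " + timeMillis(() -> countMisses(values, "email", false, n)) + " ms");
        System.out.println("throw, no stack: " + timeMillis(() -> countMisses(values, "email", true, n)) + " ms");
    }
}
```

Note that the exception thrown in the Hudi code comes from Avro, so its construction cannot be changed from the caller's side; the sketch only shows where the cost comes from.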