[HUDI-5484] Avoid using GenericRecord in HoodieColumnStatMetadata
#7573
@hudi-bot run azure
```java
}

@Test
public void testSerHoodieMetadataPayload() throws IOException {
```
mvn test -Punit-tests -pl hudi-common -am -B -DfailIfNoTests=false -Dtest=TestSerializationUtils -Pspark3.2
@cxzl25 please update the issue with a description of the root cause as well.
Let's move this test (`testSerHoodieMetadataPayload`) to the hudi-spark module to make sure it runs against every Spark version.
```diff
 .setColumnName((String) columnStatsRecord.get(COLUMN_STATS_FIELD_COLUMN_NAME))
-.setMinValue(columnStatsRecord.get(COLUMN_STATS_FIELD_MIN_VALUE))
-.setMaxValue(columnStatsRecord.get(COLUMN_STATS_FIELD_MAX_VALUE))
+.setMinValue(wrapStatisticValue(unwrapStatisticValueWrapper(columnStatsRecord.get(COLUMN_STATS_FIELD_MIN_VALUE))))
```
Let's add a comment explaining why we need to do that here
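The idea behind the unwrap-then-rewrap change can be sketched without Avro on the classpath. This is a minimal, stdlib-only sketch, assuming the goal is to stop holding a reflectively built generic record for `minValue`/`maxValue` and keep only a concrete value type. `GenericWrapper`, `StatValue`, `IntValue`, `StringValue`, `unwrap` and `wrap` are all hypothetical stand-ins, not Hudi's `wrapStatisticValue`/`unwrapStatisticValueWrapper` or Avro's classes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StatValueRewrap {
    // Hypothetical stand-in for a reflectively built generic record: the value sits in a
    // field map next to schema-style metadata held in an unmodifiable set.
    static final class GenericWrapper {
        final Set<String> reservedProps =
                Collections.unmodifiableSet(new HashSet<>(Arrays.asList("doc", "name", "type")));
        final Map<String, Object> fields = new HashMap<>();

        GenericWrapper(Object value) {
            fields.put("value", value);
        }

        Object get(String name) {
            return fields.get(name);
        }
    }

    // Hypothetical concrete wrappers holding plain fields only.
    interface StatValue {}
    record IntValue(int value) implements StatValue {}
    record StringValue(String value) implements StatValue {}

    // Step 1: pull the raw Java value out of the generic wrapper.
    static Object unwrap(GenericWrapper wrapper) {
        return wrapper.get("value");
    }

    // Step 2: re-wrap the raw value into a concrete type, so the generic wrapper
    // (and its unmodifiable metadata) is no longer part of the stored object graph.
    static StatValue wrap(Object raw) {
        if (raw instanceof Integer) {
            return new IntValue((Integer) raw);
        }
        if (raw instanceof String) {
            return new StringValue((String) raw);
        }
        throw new IllegalArgumentException("Unsupported stat value: " + raw);
    }

    public static void main(String[] args) {
        StatValue min = wrap(unwrap(new GenericWrapper(1)));
        StatValue max = wrap(unwrap(new GenericWrapper("zeta")));
        System.out.println(min.equals(new IntValue(1)));         // true
        System.out.println(max.equals(new StringValue("zeta"))); // true
    }
}
```

The design choice in this sketch is that only plain, concrete values survive the conversion; whatever serializer later copies the payload never sees the generic wrapper's metadata.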
Thank you very much for fixing this @cxzl25!
…apache#7573) Avoid using GenericRecord in ColumnStatMetadata. HoodieMetadataPayload is constructed using GenericRecord with reflection, and columnStatMetadata stores minValue and maxValue, both of which are GenericRecord types. Once spill is generated, kryo deserialization fails. Root cause AVRO-2377 1.9.2 Modified the type of FIELD_RESERVED to Collections.unmodifiableSet. https://github.com/apache/avro/blame/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L483 SPARK-27733 SPARK-34778 (Spark3.2.0) avro version upgraded from 1.8.2 to 1.10.2. As a result, Hudi may encounter UnsupportedOperationException when running Spark3.2.0 or later.
…apache#7573) Avoid using GenericRecord in ColumnStatMetadata. HoodieMetadataPayload is constructed using GenericRecord with reflection, and columnStatMetadata stores minValue and maxValue, both of which are GenericRecord types. Once spill is generated, kryo deserialization fails. Root cause AVRO-2377 1.9.2 Modified the type of FIELD_RESERVED to Collections.unmodifiableSet. https://github.com/apache/avro/blame/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L483 SPARK-27733 SPARK-34778 (Spark3.2.0) avro version upgraded from 1.8.2 to 1.10.2. As a result, Hudi may encounter UnsupportedOperationException when running Spark3.2.0 or later. (cherry picked from commit 6727519)
Change Logs
Avoid using GenericRecord in ColumnStatMetadata.
`HoodieMetadataPayload` is constructed via reflection from a `GenericRecord`, and `columnStatMetadata` stores `minValue` and `maxValue`, both of which are `GenericRecord` types. Once records spill to disk, Kryo deserialization fails.
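Why the failure only appears after a spill can be shown with a simplified spillable map: values kept on the heap are returned as-is, while spilled values must round-trip through a serializer. This is a sketch under stated assumptions: `SpillSketch` and `SpillableMap` are hypothetical, byte arrays stand in for the on-disk file, and `java.io` serialization stands in for Kryo, so it illustrates the control flow rather than reproducing the Kryo error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class SpillSketch {
    // Hypothetical spillable map: the first `inMemoryLimit` keys stay on the heap,
    // later values are serialized to byte arrays (standing in for a disk file).
    static final class SpillableMap<K, V extends Serializable> {
        private final int inMemoryLimit;
        private final Map<K, V> inMemory = new HashMap<>();
        private final Map<K, byte[]> spilled = new HashMap<>();

        SpillableMap(int inMemoryLimit) {
            this.inMemoryLimit = inMemoryLimit;
        }

        void put(K key, V value) {
            if (inMemory.containsKey(key) || inMemory.size() < inMemoryLimit) {
                spilled.remove(key);
                inMemory.put(key, value);
            } else {
                spilled.put(key, serialize(value));
            }
        }

        V get(K key) {
            if (inMemory.containsKey(key)) {
                return inMemory.get(key); // heap values are never serialized
            }
            byte[] bytes = spilled.get(key);
            // Only spilled values are deserialized, so a type that cannot be rebuilt
            // fails here, and only after the map has actually spilled.
            return bytes == null ? null : deserialize(bytes);
        }

        boolean isSpilled(K key) {
            return spilled.containsKey(key);
        }

        private static byte[] serialize(Object value) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return bytes.toByteArray();
        }

        @SuppressWarnings("unchecked")
        private V deserialize(byte[] bytes) {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                return (V) in.readObject();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        SpillableMap<String, String> map = new SpillableMap<>(1);
        map.put("file-1", "stats-1"); // fits in memory
        map.put("file-2", "stats-2"); // over the limit, so it is spilled
        System.out.println(map.isSpilled("file-2")); // true
        System.out.println(map.get("file-2"));       // stats-2
    }
}
```

With small inputs nothing spills, which is one reason a bug like this can pass small tests and still cause write failures on larger ones.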
Root cause
AVRO-2377 (Avro 1.9.2) changed the type of `FIELD_RESERVED` to `Collections.unmodifiableSet`.
https://github.com/apache/avro/blame/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L483
SPARK-27733 / SPARK-34778 (Spark 3.2.0) upgraded the Avro version from 1.8.2 to 1.10.2.
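As a result, Hudi may encounter `UnsupportedOperationException` when running on Spark 3.2.0 or later: Kryo's default collection serializer restores a set by calling `add()` on it, which an unmodifiable set rejects.
Fail log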
construct HoodieMetadataPayload
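The root cause can be modeled with the JDK alone. This sketch assumes Kryo's collection handling can be approximated as "create an empty instance of the recorded collection class, then `add()` each element back"; the `Supplier` is a hypothetical stand-in for Kryo's reflective instantiation, and the element names are illustrative rather than Avro's exact reserved-property list.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.function.Supplier;

public class UnmodifiableRebuild {
    // Rebuilds a collection the way a field-by-field deserializer restores one:
    // obtain an empty instance of the original collection type, then add() each element back.
    // The Supplier stands in for reflective instantiation of the recorded class.
    static <T> Collection<T> rebuild(Supplier<Collection<T>> emptyInstance, List<T> elements) {
        Collection<T> target = emptyInstance.get();
        for (T element : elements) {
            target.add(element); // an unmodifiable view throws UnsupportedOperationException here
        }
        return target;
    }

    public static void main(String[] args) {
        // Illustrative names only.
        List<String> reserved = Arrays.asList("doc", "name", "type");

        // Mutable set, as before AVRO-2377: rebuilding succeeds.
        Collection<String> rebuilt = rebuild(HashSet::new, reserved);
        System.out.println(rebuilt.size()); // 3

        // Collections.unmodifiableSet, as after AVRO-2377: the first add() fails.
        try {
            rebuild(() -> Collections.unmodifiableSet(new HashSet<String>()), reserved);
        } catch (UnsupportedOperationException e) {
            System.out.println("rebuild failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Expected output is `3` followed by `rebuild failed: UnsupportedOperationException`, matching the exception type reported above.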
Impact
Causes write failures.
Risk level (write none, low, medium or high below)
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
ticket number here and follow the instruction to make changes to the website.
Contributor's checklist