cxzl25 commented Dec 28, 2022

Change Logs

Avoid using GenericRecord in ColumnStatMetadata.

HoodieMetadataPayload is constructed from a GenericRecord via reflection, and its columnStatMetadata stores minValue and maxValue, both of which are GenericRecord instances.

Once records spill to disk, Kryo deserialization fails.
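
To illustrate why the problem only surfaces after a spill: entries that stay in memory are never serialized, so a value that cannot round-trip through the serializer goes unnoticed until the map overflows. The following is a minimal sketch under stated assumptions. `TinySpillableMap` is a hypothetical stand-in (not Hudi's ExternalSpillableMap), it uses plain Java serialization in place of Kryo, and an in-memory byte store in place of disk files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a spillable map; not Hudi's ExternalSpillableMap. */
public class TinySpillableMap {
    private final int maxInMemoryEntries;
    private final Map<String, Object> inMemory = new HashMap<>();
    // Stands in for the on-disk store: values exist here only as serialized bytes.
    private final Map<String, byte[]> spilled = new HashMap<>();

    public TinySpillableMap(int maxInMemoryEntries) {
        this.maxInMemoryEntries = maxInMemoryEntries;
    }

    public void put(String key, Object value) {
        if (inMemory.containsKey(key) || inMemory.size() < maxInMemoryEntries) {
            inMemory.put(key, value); // no serialization on this path
            return;
        }
        // Memory budget exhausted: the value must be serialized to be stored.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        spilled.put(key, bytes.toByteArray());
    }

    public Object get(String key) {
        if (inMemory.containsKey(key)) {
            return inMemory.get(key); // same instance, never deserialized
        }
        byte[] data = spilled.get(key);
        if (data == null) {
            return null;
        }
        // Only spilled entries exercise the deserializer.
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public boolean isSpilled(String key) {
        return spilled.containsKey(key);
    }
}
```

With a capacity of one, the first entry is returned as the original instance while the second is rebuilt from bytes on every `get`, which is the only path where a deserialization error can occur.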

Root cause

AVRO-2377 (Avro 1.9.2) changed the type of FIELD_RESERVED to Collections.unmodifiableSet.

https://github.com/apache/avro/blame/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L483

SPARK-27733 and SPARK-34778 (Spark 3.2.0) upgraded Avro from 1.8.2 to 1.10.2.

As a result, Hudi may encounter UnsupportedOperationException when running on Spark 3.2.0 or later.
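
The mechanism can be reproduced without Avro or Kryo. The sketch below assumes a hypothetical helper, `refill`, that imitates what a generic collection deserializer does after instantiating the target collection: it adds the decoded elements one by one. When the target is a `Collections.unmodifiableSet`, `add` throws UnsupportedOperationException, matching the `Collections$UnmodifiableCollection.add` frame in the failure log. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnmodifiableAddDemo {
    /**
     * Imitates the element-by-element refill step of a generic collection
     * deserializer. Returns true if every element was added, false if the
     * target collection rejected add().
     */
    public static boolean refill(Collection<String> target, List<String> decoded) {
        try {
            for (String element : decoded) {
                target.add(element);
            }
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> decoded = Arrays.asList("doc", "name", "type");
        Set<String> mutable = new HashSet<>();
        Set<String> readOnly = Collections.unmodifiableSet(new HashSet<>());
        System.out.println("mutable set refilled:  " + refill(mutable, decoded));
        System.out.println("readOnly set refilled: " + refill(readOnly, decoded));
    }
}
```

The mutable set accepts the elements, while the read-only wrapper rejects the first `add`. A serializer that rebuilds collections this way cannot restore a field whose runtime type is an unmodifiable wrapper.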

Failure log

org.apache.hudi.com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException
Serialization trace:
reserved (org.apache.avro.Schema$Field)
fieldMap (org.apache.avro.Schema$RecordSchema)
schema (org.apache.avro.generic.GenericData$Record)
maxValue (org.apache.hudi.avro.model.HoodieMetadataColumnStats)
columnStatMetadata (org.apache.hudi.metadata.HoodieMetadataPayload)
	at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)

	at org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:232)
	at org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:45)
	at org.apache.hudi.common.model.HoodieRecord.read(HoodieRecord.java:339)
	at org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:520)
	at org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:512)
	at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
	at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:101)
	at org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:75)
	at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:210)
	at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:203)
	at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:199)
	at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:68)
	at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:195)
	at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:54)
	at org.apache.hudi.io.HoodieCreateHandle.write(HoodieCreateHandle.java:188)
	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleInsert(HoodieSparkCopyOnWriteTable.java:257)
	at org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:68)
	at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:231)
	at org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$9cd4b1be$1(HoodieCompactor.java:129)
	at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)

Caused by: java.lang.UnsupportedOperationException
	at java.util.Collections$UnmodifiableCollection.add(Collections.java:1055)
	at org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
	at org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
	at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
	at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)

Stack trace of the HoodieMetadataPayload construction (via reflection):

	at org.apache.hudi.metadata.HoodieMetadataPayload.<init>(HoodieMetadataPayload.java:233)
	at org.apache.hudi.metadata.HoodieMetadataPayload.<init>(HoodieMetadataPayload.java:182)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hudi.common.util.HoodieRecordUtils.loadPayload(HoodieRecordUtils.java:99)
	at org.apache.hudi.common.util.SpillableMapUtils.convertToHoodieRecordPayload(SpillableMapUtils.java:140)
	at org.apache.hudi.avro.HoodieAvroUtils.createHoodieRecordFromAvro(HoodieAvroUtils.java:1078)
	at org.apache.hudi.common.model.HoodieAvroIndexedRecord.wrapIntoHoodieRecordPayloadWithParams(HoodieAvroIndexedRecord.java:168)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processDataBlock(AbstractHoodieLogRecordReader.java:644)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:688)

Impact

Causes write failures.

cxzl25 commented Dec 29, 2022

@hudi-bot run azure

cxzl25 commented Jan 3, 2023

AVRO-2377 (Avro 1.9.2) changed the type of FIELD_RESERVED to Collections.unmodifiableSet.

https://github.com/apache/avro/blame/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L483

SPARK-27733 and SPARK-34778 (Spark 3.2.0) upgraded Avro from 1.8.2 to 1.10.2.

As a result, Hudi may encounter UnsupportedOperationException when running on Spark 3.2.0 or later.

@XuQianJin-Stars @alexeykudinkin

}

@Test
public void testSerHoodieMetadataPayload() throws IOException {
Contributor Author

mvn test -Punit-tests -pl hudi-common -am -B -DfailIfNoTests=false -Dtest=TestSerializationUtils  -Pspark3.2

cxzl25 commented Jan 3, 2023

@hudi-bot run azure

cxzl25 commented Jan 4, 2023

@hudi-bot run azure

alexeykudinkin self-requested a review January 6, 2023 21:36
alexeykudinkin self-assigned this Jan 6, 2023
alexeykudinkin added the priority:blocker label (Production down; release blocker) Jan 6, 2023
@alexeykudinkin
Contributor

@cxzl25 please update the issue with the description of the root-cause as well

alexeykudinkin changed the title from "[HUDI-5484] Avoid using GenericRecord in ColumnStatMetadata" to "[HUDI-5484] Avoid using GenericRecord in HoodieColumnStatMetadata" Jan 6, 2023
}

@Test
public void testSerHoodieMetadataPayload() throws IOException {
Contributor

Let's move this test to the hudi-spark module to make sure it runs against every Spark version.

.setColumnName((String) columnStatsRecord.get(COLUMN_STATS_FIELD_COLUMN_NAME))
.setMinValue(columnStatsRecord.get(COLUMN_STATS_FIELD_MIN_VALUE))
.setMaxValue(columnStatsRecord.get(COLUMN_STATS_FIELD_MAX_VALUE))
.setMinValue(wrapStatisticValue(unwrapStatisticValueWrapper(columnStatsRecord.get(COLUMN_STATS_FIELD_MIN_VALUE))))
Contributor

Let's add a comment explaining why we need to do that here
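
One reading of the changed line above is that unwrapping and re-wrapping normalizes whatever wrapper arrives into a concrete typed wrapper, so that no generic record (and no schema object with an unmodifiable collection) remains in the payload. The sketch below shows that idea without any Avro dependency. `GenericWrapper`, `IntWrapper`, `StringWrapper`, `wrap`, and `unwrap` are hypothetical stand-ins, and Hudi's actual wrapStatisticValue and unwrapStatisticValueWrapper may behave differently.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class RewrapDemo {
    /** Hypothetical generic wrapper: carries schema-like state next to the value. */
    public static final class GenericWrapper {
        // Stands in for schema state that a reflective serializer cannot rebuild.
        public final Set<String> schemaReserved = Collections.unmodifiableSet(new HashSet<>());
        public final Object value;
        public GenericWrapper(Object value) { this.value = value; }
    }

    /** Hypothetical typed wrappers holding only the plain value. */
    public static final class IntWrapper {
        public final int value;
        public IntWrapper(int value) { this.value = value; }
    }

    public static final class StringWrapper {
        public final String value;
        public StringWrapper(String value) { this.value = value; }
    }

    /** Extracts the plain value from any supported wrapper. */
    public static Object unwrap(Object wrapper) {
        if (wrapper == null) return null;
        if (wrapper instanceof GenericWrapper) return ((GenericWrapper) wrapper).value;
        if (wrapper instanceof IntWrapper) return ((IntWrapper) wrapper).value;
        if (wrapper instanceof StringWrapper) return ((StringWrapper) wrapper).value;
        throw new IllegalArgumentException("Unsupported wrapper: " + wrapper.getClass());
    }

    /** Wraps a plain value in the typed wrapper that matches its runtime type. */
    public static Object wrap(Object value) {
        if (value == null) return null;
        if (value instanceof Integer) return new IntWrapper((Integer) value);
        if (value instanceof String) return new StringWrapper((String) value);
        throw new IllegalArgumentException("Unsupported value type: " + value.getClass());
    }
}
```

Calling `wrap(unwrap(x))` returns a typed wrapper whether `x` was generic or already typed, so downstream serialization only ever sees the typed form.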

cxzl25 commented Jan 7, 2023

@hudi-bot run azure

hudi-bot commented Jan 7, 2023

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

alexeykudinkin merged commit 6727519 into apache:master Jan 9, 2023
@alexeykudinkin
Contributor

Thank you very much for fixing this @cxzl25!

fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Jan 31, 2023
nsivabalan pushed a commit to nsivabalan/hudi that referenced this pull request Mar 22, 2023
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
flashJd pushed a commit to flashJd/hudi that referenced this pull request May 5, 2023
(cherry picked from commit 6727519)