
[SUPPORT] Inconsistent reader and writer schema in HoodieAvroDataBlock cause exception #7284

@TengHuo

Description


Related JIRA ticket: https://issues.apache.org/jira/browse/HUDI-5271

Describe the problem you faced

When using Spark to create a Hudi table with the following configuration:

  • INMEMORY or CONSISTENT_HASHING BUCKET index
  • Decimal data type in schema
  • MOR Hudi table
  • Use Spark catalog

querying this table triggers an exception:

ERROR org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader [] - Got exception when reading log file
org.apache.avro.AvroTypeException: Found hoodie.test_mor_tab.test_mor_tab_record.new_test_col.fixed, expecting union
	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock$RecordIterator.next(HoodieAvroDataBlock.java:209)
...
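The message "Found ...fixed, expecting union" suggests that the schema recorded with the log block and the schema the reader resolves against disagree on the shape of the decimal column. As a rough illustration (the schemas below are assumptions for illustration, not the exact ones Hudi generated): in Avro, a Spark `decimal(25, 4)` column maps to a `fixed` type with a decimal logical type, while a nullable column is encoded as a union of `null` and that `fixed`; if one side records the bare `fixed` and the other expects the union, schema resolution fails.

```python
import json

# Hypothetical writer-side field: decimal(25, 4) stored as a bare Avro
# `fixed` (11 bytes can hold 25 decimal digits plus sign). The record and
# field names mirror the error message; the exact schemas are assumptions.
writer_field = {
    "name": "new_test_col",
    "type": {
        "type": "fixed",
        "name": "fixed",
        "namespace": "hoodie.test_mor_tab.test_mor_tab_record.new_test_col",
        "size": 11,
        "logicalType": "decimal",
        "precision": 25,
        "scale": 4,
    },
}

# Hypothetical reader-side field: the same column, but nullable, so the
# type is a union of "null" and the fixed decimal.
reader_field = {
    "name": "new_test_col",
    "type": ["null", writer_field["type"]],
}

def is_union(avro_type):
    # In Avro JSON schemas, a union is represented as a JSON array of branches.
    return isinstance(avro_type, list)

# The two sides disagree on the top-level shape of the column, which is the
# kind of inconsistency the AvroTypeException message points at.
print(is_union(writer_field["type"]))  # False -> "Found ...fixed"
print(is_union(reader_field["type"]))  # True  -> "expecting union"
```

This only contrasts the JSON shapes; the actual failure happens inside Avro's `ResolvingDecoder` when it tries to resolve the recorded writer schema against the reader schema while decoding the data block.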

To Reproduce

Steps to reproduce the behavior:

class TestInsertTable extends HoodieSparkSqlTestBase {

  test("Test Insert Into MOR table") {
    withTempDir { tmp =>
      val tableName = "test_mor_tab"
      // Create a partitioned table
      spark.sql(
        s"""
           |create table $tableName (
           |  id int,
           |  dt string,
           |  name string,
           |  price double,
           |  ts long,
           |  new_test_col decimal(25, 4) comment 'a column for test decimal type'
           |) using hudi
           |options
           |(
           |    type = 'mor'
           |    ,primaryKey = 'id'
           |    ,hoodie.index.type = 'INMEMORY'
           |)
           | tblproperties (primaryKey = 'id')
           | partitioned by (dt)
           | location '${tmp.getCanonicalPath}'
       """.stripMargin)

      // Note: do not specify column names in the insert; the partition field must come last.
      spark.sql(
        s"""
           | insert into $tableName values
           | (1, 'a1', 10, 1000, 1.0, "2021-01-05"),
           | (2, 'a2', 20, 2000, 2.0, "2021-01-06"),
           | (3, 'a3', 30, 3000, 3.0, "2021-01-07")
              """.stripMargin)

      spark.sql(s"select id, name, price, ts, dt from $tableName").show(false)
    }
  }
}
  1. Run this test case in the module hudi-spark-datasource/hudi-spark, in the test class org.apache.spark.sql.hudi.TestInsertTable.

Expected behavior

This test case should run without any exception.

Environment Description

  • Hudi version : latest master branch, commit 3109d890f13b1b29e5796a9f34ab28fa898ec23c

  • Spark version : tried Spark 2.4 and 3.1; both have the same issue

  • Hive version : N/A

  • Hadoop version : N/A

  • Storage (HDFS/S3/GCS..) : HDFS

  • Running on Docker? (yes/no) : no


Stacktrace

The full error stack of the above test case:


19963 [ScalaTest-run-running-TestInsertTable] INFO  org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator [] - Code generated in 20.563541 ms
20015 [ScalaTest-run-running-TestInsertTable] INFO  org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator [] - Code generated in 18.67177 ms
20036 [ScalaTest-run-running-TestInsertTable] INFO  org.apache.spark.SparkContext [] - Starting job: apply at OutcomeOf.scala:85
20036 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler [] - Got job 24 (apply at OutcomeOf.scala:85) with 1 output partitions
20036 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler [] - Final stage: ResultStage 35 (apply at OutcomeOf.scala:85)
20036 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler [] - Parents of final stage: List()
20036 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler [] - Missing parents: List()
20037 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler [] - Submitting ResultStage 35 (MapPartitionsRDD[71] at apply at OutcomeOf.scala:85), which has no missing parents
20051 [dag-scheduler-event-loop] INFO  org.apache.spark.storage.memory.MemoryStore [] - Block broadcast_33 stored as values in memory (estimated size 15.4 KB, free 2002.3 MB)
20060 [dag-scheduler-event-loop] INFO  org.apache.spark.storage.memory.MemoryStore [] - Block broadcast_33_piece0 stored as bytes in memory (estimated size 7.4 KB, free 2002.3 MB)
20060 [dispatcher-event-loop-0] INFO  org.apache.spark.storage.BlockManagerInfo [] - Added broadcast_33_piece0 in memory on 10.2.175.58:53317 (size: 7.4 KB, free: 2004.2 MB)
20061 [dag-scheduler-event-loop] INFO  org.apache.spark.SparkContext [] - Created broadcast 33 from broadcast at DAGScheduler.scala:1161
20061 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler [] - Submitting 1 missing tasks from ResultStage 35 (MapPartitionsRDD[71] at apply at OutcomeOf.scala:85) (first 15 tasks are for partitions Vector(0))
20061 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.TaskSchedulerImpl [] - Adding task set 35.0 with 1 tasks
20064 [dispatcher-event-loop-1] INFO  org.apache.spark.scheduler.TaskSetManager [] - Starting task 0.0 in stage 35.0 (TID 44, localhost, executor driver, partition 0, PROCESS_LOCAL, 8249 bytes)
20064 [Executor task launch worker for task 44] INFO  org.apache.spark.executor.Executor [] - Running task 0.0 in stage 35.0 (TID 44)
20080 [Executor task launch worker for task 44] INFO  org.apache.hudi.common.table.HoodieTableMetaClient [] - Loading HoodieTableMetaClient from file:/private/var/folders/n7/7v_cwpdn79lc75czxd84bdd8mwzd9l/T/spark-cdd1a67d-c0be-4c46-826a-445e29dfa751
20081 [Executor task launch worker for task 44] INFO  org.apache.hudi.common.table.HoodieTableConfig [] - Loading table properties from file:/private/var/folders/n7/7v_cwpdn79lc75czxd84bdd8mwzd9l/T/spark-cdd1a67d-c0be-4c46-826a-445e29dfa751/.hoodie/hoodie.properties
20081 [Executor task launch worker for task 44] INFO  org.apache.hudi.common.table.HoodieTableMetaClient [] - Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from file:/private/var/folders/n7/7v_cwpdn79lc75czxd84bdd8mwzd9l/T/spark-cdd1a67d-c0be-4c46-826a-445e29dfa751
20083 [Executor task launch worker for task 44] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline [] - Loaded instants upto : Option{val=[20221123105000131__deltacommit__COMPLETED]}
20083 [Executor task launch worker for task 44] INFO  org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader [] - Scanning log file HoodieLogFile{pathStr='file:/private/var/folders/n7/7v_cwpdn79lc75czxd84bdd8mwzd9l/T/spark-cdd1a67d-c0be-4c46-826a-445e29dfa751/dt=2021-01-05/.04aba946-8423-4ddd-9d04-fbbd91ba37a2-0_20221123105000131.log.1_0-17-17', fileLen=-1}
20084 [Executor task launch worker for task 44] INFO  org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader [] - Reading a data block from file file:/private/var/folders/n7/7v_cwpdn79lc75czxd84bdd8mwzd9l/T/spark-cdd1a67d-c0be-4c46-826a-445e29dfa751/dt=2021-01-05/.04aba946-8423-4ddd-9d04-fbbd91ba37a2-0_20221123105000131.log.1_0-17-17 at instant 20221123105000131
20084 [Executor task launch worker for task 44] INFO  org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader [] - Merging the final data blocks
20084 [Executor task launch worker for task 44] INFO  org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader [] - Number of remaining logblocks to merge 1
20086 [Executor task launch worker for task 44] ERROR org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader [] - Got exception when reading log file
org.apache.avro.AvroTypeException: Found hoodie.test_mor_tab.test_mor_tab_record.new_test_col.fixed, expecting union
	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock$RecordIterator.next(HoodieAvroDataBlock.java:207) ~[classes/:?]
	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock$RecordIterator.next(HoodieAvroDataBlock.java:144) ~[classes/:?]
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processDataBlock(AbstractHoodieLogRecordReader.java:633) ~[classes/:?]
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:715) ~[classes/:?]
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:368) ~[classes/:?]
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:220) ~[classes/:?]
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scan(AbstractHoodieLogRecordReader.java:209) ~[classes/:?]
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:112) ~[classes/:?]
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:105) ~[classes/:?]
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:343) ~[classes/:?]
	at org.apache.hudi.LogFileIterator$.scanLog(LogFileIterator.scala:305) ~[classes/:?]
	at org.apache.hudi.LogFileIterator.<init>(LogFileIterator.scala:88) ~[classes/:?]
	at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:96) ~[classes/:?]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.scheduler.Task.run(Task.scala:123) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_345]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_345]
	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_345]
20095 [Executor task launch worker for task 44] ERROR org.apache.spark.executor.Executor [] - Exception in task 0.0 in stage 35.0 (TID 44)
org.apache.hudi.exception.HoodieException: Exception when reading log file 
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:377) ~[classes/:?]
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:220) ~[classes/:?]
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scan(AbstractHoodieLogRecordReader.java:209) ~[classes/:?]
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:112) ~[classes/:?]
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:105) ~[classes/:?]
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:343) ~[classes/:?]
	at org.apache.hudi.LogFileIterator$.scanLog(LogFileIterator.scala:305) ~[classes/:?]
	at org.apache.hudi.LogFileIterator.<init>(LogFileIterator.scala:88) ~[classes/:?]
	at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:96) ~[classes/:?]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.scheduler.Task.run(Task.scala:123) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) ~[spark-core_2.11-2.4.4.jar:2.4.4]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_345]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_345]
	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_345]
Caused by: org.apache.avro.AvroTypeException: Found hoodie.test_mor_tab.test_mor_tab_record.new_test_col.fixed, expecting union
	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145) ~[avro-1.8.2.jar:1.8.2]
	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock$RecordIterator.next(HoodieAvroDataBlock.java:207) ~[classes/:?]
	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock$RecordIterator.next(HoodieAvroDataBlock.java:144) ~[classes/:?]
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processDataBlock(AbstractHoodieLogRecordReader.java:633) ~[classes/:?]
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:715) ~[classes/:?]
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:368) ~[classes/:?]
	... 27 more
20111 [task-result-getter-0] WARN  org.apache.spark.scheduler.TaskSetManager [] - Lost task 0.0 in stage 35.0 (TID 44, localhost, executor driver): org.apache.hudi.exception.HoodieException: Exception when reading log file 
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:377)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:220)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scan(AbstractHoodieLogRecordReader.java:209)
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:112)
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:105)
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:343)
	at org.apache.hudi.LogFileIterator$.scanLog(LogFileIterator.scala:305)
	at org.apache.hudi.LogFileIterator.<init>(LogFileIterator.scala:88)
	at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:96)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.avro.AvroTypeException: Found hoodie.test_mor_tab.test_mor_tab_record.new_test_col.fixed, expecting union
	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
	at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
	at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
	at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock$RecordIterator.next(HoodieAvroDataBlock.java:207)
	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock$RecordIterator.next(HoodieAvroDataBlock.java:144)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processDataBlock(AbstractHoodieLogRecordReader.java:633)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:715)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:368)
	... 27 more

20113 [task-result-getter-0] ERROR org.apache.spark.scheduler.TaskSetManager [] - Task 0 in stage 35.0 failed 1 times; aborting job
20113 [task-result-getter-0] INFO  org.apache.spark.scheduler.TaskSchedulerImpl [] - Removed TaskSet 35.0, whose tasks have all completed, from pool 
20116 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.TaskSchedulerImpl [] - Cancelling stage 35
20117 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.TaskSchedulerImpl [] - Killing all running tasks in stage 35: Stage cancelled
20118 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler [] - ResultStage 35 (apply at OutcomeOf.scala:85) failed in 0.081 s due to Job aborted due to stage failure: Task 0 in stage 35.0 failed 1 times, most recent failure: Lost task 0.0 in stage 35.0 (TID 44, localhost, executor driver): org.apache.hudi.exception.HoodieException: Exception when reading log file 
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:377)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:220)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scan(AbstractHoodieLogRecordReader.java:209)
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:112)
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:105)
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:343)
	at org.apache.hudi.LogFileIterator$.scanLog(LogFileIterator.scala:305)
	at org.apache.hudi.LogFileIterator.<init>(LogFileIterator.scala:88)
	at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:96)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.avro.AvroTypeException: Found hoodie.test_mor_tab.test_mor_tab_record.new_test_col.fixed, expecting union
	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
	at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
	at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
	at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock$RecordIterator.next(HoodieAvroDataBlock.java:207)
	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock$RecordIterator.next(HoodieAvroDataBlock.java:144)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processDataBlock(AbstractHoodieLogRecordReader.java:633)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:715)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:368)
	... 27 more

Metadata

Labels: area:writer (write client and core write operations), priority:high (significant impact; potential bugs), release-0.12.2 (patches targeted for 0.12.2)

Status: ✅ Done