SchemaConverters.scala

@@ -20,6 +20,7 @@ package org.apache.spark.sql.avro
 import org.apache.avro.LogicalTypes.{Date, Decimal, TimestampMicros, TimestampMillis}
 import org.apache.avro.Schema.Type._
 import org.apache.avro.{LogicalTypes, Schema, SchemaBuilder}
+import org.apache.hudi.avro.AvroSchemaUtils.isNullable
 import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.sql.types.Decimal.minBytesForPrecision
 import org.apache.spark.sql.types._

@@ -202,7 +203,12 @@ private[sql] object SchemaConverters {
       st.foreach { f =>
         val fieldAvroType =
           toAvroType(f.dataType, f.nullable, f.name, childNameSpace)
-        fieldsAssembler.name(f.name).`type`(fieldAvroType).noDefault()
+        val fieldBuilder = fieldsAssembler.name(f.name).`type`(fieldAvroType)

Contributor: This code is actually borrowed from Spark, and we try to avoid any changes to such code to make sure we're not diverging from Spark.

Contributor Author (@qidian99): [screenshot attached] When extractPartitionValuesFromPartitionPath is turned on, the StructType schema and the Avro schema differ. convertToAvroSchema is missing the default value when the field is nullable, which makes the table unqueryable.
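
To make the symptom concrete, here is a minimal, self-contained sketch in plain Avro, outside of Hudi; the record and field names (r, id, new_col1) are invented for illustration. The point is that a reader schema which gained a field can only decode older records if that field carries a default:

    import java.io.ByteArrayOutputStream
    import org.apache.avro.Schema
    import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter}
    import org.apache.avro.io.{DecoderFactory, EncoderFactory}

    val writerSchema = new Schema.Parser().parse(
      """{"type":"record","name":"r","fields":[{"name":"id","type":"int"}]}""")
    // Reader schema evolved with a nullable field; note the explicit default.
    val readerSchema = new Schema.Parser().parse(
      """{"type":"record","name":"r","fields":[
        |  {"name":"id","type":"int"},
        |  {"name":"new_col1","type":["null","int"],"default":null}]}""".stripMargin)

    // Encode one record with the old (writer) schema.
    val out = new ByteArrayOutputStream()
    val enc = EncoderFactory.get().binaryEncoder(out, null)
    val rec = new GenericData.Record(writerSchema)
    rec.put("id", 1)
    new GenericDatumWriter[GenericData.Record](writerSchema).write(rec, enc)
    enc.flush()

    // Decode with the evolved reader schema. This succeeds only because of
    // "default": null; drop the default and this read throws AvroTypeException,
    // which matches the "table not queryable" symptom described above.
    val dec = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
    val decoded = new GenericDatumReader[GenericData.Record](writerSchema, readerSchema)
      .read(null, dec)
    assert(decoded.get("new_col1") == null)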

Contributor: I don't think I understand why you believe this is an appropriate fix for the issue you're observing:

  • Spark's schemas don't have defaults at all
  • An Avro schema field being nullable doesn't entail that it should have null as its default value (see the sketch below)
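
To illustrate the second bullet with plain Avro (the schema here is invented for the example): nullability and defaults are independent attributes, so a nullable field with no default is a perfectly valid schema.

    import org.apache.avro.Schema

    // A union with null makes the field *nullable*, yet declares no default:
    val s = new Schema.Parser().parse(
      """{"type":"record","name":"r","fields":[
        |  {"name":"opt","type":["null","string"]}]}""".stripMargin)
    assert(!s.getField("opt").hasDefaultValue) // parses fine, still no default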

Contributor: From what I understand so far, the issue is not in the conversion but in the fact that we're not handling schema evolution properly in HoodieAvroDataBlock: whenever we decode a record from an existing data block, we should make sure that any nullable field actually has null as its default value, so that the Avro reader is able to decode the data in case this particular field is not present.
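
A rough sketch of what that suggested alternative could look like: rewrite the reader schema before decoding so every defaultless nullable field gains an explicit null default. The helper name withNullDefaults is hypothetical, this is not Hudi's actual code, and it ignores nested records and field aliases for brevity (it assumes Avro 1.8+ for Field#defaultVal and JsonProperties.NULL_VALUE).

    import scala.collection.JavaConverters._
    import org.apache.avro.{JsonProperties, Schema}

    // Hypothetical helper (not Hudi code): give every defaultless nullable
    // field a null default so records written before the field exist stay decodable.
    def withNullDefaults(record: Schema): Schema = {
      val fields = record.getFields.asScala.map { f =>
        val branches = if (f.schema().getType == Schema.Type.UNION) {
          f.schema().getTypes.asScala
        } else Seq.empty[Schema]
        val nullable = branches.exists(_.getType == Schema.Type.NULL)
        if (nullable && !f.hasDefaultValue) {
          // A null default is only legal when the null branch comes first.
          val reordered = Schema.createUnion(
            branches.sortBy(_.getType != Schema.Type.NULL).asJava)
          new Schema.Field(f.name(), reordered, f.doc(), JsonProperties.NULL_VALUE)
        } else {
          new Schema.Field(f.name(), f.schema(), f.doc(), f.defaultVal())
        }
      }
      Schema.createRecord(record.getName, record.getDoc, record.getNamespace,
        record.isError, fields.asJava)
    }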

Contributor: I agree with @alexeykudinkin, we should not change the code of SchemaConverters.scala; this is a bug in the log scanner.

+        if (isNullable(fieldAvroType)) {
+          fieldBuilder.withDefault(null)
+        } else {
+          fieldBuilder.noDefault()
+        }
       }
       fieldsAssembler.endRecord()
     }
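
For reference, a small standalone sketch (plain Avro, invented names) of what the patched branch emits: withDefault(null) on a nullable field writes "default": null into the schema JSON, which is what lets readers fill the field in for older records.

    import org.apache.avro.{Schema, SchemaBuilder}

    val nullableInt = Schema.createUnion(
      Schema.create(Schema.Type.NULL), Schema.create(Schema.Type.INT))
    val rec = SchemaBuilder.record("r").fields()
      .name("new_col1").`type`(nullableInt).withDefault(null)
      .endRecord()
    // rec.toString(true) now shows "default" : null on new_col1.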

@@ -212,7 +218,7 @@ private[sql] object SchemaConverters {
     }

     if (nullable && catalystType != NullType && schema.getType != Schema.Type.UNION) {
-      Schema.createUnion(schema, nullSchema)
+      Schema.createUnion(nullSchema, schema)
     } else {
       schema
     }
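
The argument order in createUnion matters because of an Avro rule: a field's default value must match the first branch of its union type. A minimal sketch (plain Avro, invented names) of the difference:

    import org.apache.avro.{Schema, SchemaBuilder}

    val nullFirst = Schema.createUnion(
      Schema.create(Schema.Type.NULL), Schema.create(Schema.Type.LONG))
    val nullLast = Schema.createUnion(
      Schema.create(Schema.Type.LONG), Schema.create(Schema.Type.NULL))

    // Legal: the union's first branch is null, so a null default validates.
    SchemaBuilder.record("ok").fields()
      .name("ts").`type`(nullFirst).withDefault(null).endRecord()

    // Throws AvroTypeException: null default, but the first branch is long.
    // SchemaBuilder.record("bad").fields()
    //   .name("ts").`type`(nullLast).withDefault(null).endRecord()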

TestUpdateTable.scala

@@ -204,4 +204,52 @@ class TestUpdateTable extends HoodieSparkSqlTestBase
       }
     })
   }
+
+  test("Test Add Column and Update Table") {
+    withTempDir { tmp =>

xiarixiaoyao (Contributor) commented on Feb 10, 2023: @qidian99, thanks for your contribution. I ran this UT directly on the master branch, expecting it to fail, but it succeeded. Could you please check your UT? Thanks.

Contributor Author (@qidian99): Thanks for the timely reply. I changed the UT to manually set partition pruning to true. @stream2000 and I both tested on the master branch, and the test fails. [screenshot attached]

Contributor: @qidian99, can you please paste the whole stacktrace? I would like to understand better what exactly is failing.

Contributor: I see you pasted the stacktrace from querying your data via the server. Can you please paste the stacktrace of this particular test failing? I want to better understand which operation in this test is failing.

Contributor: @qidian99, do only non-partitioned tables have this problem?

+      val tableName = generateTableName
+
+      spark.sql("SET hoodie.datasource.read.extract.partition.values.from.path=true")
+
+      // create table
+      spark.sql(
+        s"""
+           |create table $tableName (
+           |  id int,
+           |  name string,
+           |  price double,
+           |  ts long
+           |) using hudi
+           | location '${tmp.getCanonicalPath}/$tableName'
+           | tblproperties (
+           |  type = 'mor',
+           |  primaryKey = 'id',
+           |  preCombineField = 'ts'
+           | )
+       """.stripMargin)
+
+      // insert data to table
+      spark.sql(s"insert into $tableName select 1, 'a1', 10, 1000")
+      checkAnswer(s"select id, name, price, ts from $tableName")(
+        Seq(1, "a1", 10.0, 1000)
+      )
+
+      spark.sql(s"update $tableName set price = 22 where id = 1")
+      checkAnswer(s"select id, name, price, ts from $tableName")(
+        Seq(1, "a1", 22.0, 1000)
+      )
+
+      // add a nullable column; existing rows must surface it as null
+      spark.sql(s"alter table $tableName add column new_col1 int")
+      checkAnswer(s"select id, name, price, ts, new_col1 from $tableName")(
+        Seq(1, "a1", 22.0, 1000, null)
+      )
+
+      // update and check
+      spark.sql(s"update $tableName set price = price * 2 where id = 1")
+      checkAnswer(s"select id, name, price, ts, new_col1 from $tableName")(
+        Seq(1, "a1", 44.0, 1000, null)
+      )
+    }
+  }
 }