[HUDI-5759] Supports add column on mor table with log #7915
Changes to `TestUpdateTable` (`@@ -204,4 +204,52 @@ class TestUpdateTable extends HoodieSparkSqlTestBase`), added lines:

```scala
test("Test Add Column and Update Table") {
  withTempDir { tmp =>
    val tableName = generateTableName

    spark.sql("SET hoodie.datasource.read.extract.partition.values.from.path=true")

    // create table
    spark.sql(
      s"""
         |create table $tableName (
         |  id int,
         |  name string,
         |  price double,
         |  ts long
         |) using hudi
         | location '${tmp.getCanonicalPath}/$tableName'
         | tblproperties (
         |  type = 'mor',
         |  primaryKey = 'id',
         |  preCombineField = 'ts'
         | )
       """.stripMargin)

    // insert data to table
    spark.sql(s"insert into $tableName select 1, 'a1', 10, 1000")
    checkAnswer(s"select id, name, price, ts from $tableName")(
      Seq(1, "a1", 10.0, 1000)
    )

    spark.sql(s"update $tableName set price = 22 where id = 1")
    checkAnswer(s"select id, name, price, ts from $tableName")(
      Seq(1, "a1", 22.0, 1000)
    )

    spark.sql(s"alter table $tableName add column new_col1 int")

    checkAnswer(s"select id, name, price, ts, new_col1 from $tableName")(
      Seq(1, "a1", 22.0, 1000, null)
    )

    // update and check
    spark.sql(s"update $tableName set price = price * 2 where id = 1")
    checkAnswer(s"select id, name, price, ts, new_col1 from $tableName")(
      Seq(1, "a1", 44.0, 1000, null)
    )
  }
}
```

Inline review comments on this test:

Contributor: @qidian99

Contributor (Author): Thanks for the timely reply. I changed the UT to manually set partition pruning to true. @stream2000 and I both tested on the master branch and the test will fail.

Contributor: @qidian99 can you please paste the whole stack trace? I would like to better understand what exactly is failing.

Contributor: I see you pasted the stack trace failing when you query your data via the server. Can you please paste the stack trace of this particular test failing? I want to better understand which operation is failing in this test.

Contributor: @qidian99 do only non-partitioned tables have this problem?
This code is actually borrowed from Spark, and we try to avoid any changes to such code to make sure we're not diverging from Spark
When `extractPartitionValuesFromPartitionPath` is turned on, the StructType schema and the Avro schema differ: `convertToAvroSchema` is missing the default value when the field is nullable, making the table not queryable.
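As a small, self-contained illustration of that claim (this is not Hudi's code; `FieldSketch` and `withNullDefault` are hypothetical names): Avro only accepts `null` as a field default when `"null"` is the first branch of the field's union type, so a converter that drops the default for nullable fields yields a schema whose readers cannot fill in missing columns.

```scala
// Hypothetical, simplified model of an Avro record field: a name, a union of
// type names, and an optional default. Real code uses org.apache.avro.Schema;
// this sketch only illustrates the default-value rule for nullable fields.
case class FieldSketch(name: String, unionTypes: List[String], default: Option[Any])

object NullableDefaults {
  // For a nullable field (a union containing "null"), Avro requires the
  // default value to match the FIRST branch of the union. So a decodable
  // schema must put "null" first and declare null as the default.
  def withNullDefault(f: FieldSketch): FieldSketch =
    if (f.unionTypes.contains("null"))
      f.copy(
        unionTypes = "null" :: f.unionTypes.filterNot(_ == "null"),
        default = Some(null))
    else f

  def main(args: Array[String]): Unit = {
    // A nullable int column such as new_col1, converted without a default:
    val broken = FieldSketch("new_col1", List("int", "null"), None)
    println(withNullDefault(broken)) // union reordered, default set to null
  }
}
```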
I don't think I understand why you believe this is an appropriate fix for the issue you're observing:
From what I understand so far, the issue is not in the conversion but in the fact that we're not handling schema evolution properly in `HoodieAvroDataBlock`: whenever we decode a record from an existing data block, we should make sure that any nullable field actually has `null` as its default value, so that the Avro reader is able to decode the data in case this particular field is not present.
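A minimal sketch of the schema-resolution behavior being described (hypothetical names, not Hudi's actual `HoodieAvroDataBlock` internals): when the reader schema contains a field the writer schema lacked, resolution can only succeed if that field declares a default, which for a newly added nullable column should be `null`.

```scala
// Hypothetical model of Avro-style schema resolution; ReaderField and
// SchemaResolution are illustrative names, not Hudi or Avro APIs.
case class ReaderField(name: String, default: Option[Any])

object SchemaResolution {
  // Decode a record written with an older schema (writerFields) against a
  // newer reader schema. Fields missing from the old data are filled from
  // their declared default; a missing field with no default is an error.
  def decode(writerFields: Map[String, Any],
             readerSchema: List[ReaderField]): Either[String, Map[String, Any]] = {
    val unresolvable =
      readerSchema.filter(f => !writerFields.contains(f.name) && f.default.isEmpty)
    if (unresolvable.nonEmpty)
      Left(s"no default for missing field(s): ${unresolvable.map(_.name).mkString(", ")}")
    else
      Right(readerSchema.map(f =>
        f.name -> writerFields.getOrElse(f.name, f.default.get)).toMap)
  }

  def main(args: Array[String]): Unit = {
    val oldRecord = Map[String, Any]("id" -> 1, "price" -> 22.0)
    // new_col1 was added after oldRecord was written; with a null default
    // the old log block is still decodable.
    val evolved = List(ReaderField("id", None), ReaderField("price", None),
                       ReaderField("new_col1", Some(null)))
    println(decode(oldRecord, evolved))
  }
}
```

In this sketch, decoding the old record fails as soon as any added field lacks a default, which mirrors the failure mode described for the log scanner.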
cc @xiarixiaoyao
I agree with @alexeykudinkin: we should not change the code of SchemaConverters.scala; this is a bug in the log scanner.