
Conversation

cloud-fan (Contributor) commented May 22, 2023

What changes were proposed in this pull request?

This PR refactors the default column value resolution so that we don't need an extra DS v2 API for external v2 sources. The general idea is to split the default column value resolution into two parts:

  1. resolve the column "DEFAULT" to the column default expression. This applies to Project/UnresolvedInlineTable under InsertIntoStatement, and assignment expressions in UpdateTable/MergeIntoTable.
  2. fill missing columns with column default values for the input query. This does not apply to UPDATE and non-INSERT action of MERGE as they use the column from the target table as the default value.

The first part should be done for all the data sources, as it's part of column resolution. The second part should not be applied to v2 data sources with ACCEPT_ANY_SCHEMA, as they are free to define how to handle missing columns.
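As a concrete sketch of the two parts (assuming a local SparkSession and a v1 parquet table; the table and column names are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

object DefaultColumnDemo extends App {
  val spark = SparkSession.builder().master("local[1]").appName("default-column-demo").getOrCreate()

  // The target table declares a default value for c2.
  spark.sql("CREATE TABLE t (c1 INT, c2 INT DEFAULT 42) USING parquet")

  // Part 1: the keyword DEFAULT in the input query is resolved to the column's
  // default expression (42 for c2).
  spark.sql("INSERT INTO t VALUES (1, DEFAULT)")

  // Part 2: a column missing from the input query is filled with its default value.
  // This filling applies to INSERT; UPDATE and non-INSERT MERGE actions instead keep
  // the value from the target table.
  spark.sql("INSERT INTO t (c1) VALUES (2)") // c2 is filled with 42

  spark.sql("SELECT * FROM t").show()
  spark.stop()
}
```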

More concretely, this PR:

  1. Puts the column "DEFAULT" resolution logic in the rule ResolveReferences, with two new virtual rules. This follows [SPARK-41405][SQL] Centralize the column resolution logic #38888.
  2. Puts the missing-column handling in TableOutputResolver, which is shared by both the v1 and v2 insertion resolution rules. External v2 data sources can add custom catalyst rules to deal with missing columns themselves.
  3. Removes the old rule ResolveDefaultColumns. Note that, with the refactor, we no longer need to manually look up the table. We deal with column default values after the target table of INSERT/UPDATE/MERGE is resolved.
  4. Removes the rule ResolveUserSpecifiedColumns and merges it into PreprocessTableInsertion. These two rules both resolve v1 insertion, and it's tricky to reason about their interactions. It's clearer to resolve the insertion in one pass.

Why are the changes needed?

Code cleanup and removal of an unneeded DS v2 API.

Does this PR introduce any user-facing change?

No

How was this patch tested?

updated tests

Member

nit: plan is always InsertIntoStatement from the caller side. I also noticed that the input of ResolveReferencesInUpdate is UpdateTable. Shall we make them consistent?

Contributor Author

We will add v2 write commands later. I'll add a TODO here.

gengliangwang (Member) commented May 23, 2023

ResolveColumnDefaultInInsert?

gengliangwang (Member)

LGTM, pending tests.
Awesome refactoring!

Contributor Author

Missing columns should not cause a failure; we test this in ResolveDefaultColumnsSuite.

Contributor Author

MERGE/UPDATE are tested in Align[Update|Merge]AssignmentsSuite

cloud-fan changed the title from "[WIP] refactor default column value resolution" to "[SPARK-43742][SQL] Refactor default column value resolution" on May 23, 2023
Member

If you don't mind, please file a JIRA and use the IDed TODO here.

object ResolveInsertInto extends Rule[LogicalPlan] {

  /** Add a project to use the table column names for INSERT INTO BY NAME */
  private def createProjectForByNameQuery(i: InsertIntoStatement): LogicalPlan = {
Contributor Author

The code here is unchanged, just moved to ResolveInsertionBase.

Contributor Author

Unify the errors between v1 and v2 inserts.

cloud-fan (Contributor Author) commented May 24, 2023

This change is needed. We want to resolve the table first, then resolve the column "DEFAULT" in the query. This means we can't wait for the query to be resolved before resolving the table.

Contributor

sounds good, with any luck this can help reduce dependencies on rule orderings within the analyzer.

Member

If we remove the pattern guard in this code, some operations on i.query will fail later on. I created #44326 to fix it.

Contributor Author

It's fragile to resolve the insert column list and static partitions separately. This PR resolves both in PreprocessTableInsertion for v1 inserts; Spark already resolves both for v2 inserts in ResolveInsertInto.
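For readers following along, a sketch of the kind of v1 statement involved, which carries both pieces at once (assuming a SparkSession named spark; table and column names are illustrative):

```scala
// One INSERT with both a static partition value (dt) and a user-specified column list
// (id, msg); after this PR, PreprocessTableInsertion resolves the two together in one pass.
spark.sql("CREATE TABLE logs (id INT, msg STRING, dt STRING) USING parquet PARTITIONED BY (dt)")
spark.sql("INSERT INTO logs PARTITION (dt = '2023-05-24') (id, msg) VALUES (1, 'hello')")
```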

cloud-fan (Contributor Author) commented May 24, 2023

This is an existing issue: the default column value doesn't work for v2 inserts. I decided to fix it later as it requires updating quite a few v2 tests.

Contributor Author

This is needed due to https://github.com/apache/spark/pull/41262/files#r1204075426 . Now it's possible that the table is resolved but the query is not.

Contributor Author

The error was changed by 9f0bf51 and is now restored. I think it's more accurate to report a "column already exists" error rather than an inline-table error.

Contributor Author

Now we create a null literal with the expected data type directly.

cloud-fan (Contributor Author) commented May 24, 2023

sql8 is s"UPDATE testcat.defaultvalues SET i=DEFAULT, s=DEFAULT WHERE i=DEFAULT"

I think it's almost impossible to enumerate all the improper places where the column "DEFAULT" could appear, e.g. what about the UPDATE/MERGE assignment key? Other operators like Sort? This PR only checks for a nested column "DEFAULT" and fails in that case. If the column "DEFAULT" appears in other improper places, we won't resolve it and users will hit an unresolved-column error.
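A small sketch of the resulting behavior (assuming a SparkSession named spark; INSERT is used here since a plain v1 table does not support UPDATE, and the table name is made up):

```scala
spark.sql("CREATE TABLE tv (i INT, s STRING DEFAULT 'abc') USING parquet")

// Resolved: DEFAULT used as a whole top-level value in an INSERT.
spark.sql("INSERT INTO tv VALUES (1, DEFAULT)") // s becomes 'abc'

// Not resolved: DEFAULT nested inside an expression (this PR detects it and fails),
// or DEFAULT hosted in other operators such as a WHERE clause or Sort, where it
// surfaces as an unresolved-column analysis error.
// spark.sql("INSERT INTO tv VALUES (2, upper(DEFAULT))")
```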

Contributor

This sounds reasonable, we can leave this test here, but we don't have to exhaustively cover all the cases.

Contributor Author

simplify the MERGE statement to focus on missing cols.

cloud-fan (Contributor Author) commented May 24, 2023

It should be allowed. According to the tests, default value resolution was not triggered in some cases before this PR.

Contributor

should we keep the test, but change its result to assert that it succeeds? or is this behavior exercised elsewhere in this test file?

Contributor Author

remove this conf setting as it's true by default.

Contributor Author

It's more accurate to report that the partition column list in INSERT does not match the actual table partition columns.

dtenedor (Contributor) left a comment

Thanks for doing this! Took one initial review pass.


def apply(plan: LogicalPlan): LogicalPlan = plan match {
  case i: InsertIntoStatement if conf.enableDefaultColumns && i.table.resolved &&
      i.query.containsPattern(UNRESOLVED_ATTRIBUTE) =>
    val staticPartCols = i.partitionSpec.filter(_._2.isDefined).keys.map(normalizeFieldName).toSet
Contributor

this is a bit hard to read, can we split the transformations into different lines with vals, and use an explicit name instead of _2 to refer to the column?

Contributor Author

InsertIntoStatement#partitionSpec is Map[String, Option[String]], and in Scala we can only use _2 to refer to the map value.
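For illustration, a pattern match can still name the entry's parts without changing behavior; a minimal sketch, assuming the i and normalizeFieldName from the rule above:

```scala
// Equivalent to i.partitionSpec.filter(_._2.isDefined).keys.map(normalizeFieldName).toSet,
// but with a named binding for the partition column instead of the tuple accessor _2.
val staticPartCols = i.partitionSpec.collect {
  case (partCol, Some(_)) => normalizeFieldName(partCol)
}.toSet
```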

case i: InsertIntoStatement if conf.enableDefaultColumns && i.table.resolved &&
    i.query.containsPattern(UNRESOLVED_ATTRIBUTE) =>
  val staticPartCols = i.partitionSpec.filter(_._2.isDefined).keys.map(normalizeFieldName).toSet
  val expectedQuerySchema = i.table.schema.filter { field =>
Contributor

can we have a brief comment saying what this is?

case p: Project if acceptProject && p.child.resolved &&
    p.containsPattern(UNRESOLVED_ATTRIBUTE) &&
    p.projectList.length <= expectedQuerySchema.length =>
  val newProjectList = p.projectList.zipWithIndex.map {
Contributor

can we have some comment here describing the logic of adding new unresolved attributes referring to "DEFAULT" if the provided query has fewer columns than the target table, or else converting such existing unresolved attributes to their corresponding values?

Contributor Author

I'll add doc for the resolveColumnDefault method.

exprs.zipWithIndex.map {
  case (u: UnresolvedAttribute, i) if isExplicitDefaultColumn(u) =>
    val field = expectedQuerySchema(i)
    getDefaultValueExpr(field).getOrElse(Literal(null, field.dataType))
Contributor

We could integrate the Literal(null) part into getDefaultValueExpr since, in every case, we want to use the NULL value if the default metadata is not present. Or is this getDefaultValueExprOrNullLiteral, which we can use instead?

Contributor Author

There is a subtle difference: for missing columns, the null default is optional (controlled by a flag). For the column "DEFAULT", it's a new feature added together with default value support, so we can always use null as the default value if no default is defined.
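A rough sketch of the distinction (the helper and its flag are hypothetical, not the PR's actual API):

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
import org.apache.spark.sql.types.StructField

// Hypothetical helper showing the two modes; `lookupDefault` stands in for reading the
// column's DEFAULT metadata from the struct field.
def defaultValueFor(
    field: StructField,
    nullAsFallback: Boolean)(lookupDefault: StructField => Option[Expression]): Option[Expression] = {
  lookupDefault(field).orElse {
    // The explicit DEFAULT keyword can always fall back to NULL; for missing columns the
    // NULL fallback is gated behind a config flag.
    if (nullAsFallback) Some(Literal(null, field.dataType)) else None
  }
}
```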

Contributor

optional: should we add a boolean argument to getDefaultValueExprOrNullLiteral to switch the behavior between the two modes?

resolvedKey match {
  case attr: AttributeReference if conf.enableDefaultColumns =>
    resolved match {
      case u: UnresolvedAttribute if isExplicitDefaultColumn(u) =>
Contributor

same, let's add a comment here mentioning that we're looking for unresolved attribute references to "DEFAULT" and replacing them?


dtenedor (Contributor) left a comment

I carefully reviewed the whole PR again; the logic and testing look good. For any tables with the capability ACCEPT_ANY_SCHEMA, we will bypass all this logic and the rest of the work is up to custom logic for those tables. We might have to duplicate some of this if any of those tables want to support default column values, but that sounds fair given the intended meaning of this capability.

p.projectList.length <= expectedQuerySchema.length =>
val newProjectList = p.projectList.zipWithIndex.map {
case (u: UnresolvedAttribute, i) if isExplicitDefaultColumn(u) =>
val field = expectedQuerySchema(i)
Contributor

optional: when I wrote the original ResolveDefaultColumns rule, I named this variable insertTableSchemaWithoutPartitionColumns because I found myself confused frequently when reading the variable name. We could name this insertTargetTableSchema to clarify this, or insertTargetTableSchemaWithoutPartitionColumns if you don't think that's too verbose.

Contributor Author

tableSchema is not very accurate, and neither is insertTargetTableSchemaWithoutPartitionColumns. It's actually the table schema excluding partition columns with static values.

That's why I chose expectedQuerySchema. People can read the comments at the caller of this function to understand how we define the expected query schema.
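For reference, the snippet under review computes it roughly like this (a paraphrase with the filter body guessed from context, not a quote of the final code):

```scala
// Partition columns that are given a constant value in the PARTITION clause are excluded:
// the input query is not expected to provide values for them.
val staticPartCols = i.partitionSpec.filter(_._2.isDefined).keys.map(normalizeFieldName).toSet
val expectedQuerySchema = i.table.schema.filter { field =>
  !staticPartCols.contains(normalizeFieldName(field.name))
}
```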


case _: GlobalLimit | _: LocalLimit | _: Offset | _: Sort if acceptProject =>
  plan.mapChildren(
    resolveColumnDefault(_, expectedQuerySchema, acceptInlineTable = false))
Contributor

optional: It looks like the only purpose of acceptInlineTable is setting it to false here in the event of a LIMIT and/or OFFSET and/or ORDER BY on top of a VALUES list. Do you think this check is strictly necessary? If not, we can simplify by removing acceptInlineTable as an argument to this function.

Contributor Author

I don't think it's necessary but just want to keep the old behavior. Let me remove it.




assert(intercept[AnalysisException] {
sql("insert into t (i) values (default)")
}.getMessage.contains(addOneColButExpectedTwo))
}.getMessage.contains("Cannot find data for output column 's'"))
Contributor

can we dedup this expected error message substring into one place, or even better, use checkError to assert on the error class?
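A hedged sketch of the checkError form (the error class and parameter names are guesses for illustration and would need to match the error actually thrown):

```scala
// Hypothetical rewrite: assert on the error class rather than a message substring.
checkError(
  exception = intercept[AnalysisException] {
    sql("insert into t (i) values (default)")
  },
  errorClass = "INCOMPATIBLE_DATA_FOR_TABLE.CANNOT_FIND_DATA", // illustrative class name
  parameters = Map("tableName" -> "`spark_catalog`.`default`.`t`", "colName" -> "`s`"))
```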

dongjoon-hyun (Member) left a comment

+1, LGTM.

dongjoon-hyun (Member)

Merged to master for Apache Spark 3.5.0.

czxm pushed a commit to czxm/spark that referenced this pull request Jun 12, 2023
Closes apache#41262 from cloud-fan/def-val.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Jun 21, 2023
… for non-nullable columns

### What changes were proposed in this pull request?

A followup of #41262 to fix a mistake. If a column has no default value and is not nullable, we should fail when people try to use its default value via the explicit `DEFAULT` keyword, and we should not fill that missing column in INSERT.

### Why are the changes needed?

fix a wrong behavior

### Does this PR introduce _any_ user-facing change?

yes, otherwise the DML command will fail later at runtime.

### How was this patch tested?

new tests

Closes #41656 from cloud-fan/def-val.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
MaxGekk added a commit that referenced this pull request Aug 29, 2023
### What changes were proposed in this pull request?
In the PR, I propose to raise an error when a user uses V1 `INSERT` without a list of columns and the number of inserted columns doesn't match the number of actual table columns.

At the moment, Spark inserts data successfully in such a case after PR #41262, which changed the behaviour of Spark 3.4.x.

### Why are the changes needed?
1. To conform to the SQL standard, which requires that the number of columns be the same:
![Screenshot 2023-08-07 at 11 01 27 AM](https://github.com/apache/spark/assets/1580697/c55badec-5716-490f-a83a-0bb6b22c84c7)

Apparently, the insertion below must not succeed:
```sql
spark-sql (default)> CREATE TABLE tabtest(c1 INT, c2 INT);
spark-sql (default)> INSERT INTO tabtest SELECT 1;
```

2. To have the same behaviour as **Spark 3.4**:
```sql
spark-sql (default)> INSERT INTO tabtest SELECT 1;
`spark_catalog`.`default`.`tabtest` requires that the data to be inserted have the same number of columns as the target table: target table has 2 column(s) but the inserted data has 1 column(s), including 0 partition column(s) having constant value(s).
```

### Does this PR introduce _any_ user-facing change?
Yes.

After the changes:
```sql
spark-sql (default)> INSERT INTO tabtest SELECT 1;
[INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS] Cannot write to `spark_catalog`.`default`.`tabtest`, the reason is not enough data columns:
Table columns: `c1`, `c2`.
Data columns: `1`.
```

### How was this patch tested?
By running the modified tests:
```
$ build/sbt "test:testOnly *InsertSuite"
$ build/sbt "test:testOnly *ResolveDefaultColumnsSuite"
$ build/sbt -Phive "test:testOnly *HiveQuerySuite"
```

Closes #42393 from MaxGekk/fix-num-cols-insert.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
MaxGekk added a commit that referenced this pull request Aug 29, 2023
(cherry picked from commit a7eef21)
Signed-off-by: Max Gekk <[email protected]>
addError(s"Cannot find data for output column '${newColPath.quoted}'")
None
val defaultExpr = if (fillDefaultValue) {
getDefaultValueExprOrNullLit(expectedCol, conf)
Copy link
Member

@yaooqinn yaooqinn Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pollutes the expressions with unreplaced char/varchar and could result in bugs
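If the fix is to strip the char/varchar metadata before building the literal, a rough sketch (assuming CharVarcharUtils is usable at that point in the resolver; the actual fix landed in #47198) could look like this:

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.util.CharVarcharUtils
import org.apache.spark.sql.types.StructField

// Sketch only: build the null/default literal against the char/varchar-free type, so the
// resolved plan never carries a raw CharType/VarcharType that writers such as the ORC
// serializer cannot handle.
def nullLiteralFor(expectedCol: StructField): Literal = {
  val cleanedType = CharVarcharUtils.replaceCharVarcharWithString(expectedCol.dataType)
  Literal(null, cleanedType)
}
```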

yaooqinn added a commit that referenced this pull request Jul 5, 2024
… to a table with char/varchar

### What changes were proposed in this pull request?

#41262 introduced a regression by applying literals with char/varchar type in query output for table insertions, see

https://github.com/apache/spark/pull/41262/files#diff-6e331e8f1c67b5920fb46263b6e582ec6e6a253ee45543559c9692a72a1a40ecR187-R188

This causes bugs

```java
24/07/03 16:29:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: [INTERNAL_ERROR] Unsupported data type VarcharType(64). SQLSTATE: XX000
	at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
	at org.apache.spark.SparkException$.internalError(SparkException.scala:96)
```

```java
org.apache.spark.SparkUnsupportedOperationException: VarcharType(64) is not supported yet.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.dataTypeUnsupportedYetError(QueryExecutionErrors.scala:993)
	at org.apache.spark.sql.execution.datasources.orc.OrcSerializer.newConverter(OrcSerializer.scala:209)
	at org.apache.spark.sql.execution.datasources.orc.OrcSerializer.$anonfun$converters$2(OrcSerializer.scala:35)
	at scala.collection.immutable.List.map(List.scala:247)
```

### Why are the changes needed?

Bugfix

### Does this PR introduce _any_ user-facing change?

no
### How was this patch tested?

new tests
### Was this patch authored or co-authored using generative AI tooling?
no

Closes #47198 from yaooqinn/SPARK-48792.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>