[SPARK-54595][SQL] Keep existing behavior of MERGE INTO without SCHEMA EVOLUTION clause #53326

szehon-ho · 2025-12-04T20:06:55Z

What changes were proposed in this pull request?

Keep the existing behavior of MERGE INTO without the SCHEMA EVOLUTION clause for UPDATE SET * and INSERT *, as well as UPDATE struct or INSERT struct: throw an exception if the source and target schemas are not exactly the same.

Why are the changes needed?

While testing this feature, @aokolnychyi pointed out that as of Spark 4.1 the behavior of MERGE INTO has changed even without the SCHEMA EVOLUTION clause.

In particular:

- Source has fewer columns/nested fields than target => we fill with NULL or DEFAULT for inserts, and keep the existing value for updates (though we disabled this for nested structs by default in [SPARK-54525](https://issues.apache.org/jira/browse/SPARK-54525), #53229).
- Source has more columns/fields than target => we drop the extra fields.

Initially, I thought it was a good improvement to MERGE INTO and not really related to SCHEMA EVOLUTION, because the schema is not altered. But Anton has a good point that it may surprise some users. So for now it may be better to be more conservative and keep exactly the same behavior when the SCHEMA EVOLUTION clause is absent.

Note: this behavior is still enabled if SCHEMA EVOLUTION is specified, as the user is then being more explicit about the decision.
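To make the two behaviors described above concrete, here is a minimal toy model in plain Python. It is not Spark's implementation: the names `SchemaMismatchError`, `align_insert_row`, `align_update_row`, and the `lenient` flag are all hypothetical, and rows are modeled as flat dictionaries (no nested structs). With `lenient=False` it mimics the strict behavior this PR keeps (any schema difference raises); with `lenient=True` it mimics the fill-and-drop behavior listed in the bullets.

```python
from typing import Any, Dict, List, Optional


class SchemaMismatchError(Exception):
    """Hypothetical stand-in for the error raised on a schema mismatch."""


def _check_schemas(source_row: Dict[str, Any], target_columns: List[str], lenient: bool) -> None:
    # Columns the target has but the source lacks, and vice versa.
    missing = [c for c in target_columns if c not in source_row]
    extra = [c for c in source_row if c not in target_columns]
    if not lenient and (missing or extra):
        raise SchemaMismatchError(f"missing={missing}, extra={extra}")


def align_insert_row(
    source_row: Dict[str, Any],
    target_columns: List[str],
    defaults: Optional[Dict[str, Any]] = None,
    lenient: bool = False,
) -> Dict[str, Any]:
    """Model of INSERT *: missing columns get DEFAULT or NULL, extras are dropped."""
    _check_schemas(source_row, target_columns, lenient)
    defaults = defaults or {}
    return {c: source_row.get(c, defaults.get(c)) for c in target_columns}


def align_update_row(
    existing_row: Dict[str, Any],
    source_row: Dict[str, Any],
    lenient: bool = False,
) -> Dict[str, Any]:
    """Model of UPDATE SET *: missing columns keep their existing value, extras are dropped."""
    _check_schemas(source_row, list(existing_row), lenient)
    return {c: source_row.get(c, existing_row[c]) for c in existing_row}
```

For example, `align_insert_row({"id": 1}, ["id", "name"])` raises `SchemaMismatchError` in the strict mode, while the same call with `lenient=True` yields a row whose `name` is `None`.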

Does this PR introduce any user-facing change?

No. Without the SCHEMA EVOLUTION clause, this keeps the behavior exactly the same as in 4.0.

How was this patch tested?

Added a test and changed existing test output to expect the exception if SCHEMA EVOLUTION is not specified.

Was this patch authored or co-authored using generative AI tooling?

No


szehon-ho · 2025-12-04T20:09:00Z

Hi @dongjoon-hyun, FYI: @aokolnychyi tested this MERGE INTO WITH SCHEMA EVOLUTION feature more and had this feedback. We will try to reach consensus today, but I am preparing the PR just in case; it's a simple one.

The rationale for this PR is that it's safer to keep the existing behavior without the SCHEMA EVOLUTION clause (where Spark used to throw an exception) and relax it later, rather than the other way around.

Note that this only covers the case without the SCHEMA EVOLUTION clause; it doesn't change the behavior of the new feature (with the SCHEMA EVOLUTION clause).

dongjoon-hyun · 2025-12-04T20:26:27Z

Thank you for informing that, @szehon-ho .

dongjoon-hyun

Could you fix the failure, @szehon-ho ?

- MERGE INTO TABLE - primary *** FAILED *** (19 milliseconds)
updateAssigns.apply(0).key.asInstanceOf[org.apache.spark.sql.catalyst.expressions.AttributeReference].sameRef(ts) was

dongjoon-hyun · 2025-12-05T18:39:03Z

Gentle ping, @szehon-ho .

szehon-ho · 2025-12-05T22:35:51Z

ah let me take a look

szehon-ho · 2025-12-05T22:58:02Z

Fixed it, thanks!

dongjoon-hyun

+1, LGTM. Thank you, @szehon-ho .
Merged to master/4.1.

Closes #53326 from szehon-ho/merge_restriction. Authored-by: Szehon Ho <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 74b6a93) Signed-off-by: Dongjoon Hyun <[email protected]>

dongjoon-hyun · 2025-12-06T03:20:52Z

My bad. I only checked the previous one and missed another failure. Let me revert this PR.

- merge with struct missing nested field with check constraint *** FAILED ***
Expected exception org.apache.spark.SparkRuntimeException to be thrown, but org.apache.spark.sql.AnalysisException was thrown (MergeIntoTableSuiteBase.scala:5359)

dongjoon-hyun · 2025-12-06T03:22:22Z

This is reverted

master: 6bf8c17
branch-4.1: 22e5d88

dongjoon-hyun · 2025-12-06T03:24:10Z

Please ping me when you have a working PR once more, @szehon-ho .

szehon-ho · 2025-12-06T04:37:19Z

Ah, I think it conflicted with a concurrent PR. I will try to take a look.

szehon-ho · 2025-12-06T05:33:28Z

Actually was able to fix the test quickly: #53363 thanks again @dongjoon-hyun !

[SPARK-54595][SQL] Keep existing behavior without SCHEMA EVOLUTION clause

323ae36

github-actions bot added the SQL label Dec 4, 2025

szehon-ho changed the title ~~[SPARK-54595][SQL] Keep existing behavior without SCHEMA EVOLUTION clause~~ [SPARK-54595][SQL] Keep existing behavior of MERGE INTO without SCHEMA EVOLUTION clause Dec 4, 2025

dongjoon-hyun reviewed Dec 5, 2025


Fix test

1e078b7

dongjoon-hyun approved these changes Dec 6, 2025


dongjoon-hyun closed this in 74b6a93 Dec 6, 2025

szehon-ho mentioned this pull request Dec 6, 2025

[SPARK-54595][SQL][Follow-up] Keep existing behavior of MERGE INTO without SCHEMA EVOLUTION clause #53363

Open
