[SPARK-54004][SQL] Fix uncaching table by name without cascading #52712

aokolnychyi · 2025-10-23T19:51:07Z

What changes were proposed in this pull request?

This PR fixes uncaching a table by name without cascading.

Why are the changes needed?

These changes are needed to invalidate the data cache correctly.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

This PR comes with a test that previously failed.

Was this patch authored or co-authored using generative AI tooling?

No.

aokolnychyi · 2025-10-23T19:51:51Z

cc @dongjoon-hyun @gengliangwang @cloud-fan @szehon-ho @huaxingao @viirya

aokolnychyi · 2025-10-23T19:52:34Z

This was discovered and discussed in another PR.

dongjoon-hyun · 2025-10-23T20:00:07Z

Ack. Thank you, @aokolnychyi.

viirya · 2025-10-23T22:52:54Z

sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala

    }

-    plan match {
+    EliminateSubqueryAliases(plan) match {


I wonder why we keep SubqueryAlias when putting the logical plan into cache data.

aokolnychyi · 2025-10-24

An alternative solution could be to call EliminateSubqueryAliases BEFORE putting the plan into the cache. This, however, would remove ALL subquery aliases. I was not sure about the consequences, but I would be open to considering this option if everyone thinks it is safe.

Thoughts, @viirya @dongjoon-hyun @szehon-ho @cloud-fan?

cloud-fan · 2025-10-24

Yeah, we don't need SubqueryAlias in the cache keys: during lookup, we call LogicalPlan#sameResult, which strips the SubqueryAlias.

However, the cache key logical plans are exposed to custom normalization rules (see SparkSessionExtensions#injectPlanNormalizationRule), so it seems safer to keep it.

aokolnychyi · 2025-10-24

Let's keep it then; it is a fragile part of the code.
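The thread above can be illustrated with a small, self-contained sketch. It is a toy model, not Spark's code: `Plan`, `Relation`, `SubqueryAlias`, `Filter`, and `ToyCache` are hypothetical stand-ins defined here, and the idea that the old match missed entries because the cached plan is wrapped in an alias is an assumption drawn from the diff and the comments. It only shows why stripping aliases before pattern matching can change which cache entries a non-cascading uncache-by-name finds.

```scala
// Toy model only: these types mimic the shape of a logical plan and are
// NOT Spark's Catalyst classes.
sealed trait Plan
final case class Relation(name: String) extends Plan
final case class SubqueryAlias(alias: String, child: Plan) extends Plan
final case class Filter(condition: String, child: Plan) extends Plan

object ToyCache {
  // Recursively drop every SubqueryAlias node, keeping the rest of the tree.
  def eliminateSubqueryAliases(plan: Plan): Plan = plan match {
    case SubqueryAlias(_, child) => eliminateSubqueryAliases(child)
    case Filter(cond, child)     => Filter(cond, eliminateSubqueryAliases(child))
    case r: Relation             => r
  }

  // Matches only when the plan itself is the named relation (no cascading
  // to plans that merely depend on it). Aliases are NOT stripped here.
  def matchesRaw(plan: Plan, table: String): Boolean = plan match {
    case Relation(name) => name == table
    case _              => false
  }

  // Same check, but aliases are stripped before matching, analogous to
  // matching on EliminateSubqueryAliases(plan) instead of plan.
  def matchesStripped(plan: Plan, table: String): Boolean =
    matchesRaw(eliminateSubqueryAliases(plan), table)

  // Remove every cached plan that the given predicate says is the table.
  def uncacheByName(
      cache: List[Plan],
      table: String,
      matches: (Plan, String) => Boolean): List[Plan] =
    cache.filterNot(matches(_, table))
}
```

In this model, a cache holding `SubqueryAlias("t", Relation("t"))` and `Filter("id > 1", SubqueryAlias("t", Relation("t")))` is left unchanged by `ToyCache.uncacheByName(cache, "t", ToyCache.matchesRaw)`, because the first entry is wrapped in an alias. With `ToyCache.matchesStripped`, the aliased table entry is removed and the dependent `Filter` entry stays, which is the non-cascading behavior. The cached plans themselves keep their aliases, in line with the decision above not to strip them before caching.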

dongjoon-hyun

+1, LGTM.

dongjoon-hyun · 2025-10-23T23:48:07Z

Although the failure looks unrelated, please re-trigger the failed PySpark CI, @aokolnychyi.

aokolnychyi · 2025-10-24T03:48:52Z

Retrying PySpark CI...

dongjoon-hyun · 2025-10-24T05:06:42Z

Merged to master for Apache Spark 4.1.0-preview3.
Thank you, @aokolnychyi and all.

aokolnychyi · 2025-10-24T14:41:39Z

Thank you, @dongjoon-hyun @szehon-ho @viirya @gengliangwang @cloud-fan!

### What changes were proposed in this pull request?

This PR fixes uncaching table by name without cascading.

### Why are the changes needed?

These changes are needed to invalidate data cache correctly.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This PR comes with a test that previously failed.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#52712 from aokolnychyi/spark-54004.

Authored-by: Anton Okolnychyi <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

[SPARK-54004] Fix uncaching table by name without cascading

4473620

github-actions bot added the SQL label Oct 23, 2025

aokolnychyi mentioned this pull request Oct 23, 2025

[SPARK-53732][SQL] Remember TimeTravelSpec in DataSourceV2Relation #52599

Closed

dongjoon-hyun changed the title ~~[SPARK-54004] Fix uncaching table by name without cascading~~ [SPARK-54004][SQL] Fix uncaching table by name without cascading Oct 23, 2025

szehon-ho approved these changes Oct 23, 2025


viirya reviewed Oct 23, 2025


gengliangwang approved these changes Oct 23, 2025


viirya approved these changes Oct 23, 2025


dongjoon-hyun approved these changes Oct 23, 2025


cloud-fan approved these changes Oct 24, 2025


dongjoon-hyun closed this in 2b2a2a2 Oct 24, 2025
