[SPARK-54004][SQL] Fix uncaching table by name without cascading #52712
Conversation
This was discovered and discussed in another PR.

Ack. Thank you, @aokolnychyi.
```diff
     }
-    plan match {
+    EliminateSubqueryAliases(plan) match {
```
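The effect of the one-line change above can be illustrated with a minimal, self-contained sketch. The classes below are simplified stand-ins, not Spark's actual `LogicalPlan` hierarchy: when a table is uncached by name, the resolved plan arrives wrapped in an alias node, so a pattern match on the raw plan misses the underlying relation unless the aliases are stripped first.

```scala
// Minimal model (NOT Spark's real classes) of why the uncache path must
// strip subquery aliases before pattern matching on the plan.
object AliasStrippingDemo {
  sealed trait LogicalPlan
  case class Relation(name: String) extends LogicalPlan
  case class SubqueryAlias(alias: String, child: LogicalPlan) extends LogicalPlan

  // Simplified stand-in for Spark's EliminateSubqueryAliases rule:
  // recursively drop alias wrappers, keeping the underlying plan.
  def eliminateSubqueryAliases(plan: LogicalPlan): LogicalPlan = plan match {
    case SubqueryAlias(_, child) => eliminateSubqueryAliases(child)
    case other                   => other
  }

  def main(args: Array[String]): Unit = {
    // Uncaching by name resolves the table to a plan wrapped in an alias.
    val byName: LogicalPlan = SubqueryAlias("t", Relation("t"))

    // Matching on the raw plan misses the Relation case entirely...
    val rawMatch = byName match {
      case Relation(n) => Some(n)
      case _           => None
    }

    // ...while matching on the alias-stripped plan finds it.
    val strippedMatch = eliminateSubqueryAliases(byName) match {
      case Relation(n) => Some(n)
      case _           => None
    }

    assert(rawMatch.isEmpty)
    assert(strippedMatch.contains("t"))
    println("ok")
  }
}
```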
I wonder why we keep SubqueryAlias when putting the logical plan into the cache data.
An alternative solution would be to call EliminateSubqueryAliases BEFORE putting the plan into the cache. However, that would remove ALL subquery aliases. I was not sure about the consequences, but I would be open to considering this option if everyone thinks it is safe.
Thoughts, @viirya @dongjoon-hyun @szehon-ho @cloud-fan?
Yea, we don't need SubqueryAlias in the cache keys, as during lookup we call LogicalPlan#sameResult, which strips the SubqueryAlias.
However, the cache-key logical plans are exposed to custom normalization rules (see SparkSessionExtensions#injectPlanNormalizationRule), so it seems safer to keep it.
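The point about lookup tolerating aliases can be made concrete with a hedged sketch. The types below are simplified models, not Spark's real API: a sameResult-style comparison canonicalizes both plans first, and since canonicalization drops alias nodes, a cache key that still carries a SubqueryAlias matches an alias-free lookup plan.

```scala
// Simplified model (NOT Spark's actual LogicalPlan API) of why keeping
// SubqueryAlias in cache keys does not break lookup: a sameResult-style
// check compares canonical forms, and canonicalization removes aliases.
object SameResultDemo {
  sealed trait Plan
  case class Scan(table: String) extends Plan
  case class Alias(name: String, child: Plan) extends Plan

  // Canonicalization here just strips alias wrappers, mirroring the fact
  // that an alias does not change the result a plan produces.
  def canonicalize(p: Plan): Plan = p match {
    case Alias(_, child) => canonicalize(child)
    case other           => other
  }

  // Analogue of LogicalPlan#sameResult: compare canonical forms.
  def sameResult(a: Plan, b: Plan): Boolean = canonicalize(a) == canonicalize(b)

  def main(args: Array[String]): Unit = {
    val cachedKey = Alias("t", Scan("t")) // key stored with its alias intact
    val lookup    = Scan("t")             // lookup plan without the alias
    assert(sameResult(cachedKey, lookup)) // lookup still finds the entry
    println("ok")
  }
}
```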
Let's keep it then; it is a fragile part of the code.
dongjoon-hyun left a comment:
+1, LGTM.
Although it looks irrelevant, please re-trigger the failed PySpark CI, @aokolnychyi.

Retrying PySpark CI...

Merged to master for Apache Spark 4.1.0-preview3.

Thank you, @dongjoon-hyun @szehon-ho @viirya @gengliangwang @cloud-fan!
### What changes were proposed in this pull request?

This PR fixes uncaching table by name without cascading.

### Why are the changes needed?

These changes are needed to invalidate data cache correctly.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This PR comes with a test that previously failed.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#52712 from aokolnychyi/spark-54004.

Authored-by: Anton Okolnychyi <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>