[SPARK-33765][SQL] Migrate UNCACHE TABLE to use UnresolvedRelation to resolve identifier #30743
Kubernetes integration test starting
Kubernetes integration test status success
Test build #132686 has finished for PR 30743 at commit
cc @cloud-fan thanks in advance!
    override def run(): Seq[InternalRow] = {
      val sparkSession = sqlContext.sparkSession
      val df = Dataset.ofRows(sparkSession, relation)
      sparkSession.sharedState.cacheManager.uncacheQuery(df, cascade)
uncacheQuery can take LogicalPlan directly. Let's use that overload to avoid creating a dataframe.
We can do the same to cacheQuery, but we need to add a new overload that takes LogicalPlan first.
I updated to pass a logical plan instead of dataframe (this required updating more rules, but I think it's "more correct".)
I will add a new overload that takes LogicalPlan in a separate PR.
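The overload pattern discussed in this thread can be illustrated with a toy model. The sketch below is not Spark's code: `LogicalPlan`, `DataFrame`, and `CacheManager` are hypothetical stand-ins defined only for this example, and the real `CacheManager` matches plans in a more involved way. It only shows why a command that already holds a plan has no need to build a dataframe first.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's classes.
final case class LogicalPlan(name: String)
final case class DataFrame(logicalPlan: LogicalPlan)

final class CacheManager {
  private val cached = scala.collection.mutable.Set.empty[LogicalPlan]

  // Registers a plan as cached.
  def cacheQuery(plan: LogicalPlan): Unit = {
    cached += plan
    ()
  }

  // Convenience overload: unwraps the plan and delegates to the plan-based overload.
  def uncacheQuery(df: DataFrame): Unit = uncacheQuery(df.logicalPlan)

  // Plan-based overload: a command that already holds a plan can call this directly.
  def uncacheQuery(plan: LogicalPlan): Unit = {
    cached -= plan
    ()
  }

  def isCached(plan: LogicalPlan): Boolean = cached.contains(plan)
}

object UncacheSketch {
  def main(args: Array[String]): Unit = {
    val manager = new CacheManager
    val plan = LogicalPlan("t")
    manager.cacheQuery(plan)
    // No DataFrame wrapper is constructed on this path.
    manager.uncacheQuery(plan)
    println(manager.isCached(plan))
  }
}
```

In this model the dataframe overload adds nothing but an extra wrapper object, which is the cost the review comment suggests avoiding.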
    case UncacheTable(u: UnresolvedRelation, _, _) =>
      failAnalysis(
        s"Table or view not found for `UNCACHE TABLE`: ${u.multipartIdentifier.quoted}")
After a second look, I think it's better to be consistent with INSERT and just say Table or view not found: xxx. When people run the command, they definitely know which command triggers the table not found issue.
OK. I will revert other commands as well in a separate PR.
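As a rough illustration of the message format agreed on here, the sketch below builds the simplified text from a multipart identifier. `NotFoundMessage` and `forIdentifier` are hypothetical names used only for this example, and joining the parts with dots is a simplified stand-in for `multipartIdentifier.quoted` (no backtick quoting is applied).

```scala
// Hypothetical helper for illustration; Spark builds this message inside the analyzer.
object NotFoundMessage {
  // Simplified stand-in for `multipartIdentifier.quoted`: joins the name parts
  // with dots and applies no backtick quoting.
  def forIdentifier(parts: Seq[String]): String =
    s"Table or view not found: ${parts.mkString(".")}"

  def main(args: Array[String]): Unit = {
    println(forIdentifier(Seq("badtable")))
    println(forIdentifier(Seq("db", "badtable")))
  }
}
```

The command name does not appear anywhere in the message, matching the format already used by commands such as INSERT.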
We can probably have a base trait for commands that don't want to optimize their child plans, so that we can remove duplicated code. This can be done in a followup.
Kubernetes integration test starting
GA passed, merging to master!
Kubernetes integration test status failure
Test build #132856 has finished for PR 30743 at commit
…hen a relation is not resolved

### What changes were proposed in this pull request?

Based on the discussion #30743 (comment), this PR proposes to remove the command name in AnalysisException message when a relation is not resolved.

For some of the commands that use `UnresolvedTable`, `UnresolvedView`, and `UnresolvedTableOrView` to resolve an identifier, when the identifier cannot be resolved, the exception will be something like `Table or view not found for 'SHOW TBLPROPERTIES': badtable`. The command name (`SHOW TBLPROPERTIES` in this case) should be dropped to be consistent with other existing commands.

### Why are the changes needed?

To make the exception message consistent.

### Does this PR introduce _any_ user-facing change?

Yes, the exception message will be changed from
```
Table or view not found for 'SHOW TBLPROPERTIES': badtable
```
to
```
Table or view not found: badtable
```
for commands that use `UnresolvedTable`, `UnresolvedView`, and `UnresolvedTableOrView` to resolve an identifier.

### How was this patch tested?

Updated existing tests.

Closes #30794 from imback82/remove_cmd_from_exception_msg.

Authored-by: Terry Kim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…ry to avoid creating a dataframe

### What changes were proposed in this pull request?

This PR proposes to update `CACHE TABLE` to use a `LogicalPlan` when caching a query to avoid creating a `DataFrame` as suggested here: #30743 (comment)

For reference, `UNCACHE TABLE` also uses `LogicalPlan`: https://github.com/apache/spark/blob/0c129001201ccb63ae96f576b6f354da84024fb3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CacheTableExec.scala#L91-L98

### Why are the changes needed?

To avoid creating an unnecessary dataframe and make it consistent with `uncacheQuery` used in `UNCACHE TABLE`.

### Does this PR introduce _any_ user-facing change?

No, just internal changes.

### How was this patch tested?

Existing tests since this is an internal refactoring change.

Closes #30815 from imback82/cache_with_logical_plan.

Authored-by: Terry Kim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This PR proposes to migrate `UNCACHE TABLE` to use `UnresolvedRelation` to resolve the table/view identifier in the Analyzer, as discussed in https://github.com/apache/spark/pull/30403/files#r532360022.
Why are the changes needed?
To resolve the table/view in the analyzer.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Updated existing tests