[SPARK-33765][SQL] Migrate UNCACHE TABLE to use UnresolvedRelation to resolve identifier #30743

imback82 · 2020-12-12T03:02:27Z

What changes were proposed in this pull request?

This PR proposes to migrate UNCACHE TABLE to use UnresolvedRelation to resolve the table/view identifier in Analyzer as discussed https://github.com/apache/spark/pull/30403/files#r532360022.

Why are the changes needed?

To resolve the table/view in the analyzer.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Updated existing tests

SparkQA · 2020-12-12T03:42:39Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37289/

SparkQA · 2020-12-12T04:09:51Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37289/

SparkQA · 2020-12-12T07:28:29Z

Test build #132686 has finished for PR 30743 at commit 2acf503.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class UncacheTable(
case class UncacheTableExec(

imback82 · 2020-12-12T08:09:16Z

cc @cloud-fan thanks in advance!

cloud-fan · 2020-12-15T07:57:19Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CacheTableExec.scala

+  override def run(): Seq[InternalRow] = {
+    val sparkSession = sqlContext.sparkSession
+    val df = Dataset.ofRows(sparkSession, relation)
+    sparkSession.sharedState.cacheManager.uncacheQuery(df, cascade)


uncacheQuery can take LogicalPlan directly. Let's use that overload to avoid creating a dataframe.

We can do the same to cacheQuery, but we need to add a new overload that takes LogicalPlan first.

I updated to pass a logical plan instead of dataframe (this required updating more rules, but I think it's "more correct".)

I will add a new overload that takes LogicalPlan in a separate PR.

cloud-fan · 2020-12-15T08:00:44Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala


+      case UncacheTable(u: UnresolvedRelation, _, _) =>
+        failAnalysis(
+          s"Table or view not found for `UNCACHE TABLE`: ${u.multipartIdentifier.quoted}")


After a second look, I think it's better to be consistent with INSERT and just say Table or view not found: xxx. When people run the command, they definitely know which command triggers the table not found issue.

OK. I will revert other commands as well in a separate PR.

cloud-fan · 2020-12-16T04:59:47Z

We can probably have a base trait for commands that don't want to optimize its child plan, so that we can remove duplicated code. This can be done in a followup.

SparkQA · 2020-12-16T05:17:11Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37458/

cloud-fan · 2020-12-16T05:37:34Z

GA passed, merging to master!

SparkQA · 2020-12-16T05:50:42Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37458/

SparkQA · 2020-12-16T07:22:20Z

Test build #132856 has finished for PR 30743 at commit cbfb192.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2020-12-16T07:23:30Z

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132856/

…hen a relation is not resolved ### What changes were proposed in this pull request? Based on the discussion #30743 (comment), this PR proposes to remove the command name in AnalysisException message when a relation is not resolved. For some of the commands that use `UnresolvedTable`, `UnresolvedView`, and `UnresolvedTableOrView` to resolve an identifier, when the identifier cannot be resolved, the exception will be something like `Table or view not found for 'SHOW TBLPROPERTIES': badtable`. The command name (`SHOW TBLPROPERTIES` in this case) should be dropped to be consistent with other existing commands. ### Why are the changes needed? To make the exception message consistent. ### Does this PR introduce _any_ user-facing change? Yes, the exception message will be changed from ``` Table or view not found for 'SHOW TBLPROPERTIES': badtable ``` to ``` Table or view not found: badtable ``` for commands that use `UnresolvedTable`, `UnresolvedView`, and `UnresolvedTableOrView` to resolve an identifier. ### How was this patch tested? Updated existing tests. Closes #30794 from imback82/remove_cmd_from_exception_msg. Authored-by: Terry Kim <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>

…ry to avoid creating a dataframe ### What changes were proposed in this pull request? This PR proposes to update `CACHE TABLE` to use a `LogicalPlan` when caching a query to avoid creating a `DataFrame` as suggested here: #30743 (comment) For reference, `UNCACHE TABLE` also uses `LogicalPlan`: https://github.com/apache/spark/blob/0c129001201ccb63ae96f576b6f354da84024fb3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CacheTableExec.scala#L91-L98 ### Why are the changes needed? To avoid creating an unnecessary dataframe and make it consistent with `uncacheQuery` used in `UNCACHE TABLE`. ### Does this PR introduce _any_ user-facing change? No, just internal changes. ### How was this patch tested? Existing tests since this is an internal refactoring change. Closes #30815 from imback82/cache_with_logical_plan. Authored-by: Terry Kim <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>

initial commit

2acf503

github-actions bot added the SQL label Dec 12, 2020

cloud-fan reviewed Dec 15, 2020

View reviewed changes

imback82 added 2 commits December 15, 2020 16:35

Merge remote-tracking branch 'upstream/master' into uncache_v2

4cc3418

Address PR comments

cbfb192

cloud-fan approved these changes Dec 16, 2020

View reviewed changes

cloud-fan closed this in 62be248 Dec 16, 2020

imback82 mentioned this pull request Dec 16, 2020

[SPARK-33800][SQL] Remove command name in AnalysisException message when a relation is not resolved #30794

Closed

imback82 mentioned this pull request Dec 17, 2020

[SPARK-33817][SQL] CACHE TABLE uses a logical plan when caching a query to avoid creating a dataframe #30815

Closed

[SPARK-33765][SQL] Migrate UNCACHE TABLE to use UnresolvedRelation to resolve identifier #30743

[SPARK-33765][SQL] Migrate UNCACHE TABLE to use UnresolvedRelation to resolve identifier #30743

Uh oh!

Conversation

imback82 commented Dec 12, 2020

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

SparkQA commented Dec 12, 2020

Uh oh!

SparkQA commented Dec 12, 2020

Uh oh!

SparkQA commented Dec 12, 2020

Uh oh!

imback82 commented Dec 12, 2020

Uh oh!

cloud-fan Dec 15, 2020

Choose a reason for hiding this comment

Uh oh!

cloud-fan Dec 15, 2020

Choose a reason for hiding this comment

Uh oh!

imback82 Dec 16, 2020

Choose a reason for hiding this comment

Uh oh!

cloud-fan Dec 15, 2020

Choose a reason for hiding this comment

Uh oh!

imback82 Dec 16, 2020

Choose a reason for hiding this comment

Uh oh!

cloud-fan commented Dec 16, 2020

Uh oh!

SparkQA commented Dec 16, 2020

Uh oh!

cloud-fan commented Dec 16, 2020

Uh oh!

SparkQA commented Dec 16, 2020

Uh oh!

SparkQA commented Dec 16, 2020

Uh oh!

AmplabJenkins commented Dec 16, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants