Skip to content

Conversation

@imback82
Copy link
Contributor

What changes were proposed in this pull request?

This PR proposes to migrate UNCACHE TABLE to use UnresolvedRelation to resolve the table/view identifier in Analyzer as discussed https://github.com/apache/spark/pull/30403/files#r532360022.

Why are the changes needed?

To resolve the table/view in the analyzer.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Updated existing tests

@github-actions github-actions bot added the SQL label Dec 12, 2020
@SparkQA
Copy link

SparkQA commented Dec 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37289/

@SparkQA
Copy link

SparkQA commented Dec 12, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37289/

@SparkQA
Copy link

SparkQA commented Dec 12, 2020

Test build #132686 has finished for PR 30743 at commit 2acf503.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UncacheTable(
  • case class UncacheTableExec(

@imback82
Copy link
Contributor Author

cc @cloud-fan thanks in advance!

override def run(): Seq[InternalRow] = {
val sparkSession = sqlContext.sparkSession
val df = Dataset.ofRows(sparkSession, relation)
sparkSession.sharedState.cacheManager.uncacheQuery(df, cascade)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uncacheQuery can take LogicalPlan directly. Let's use that overload to avoid creating a dataframe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do the same to cacheQuery, but we need to add a new overload that takes LogicalPlan first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated to pass a logical plan instead of dataframe (this required updating more rules, but I think it's "more correct".)

I will add a new overload that takes LogicalPlan in a separate PR.


case UncacheTable(u: UnresolvedRelation, _, _) =>
failAnalysis(
s"Table or view not found for `UNCACHE TABLE`: ${u.multipartIdentifier.quoted}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a second look, I think it's better to be consistent with INSERT and just say Table or view not found: xxx. When people run the command, they definitely know which command triggers the table not found issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I will revert other commands as well in a separate PR.

@cloud-fan
Copy link
Contributor

We can probably have a base trait for commands that don't want to optimize its child plan, so that we can remove duplicated code. This can be done in a followup.

@SparkQA
Copy link

SparkQA commented Dec 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37458/

@cloud-fan
Copy link
Contributor

GA passed, merging to master!

@cloud-fan cloud-fan closed this in 62be248 Dec 16, 2020
@SparkQA
Copy link

SparkQA commented Dec 16, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37458/

@SparkQA
Copy link

SparkQA commented Dec 16, 2020

Test build #132856 has finished for PR 30743 at commit cbfb192.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132856/

cloud-fan pushed a commit that referenced this pull request Dec 16, 2020
…hen a relation is not resolved

### What changes were proposed in this pull request?

Based on the discussion #30743 (comment), this PR proposes to remove the command name in AnalysisException message when a relation is not resolved.

For some of the commands that use `UnresolvedTable`, `UnresolvedView`, and `UnresolvedTableOrView` to resolve an identifier, when the identifier cannot be resolved, the exception will be something like `Table or view not found for 'SHOW TBLPROPERTIES': badtable`. The command name (`SHOW TBLPROPERTIES` in this case) should be dropped to be consistent with other existing commands.

### Why are the changes needed?

To make the exception message consistent.

### Does this PR introduce _any_ user-facing change?

Yes, the exception message will be changed from
```
Table or view not found for 'SHOW TBLPROPERTIES': badtable
```
to
```
Table or view not found: badtable
```
for commands that use `UnresolvedTable`, `UnresolvedView`, and `UnresolvedTableOrView` to resolve an identifier.

### How was this patch tested?

Updated existing tests.

Closes #30794 from imback82/remove_cmd_from_exception_msg.

Authored-by: Terry Kim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Dec 18, 2020
…ry to avoid creating a dataframe

### What changes were proposed in this pull request?

This PR proposes to update `CACHE TABLE` to use a `LogicalPlan` when caching a query to avoid creating a `DataFrame` as suggested here: #30743 (comment)

For reference, `UNCACHE TABLE` also uses `LogicalPlan`: https://github.com/apache/spark/blob/0c129001201ccb63ae96f576b6f354da84024fb3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CacheTableExec.scala#L91-L98

### Why are the changes needed?

To avoid creating an unnecessary dataframe and make it consistent with `uncacheQuery` used in `UNCACHE TABLE`.

### Does this PR introduce _any_ user-facing change?

No, just internal changes.

### How was this patch tested?

Existing tests since this is an internal refactoring change.

Closes #30815 from imback82/cache_with_logical_plan.

Authored-by: Terry Kim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants