@brkyvz
Contributor

@brkyvz brkyvz commented Nov 16, 2016

What changes were proposed in this pull request?

While this behavior is debatable, consider the following use case:

UNCACHE TABLE foo;
CACHE TABLE foo AS 
SELECT * FROM bar

The commands above fail the first time you run them, because foo does not exist yet and UNCACHE TABLE throws an exception. But I want to run them over and over again, and I don't want to change my code just for the first run.
The issue is that subsequent CACHE TABLE commands do not overwrite the existing table.

Now we can do:

UNCACHE TABLE IF EXISTS foo;
CACHE TABLE foo AS 
SELECT * FROM bar
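
For illustration only, the re-runnable behavior can be modelled without Spark. The sketch below is a toy in-memory cache in plain Scala; `ToyCache`, `NoSuchTable`, `uncache`, `cacheAs`, `lookup`, and `refresh` are hypothetical names, not Spark APIs, and the handling of a second CACHE TABLE (keeping the old entry) is an assumption that only captures "does not overwrite".

```scala
import scala.collection.mutable

// Toy model only: none of these names are Spark APIs.
object ToyCache {
  final class NoSuchTable(name: String)
    extends Exception(s"Table or view not found: $name")

  private val cached = mutable.Map.empty[String, Seq[Int]]

  // Models UNCACHE TABLE [IF EXISTS] name.
  def uncache(name: String, ifExists: Boolean = false): Unit = {
    val wasCached = cached.remove(name).isDefined
    if (!wasCached && !ifExists) throw new NoSuchTable(name)
  }

  // Models CACHE TABLE name AS SELECT ...
  // Assumption: an existing entry is kept, never overwritten.
  def cacheAs(name: String, rows: Seq[Int]): Unit =
    if (!cached.contains(name)) cached(name) = rows

  def lookup(name: String): Option[Seq[Int]] = cached.get(name)

  // The re-runnable pair from the description above.
  def refresh(name: String, rows: Seq[Int]): Unit = {
    uncache(name, ifExists = true)
    cacheAs(name, rows)
  }
}
```

In this model, calling `ToyCache.refresh("foo", rows)` behaves the same on the first run and on every later run, whereas `ToyCache.uncache("foo")` without the flag throws when nothing is cached under that name.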

How was this patch tested?

Unit tests

@SparkQA

SparkQA commented Nov 16, 2016

Test build #68684 has finished for PR 15896 at commit b8a4791.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@brkyvz
Contributor Author

brkyvz commented Nov 16, 2016

cc @gatorsmile I changed a test you added. Do you have any strong feelings on this?

@SparkQA

SparkQA commented Nov 16, 2016

Test build #68685 has finished for PR 15896 at commit 8c11e90.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

gatorsmile commented Nov 16, 2016

After the style is fixed, LGTM pending testing.

@SparkQA

SparkQA commented Nov 16, 2016

Test build #68725 has finished for PR 15896 at commit 27acd46.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 16, 2016

Test build #68726 has finished for PR 15896 at commit 98b22d5.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 16, 2016

Test build #68730 has finished for PR 15896 at commit 15f3e59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

try {
  sparkSession.sharedState.cacheManager.uncacheQuery(query = sparkSession.table(tableName))
} catch {
  case _: NoSuchTableException => // do nothing
Member

In PR #15682, we identified an issue when we try to uncache a view that contains a table that has been dropped. We do not issue a NoSuchTableException; instead, we issue an AnalysisException. That means this might not cover all the scenarios. Do you want to cover that case in this PR?
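
To make the concern concrete, here is a minimal sketch in plain Scala, not Spark code. `AnalysisError`, `NoSuchTable`, and `uncacheQuietly` are hypothetical stand-ins, and the subclass relationship between the two exceptions is part of the toy model. It shows that a handler written for the narrow exception lets the broader one escape.

```scala
object CatchScope {
  // Hypothetical stand-ins; these are not Spark's exception classes.
  class AnalysisError(msg: String) extends Exception(msg)
  class NoSuchTable(name: String)
    extends AnalysisError(s"Table or view not found: $name")

  // Swallows only the narrow exception, like the handler under review.
  def uncacheQuietly(body: () => Unit): String =
    try {
      body()
      "uncached"
    } catch {
      case _: NoSuchTable => "ignored: target missing"
      // Anything else, including a plain AnalysisError, propagates.
    }
}
```

A body that throws `NoSuchTable` is absorbed, while a body that throws a plain `AnalysisError` (as in the dropped-table-under-a-view case described above) still reaches the caller.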

@brkyvz
Contributor Author

brkyvz commented Nov 16, 2016

@gatorsmile Talking offline with several people, I may put this PR on hold for now since it is a behavior change. I guess it would be better to go with Options 1 or 2 that I defined in the PR description.

@gatorsmile
Member

gatorsmile commented Nov 16, 2016

@brkyvz Why not support both? :)

Each has its own usage scenario.

@dongjoon-hyun
Member

Hi, @brkyvz and @gatorsmile.
Is this proceeding with Option 1 or 2 for now, or is it on hold until next month?

@brkyvz
Contributor Author

brkyvz commented Nov 17, 2016

On hold on my side. Will try to get back to it


@dongjoon-hyun
Member

I see. Thank you for letting us know, @brkyvz.

@andrewor14
Contributor

I personally think UNCACHE TABLE IF EXISTS is best. It preserves the old behavior but lets the user make sure a table is not cached if they really want.

@brkyvz brkyvz changed the title from "[SPARK-18465] Uncache table shouldn't throw an exception when table doesn't exist" to "[SPARK-18465] Add 'IF EXISTS' clause to 'UNCACHE' to not throw exceptions when table doesn't exist" on Nov 22, 2016
@brkyvz
Contributor Author

brkyvz commented Nov 22, 2016

Hey @andrewor14! I went with your approach. @hvanhovell, can you take a quick look please? I would really like this to be available in Spark 2.1 (even though it is a new API).

  sparkSession.catalog.uncacheTable(tableId)
} catch {
  case _: NoSuchTableException if ifExists => // don't throw
    logInfo(s"Asked to uncache table $tableId which doesn't exist.")
Contributor

Minor: Let's not log it; logs typically get swallowed by the environment anyway. See for example: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L206
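
As a rough sketch of the guarded handler with the logging dropped, written in plain Scala rather than against Spark: `UncacheCommand`, `NoSuchTable`, `run`, and the `lookup` parameter are hypothetical names introduced here, with `lookup` standing in for the catalog call that may throw.

```scala
object UncacheCommand {
  // Hypothetical stand-in for the exception raised when the target is missing.
  class NoSuchTable(name: String)
    extends Exception(s"Table or view not found: $name")

  // `lookup` plays the role of the catalog call; it may throw NoSuchTable.
  // Returns true if something was uncached, false if the miss was swallowed.
  def run(table: String, ifExists: Boolean)(lookup: String => Unit): Boolean =
    try {
      lookup(table)
      true
    } catch {
      // The guard restricts the handler to the IF EXISTS case; without the
      // flag the exception propagates to the caller unchanged.
      case _: NoSuchTable if ifExists => false // swallowed silently, no logging
    }
}
```

Because the guard is part of the pattern, there is no need for an explicit rethrow branch: when `ifExists` is false the case simply does not match.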

Contributor

@hvanhovell hvanhovell left a comment

One small comment, otherwise LGTM

@brkyvz
Contributor Author

brkyvz commented Nov 22, 2016

@hvanhovell Done! Thanks for the quick review!

@hvanhovell
Contributor

LGTM pending Jenkins.

@gatorsmile
Member

When the target is a view, it becomes a little bit more complex. If this PR does not handle it, we can do it in a separate PR.

UNCACHE TABLE IF EXISTS viewName;

@hvanhovell
Contributor

@gatorsmile could you elaborate a little? What am I missing?

@SparkQA

SparkQA commented Nov 22, 2016

Test build #69013 has finished for PR 15896 at commit 0307eb8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 22, 2016

Test build #69014 has finished for PR 15896 at commit 266b902.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor

Merging to master/2.1. Thanks!

asfgit pushed a commit that referenced this pull request Nov 22, 2016
…ions when table doesn't exist

## What changes were proposed in this pull request?

While this behavior is debatable, consider the following use case:
```sql
UNCACHE TABLE foo;
CACHE TABLE foo AS
SELECT * FROM bar
```
The command above fails the first time you run it. But I want to run the command above over and over again, and I don't want to change my code just for the first run of it.
The issue is that subsequent `CACHE TABLE` commands do not overwrite the existing table.

Now we can do:
```sql
UNCACHE TABLE IF EXISTS foo;
CACHE TABLE foo AS
SELECT * FROM bar
```

## How was this patch tested?

Unit tests

Author: Burak Yavuz <[email protected]>

Closes #15896 from brkyvz/uncache.

(cherry picked from commit bdc8153)
Signed-off-by: Herman van Hovell <[email protected]>
@asfgit asfgit closed this in bdc8153 Nov 22, 2016
@SparkQA

SparkQA commented Nov 22, 2016

Test build #69017 has finished for PR 15896 at commit 4d62ce4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 22, 2016

Test build #69021 has finished for PR 15896 at commit 5432a83.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

      spark.range(1, 10).toDF("id1").write.format("json").saveAsTable("jt1")
      spark.range(1, 10).toDF("id2").write.format("json").saveAsTable("jt2")
      sql("CREATE VIEW testView AS SELECT * FROM jt1 JOIN jt2 ON id1 == id2")
      // Cache is empty at the beginning
      assert(spark.sharedState.cacheManager.isEmpty)
      sql("CACHE TABLE testView")
      assert(spark.catalog.isCached("testView"))
      // Cache is not empty
      assert(!spark.sharedState.cacheManager.isEmpty)
      // drop a table referenced by a cached view
      sql("DROP TABLE jt1")

-- So far everything is fine

      // Failed to uncache the view
      val e = intercept[AnalysisException] {
        sql("UNCACHE TABLE testView")
      }.getMessage
      assert(e.contains("Table or view not found: `default`.`jt1`"))

      // We are unable to drop it from the cache
      assert(!spark.sharedState.cacheManager.isEmpty)

@hvanhovell Above is the example.

@hvanhovell
Contributor

Thanks! Isn't this an existing (separate) problem, that we should fix?

@gatorsmile
Member

Yeah! That is just a relevant problem. I found it when I reviewed another PR: #15682

@hvanhovell
Contributor

I agree that it is relevant, and that we should fix it for 2.1.

robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 2, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
@brkyvz brkyvz deleted the uncache branch February 3, 2019 20:58