-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-18465] Add 'IF EXISTS' clause to 'UNCACHE' to not throw exceptions when table doesn't exist #15896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #68684 has finished for PR 15896 at commit
|
|
cc @gatorsmile I changed a test you added. Do you have any strong feelings on this? |
|
Test build #68685 has finished for PR 15896 at commit
|
|
After the style is fixed, LGTM pending testing. |
|
Test build #68725 has finished for PR 15896 at commit
|
|
Test build #68726 has finished for PR 15896 at commit
|
|
Test build #68730 has finished for PR 15896 at commit
|
| try { | ||
| sparkSession.sharedState.cacheManager.uncacheQuery(query = sparkSession.table(tableName)) | ||
| } catch { | ||
| case _: NoSuchTableException => // do nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the PR #15682, we identified an issue when we try to uncache a view that containing a table that has been dropped. We are not issuing NoSuchTableException. Instead. Instead, we issue an AnalysisException. That means, this might not cover all the senario. Do you want to cover that case in this PR?
|
@gatorsmile Talking offline with several people, I may put this PR on hold for now since it is a behavior change. I guess it would be better to go with Options 1 or 2 that I defined in the PR description. |
|
@brkyvz Why not supporting both? : ) Each has its own usage scenario. |
|
Hi, @brkyvz and @gatorsmile . |
|
On hold on my side. Will try to get back to it On Nov 17, 2016 3:31 PM, "Dongjoon Hyun" [email protected] wrote:
|
|
I see. Thank you for informing that, @brkyvz |
|
I personally think |
|
Hey @andrewor14 ! I went with your way. @hvanhovell can you take a quick look please? I would really like this to be available in Spark 2.1 (even though it is a new API) |
| sparkSession.catalog.uncacheTable(tableId) | ||
| } catch { | ||
| case _: NoSuchTableException if ifExists => // don't throw | ||
| logInfo(s"Asked to uncache table $tableId which doesn't exist.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Lets not log it, logs typically get swallowed by the environment anyway. See for example: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L206
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small comment, otherwise LGTM
|
@hvanhovell Done! Thanks for the quick review! |
|
LGTM pending jenkins |
|
When the target is a view, it becomes a little bit more complex. If this PR does not handle it, we can do it in a separate PR. UNCACHE TABLE IF EXISTS viewName; |
|
@gatorsmile could you elaborate a little? What am I missing? |
|
Test build #69013 has finished for PR 15896 at commit
|
|
Test build #69014 has finished for PR 15896 at commit
|
|
Merging to master/2.1. Thanks! |
…ions when table doesn't exist ## What changes were proposed in this pull request? While this behavior is debatable, consider the following use case: ```sql UNCACHE TABLE foo; CACHE TABLE foo AS SELECT * FROM bar ``` The command above fails the first time you run it. But I want to run the command above over and over again, and I don't want to change my code just for the first run of it. The issue is that subsequent `CACHE TABLE` commands do not overwrite the existing table. Now we can do: ```sql UNCACHE TABLE IF EXISTS foo; CACHE TABLE foo AS SELECT * FROM bar ``` ## How was this patch tested? Unit tests Author: Burak Yavuz <[email protected]> Closes #15896 from brkyvz/uncache. (cherry picked from commit bdc8153) Signed-off-by: Herman van Hovell <[email protected]>
|
Test build #69017 has finished for PR 15896 at commit
|
|
Test build #69021 has finished for PR 15896 at commit
|
spark.range(1, 10).toDF("id1").write.format("json").saveAsTable("jt1")
spark.range(1, 10).toDF("id2").write.format("json").saveAsTable("jt2")
sql("CREATE VIEW testView AS SELECT * FROM jt1 JOIN jt2 ON id1 == id2")
// Cache is empty at the beginning
assert(spark.sharedState.cacheManager.isEmpty)
sql("CACHE TABLE testView")
assert(spark.catalog.isCached("testView"))
// Cache is not empty
assert(!spark.sharedState.cacheManager.isEmpty)
// drop a table referenced by a cached view
sql("DROP TABLE jt1")
-- So far everything is fine
// Failed to unache the view
val e = intercept[AnalysisException] {
sql("UNCACHE TABLE testView")
}.getMessage
assert(e.contains("Table or view not found: `default`.`jt1`"))
// We are unable to drop it from the cache
assert(!spark.sharedState.cacheManager.isEmpty)@hvanhovell Above is the example. |
|
Thanks! Isn't this an existing (separate) problem, that we should fix? |
|
Yeah! That is just a relavant problem. Found it when I reviewed another PR: #15682 |
|
I agree that it is relevant, and that we should fix it for 2.1. |
…ions when table doesn't exist ## What changes were proposed in this pull request? While this behavior is debatable, consider the following use case: ```sql UNCACHE TABLE foo; CACHE TABLE foo AS SELECT * FROM bar ``` The command above fails the first time you run it. But I want to run the command above over and over again, and I don't want to change my code just for the first run of it. The issue is that subsequent `CACHE TABLE` commands do not overwrite the existing table. Now we can do: ```sql UNCACHE TABLE IF EXISTS foo; CACHE TABLE foo AS SELECT * FROM bar ``` ## How was this patch tested? Unit tests Author: Burak Yavuz <[email protected]> Closes apache#15896 from brkyvz/uncache.
…ions when table doesn't exist ## What changes were proposed in this pull request? While this behavior is debatable, consider the following use case: ```sql UNCACHE TABLE foo; CACHE TABLE foo AS SELECT * FROM bar ``` The command above fails the first time you run it. But I want to run the command above over and over again, and I don't want to change my code just for the first run of it. The issue is that subsequent `CACHE TABLE` commands do not overwrite the existing table. Now we can do: ```sql UNCACHE TABLE IF EXISTS foo; CACHE TABLE foo AS SELECT * FROM bar ``` ## How was this patch tested? Unit tests Author: Burak Yavuz <[email protected]> Closes apache#15896 from brkyvz/uncache.
What changes were proposed in this pull request?
While this behavior is debatable, consider the following use case:
The command above fails the first time you run it. But I want to run the command above over and over again, and I don't want to change my code just for the first run of it.
The issue is that subsequent
CACHE TABLEcommands do not overwrite the existing table.Now we can do:
How was this patch tested?
Unit tests