[SPARK-40425][SQL] DROP TABLE does not need to do table lookup #37879
Conversation
Force-pushed from f696bdc to 42b82bf.
```scala
    blocking: Boolean): Unit = {
  val shouldRemove: LogicalPlan => Boolean =
    if (cascade) {
      _.exists(_.sameResult(plan))
```
sameResult doesn't work?
nvm
```scala
case SubqueryAlias(ident, DataSourceV2Relation(_, _, Some(catalog), Some(v2Ident), _)) =>
  isSameName(ident.qualifier :+ ident.name) &&
    isSameName(catalog.name() +: v2Ident.namespace() :+ v2Ident.name())
```
Does `SubqueryAlias` have the same name as the underlying relation?
yes, see ResolveRelations.createRelation
```scala
  }
}

case class DropTempViewCommand(ident: Identifier) extends LeafRunnableCommand {
```
v1 only, right?
A temp view is a Spark-internal thing and is unrelated to the data source, so it's neither v1 nor v2.
Oh, I see. I misread the comment "v1 DROP TABLE supports temp view." There is also another pattern statement for v2 that goes to `DropTempViewCommand`.
```scala
// A fake v2 catalog to hold temp views.
object FakeSystemCatalog extends CatalogPlugin {
  override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {}
  override def name(): String = "SYSTEM"
```
FAKE_SYSTEM?
The name doesn't matter. We won't show it or look it up for now. But later I think it's a good idea to officially add a system catalog to host temp views, temp functions and builtin functions.
```diff
   // no-op
 } else {
-  throw QueryCompilationErrors.tableOrViewNotFoundError(tableName.identifier)
+  throw QueryCompilationErrors.noSuchTableError(
```
`DropTableCommand` now won't be used to drop temp views, right? If so, there is some logic around `val isTempView = catalog.isTempView(tableName)`; do we need to update it?
good point, we can simplify v1 DROP TABLE command now
One PySpark error, although it looks like a real failure, seems unrelated?
```scala
assert(exception.getErrorClass === errorClass)
val mainErrorClass :: tail = errorClass.split("\\.").toList
assert(tail.isEmpty || tail.length == 1)
// TODO: remove the `errorSubClass` parameter.
```
Just a nit: if we use an ID'ed TODO with a JIRA id, a contributor can pick up the item more easily.
I didn't create a JIRA for this TODO because @MaxGekk will fix it shortly (we talked offline) :)
+1, LGTM.
cc @sunchao, too.
thanks for review, merging to master!
```scala
    ResolvedIdentifier(catalog, identifier)
  case UnresolvedIdentifier(nameParts, allowTemp) =>
    if (allowTemp && catalogManager.v1SessionCatalog.isTempView(nameParts)) {
      val ident = Identifier.of(nameParts.dropRight(1).toArray, nameParts.last)
```
nit: `nameParts.init` is the counterpart to `nameParts.last`:
https://www.scala-lang.org/api/2.12.5/scala/collection/Seq.html#inits:Iterator[Repr]
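For illustration, a tiny self-contained sketch of the suggestion (the name-parts value here is made up, not from the patch):

```scala
// `init` is the counterpart to `last`: it returns all elements except the last,
// so a multi-part name splits cleanly into (namespace, name).
val nameParts = Seq("spark_catalog", "db", "view") // hypothetical example value
val namespace = nameParts.init // Seq("spark_catalog", "db")
val name = nameParts.last      // "view"

// Equivalent to the `dropRight(1)` form in the patch, just more idiomatic:
assert(namespace == nameParts.dropRight(1))
```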
good idea!
```scala
val cmd = "DROP VIEW"
val hint = Some("Please use DROP TABLE instead.")
parseCompare(s"DROP VIEW testcat.db.view",
  DropView(UnresolvedView(Seq("testcat", "db", "view"), cmd, true, hint), ifExists = false))
```
why does UnresolvedView even continue to exist, if it's not useful for dropping? Do we still use it for add/select/etc?
it's still used by commands like SetViewProperties
```scala
  DropTempViewCommand(ident)
} else {
  throw QueryCompilationErrors.catalogOperationNotSupported(catalog, "views")
}
```
nit: `r` not used? And if you make `FakeSystemCatalog` a case object it can participate directly in matching here:

```scala
case DropView(ResolvedIdentifier(FakeSystemCatalog, ident), _) =>
  DropTempViewCommand(ident)
case DropView(ResolvedIdentifier(catalog, _), _) =>
  throw ...
```
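As a side note, the reason a case object can appear directly in a pattern is that it is a stable identifier, matched by identity. A minimal sketch with simplified stand-ins (these are not the real Spark classes, just made-up analogues):

```scala
// Simplified stand-ins for the real CatalogPlugin / ResolvedIdentifier types.
trait CatalogPlugin { def name(): String }
case object FakeSystemCatalog extends CatalogPlugin { def name(): String = "SYSTEM" }
final case class NamedCatalog(n: String) extends CatalogPlugin { def name(): String = n }
final case class ResolvedIdentifier(catalog: CatalogPlugin, ident: String)

def planDropView(resolved: ResolvedIdentifier): String = resolved match {
  // The case object is matched by identity; no binding or guard is needed.
  case ResolvedIdentifier(FakeSystemCatalog, ident) => s"DropTempViewCommand($ident)"
  case ResolvedIdentifier(catalog, _) => s"unsupported catalog: ${catalog.name()}"
}

assert(planDropView(ResolvedIdentifier(FakeSystemCatalog, "v")) == "DropTempViewCommand(v)")
assert(planDropView(ResolvedIdentifier(NamedCatalog("cat"), "v")) == "unsupported catalog: cat")
```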
```diff
   }
 }

-case DropTable(ResolvedV1TableIdentifier(ident), ifExists, purge) =>
+case DropTable(ResolvedV1Identifier(ident), ifExists, purge) =>
```
I am afraid this breaks the session catalog delegation. Previously, we checked the table was V1Table. Right now, we simply check the identifier looks like a V1 table identifier, which still may point to a valid V2 table. If I have a custom session catalog, it may be able to load both V1 and V2 tables. After this change, the V1 drop code is invoked for V2 tables in custom session catalogs. That means I can't drop tables correctly in custom session catalogs.
@cloud-fan @viirya @dongjoon-hyun, could you double check if I missed anything?
I checked the difference between `ResolvedV1TableIdentifier` and `ResolvedV1Identifier`. So do you mean `ResolvedV1Identifier` could wrongly apply to a V2 table? I.e.,

```scala
case ResolvedIdentifier(catalog, ident) if isSessionCatalog(catalog)
```

if `catalog` is a custom session catalog that is capable of loading both V1 and V2 tables?
I saw that many commands have an `isV2Provider` check, but `DropTable` doesn't. So it seems we need it?
I think ResolvedV1Identifier simply means it is an identifier in the session catalog that has only db and table name (in other words it is a valid V1 identifier). In custom session catalogs, it may point to a valid V2 table.
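To make the concern concrete, here is a toy sketch (made-up types, not Spark's) of why matching on identifier shape alone cannot distinguish V1 from V2 tables in a custom session catalog:

```scala
// Toy model: a custom session catalog can serve both V1 and V2 tables
// under ordinary two-part `db.table` identifiers.
sealed trait Table
case object V1Table extends Table
case object V2Table extends Table

val customSessionCatalog: Map[Seq[String], Table] = Map(
  Seq("db", "legacy") -> V1Table,
  Seq("db", "modern") -> V2Table
)

// "Looks like a V1 identifier": at most database + table name parts.
def looksLikeV1Identifier(nameParts: Seq[String]): Boolean = nameParts.length <= 2

// Both tables pass the shape check, so routing on shape alone would send the
// V2 table down the V1 drop path; the catalog must be asked what the table is.
val v2Name = Seq("db", "modern")
assert(looksLikeV1Identifier(v2Name) && customSessionCatalog(v2Name) == V2Table)
```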
Shall we switch to the V2 DROP path for all cases to fix SPARK-43203?
Yea, we should. Can you create a PR? Thanks!
Sounds good to me to switch to V2 DROP.
I'll have time, probably, on Monday. I'll do that then unless someone gets there earlier.
Hi @aokolnychyi, any update on this? If you don't mind, I can finish it this weekend. 😄
If you create a managed table in Spark 3.5.1, the path is not deleted when dropping the table. I think this is a bug. It is deleted correctly in Spark 3.3.
### What changes were proposed in this pull request?
This fixes the DROP TABLE behavior in the session catalog caused by #37879: we always invoke the V1 drop logic if the identifier looks like a V1 identifier. This is a big blocker for external data sources that provide custom session catalogs. So this PR moves all DROP TABLE cases to DataSource V2 (using DROP TABLE to drop a view is not included). For more information, see https://github.com/apache/spark/pull/37879/files#r1170501180

### Why are the changes needed?
Move the DROP TABLE case to DataSource V2 to fix the bug and prepare for removing the v1 drop-table path.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested by:
- V2 table catalog tests: `org.apache.spark.sql.execution.command.v2.DropTableSuite`
- V1 table catalog tests: `org.apache.spark.sql.execution.command.v1.DropTableSuiteBase`

Closes #41348 from Hisoka-X/SPARK-43203_drop_table_to_v2.

Authored-by: Jia Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This PR updates `DropTable`/`DropView` to use `UnresolvedIdentifier` instead of `UnresolvedTableOrView`/`UnresolvedView`. This has several benefits:
1. Simplify the `ifExists` handling. No need to handle `DropTable` in `ResolveCommandsWithIfExists` anymore.
2. Avoid one table lookup if we eventually fall back to a v1 command (the v1 `DropTableCommand` will look up the table again).
3. v2 catalogs can avoid the table lookup entirely if possible.

This PR also improves table uncaching to match by table name directly, so that we don't need to look up the table and resolve it to table relations.
Why are the changes needed?
Save table lookup.
Does this PR introduce any user-facing change?
No
How was this patch tested?
existing tests