[SPARK-43203][SQL] Move all Drop Table case to DataSource V2 #41348

Hisoka-X · 2023-05-28T12:14:26Z

What changes were proposed in this pull request?

This PR fixes the DROP TABLE behavior in the session catalog, which was broken by #37879: we always invoke the V1 drop logic if the identifier looks like a V1 identifier. This is a big blocker for external data sources that provide custom session catalogs.
So this PR moves all DROP TABLE cases to DataSource V2 (using DROP TABLE to drop a view is not included). For more information, see https://github.com/apache/spark/pull/37879/files#r1170501180
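
The routing problem described above can be sketched in plain Scala. This is only an illustrative model, not Spark code: `Ident`, `SessionCatalogLike`, `RecordingCatalog`, `dropBefore` and `dropAfter` are hypothetical names, and "looks like a V1 identifier" is assumed here to mean at most a database part plus a table name.

```scala
// Hypothetical model of the routing; none of these names are Spark APIs.
object DropRoutingSketch {
  // A table identifier as name parts, e.g. Seq("db", "tbl").
  final case class Ident(parts: Seq[String]) {
    // Assumption: "looks like V1" means at most a database and a table name.
    def looksLikeV1: Boolean = parts.length <= 2
  }

  // Stand-in for a session catalog, built-in or custom.
  trait SessionCatalogLike {
    def dropTable(ident: Ident): Boolean
  }

  // Records which identifiers it was asked to drop.
  final class RecordingCatalog extends SessionCatalogLike {
    val dropped = scala.collection.mutable.ArrayBuffer.empty[Ident]
    def dropTable(ident: Ident): Boolean = { dropped += ident; true }
  }

  // The reported bug: V1-looking identifiers bypass the configured custom catalog.
  def dropBefore(ident: Ident, custom: SessionCatalogLike, builtinV1: SessionCatalogLike): Boolean =
    if (ident.looksLikeV1) builtinV1.dropTable(ident) else custom.dropTable(ident)

  // The intended behavior: the drop always goes through the configured catalog.
  def dropAfter(ident: Ident, custom: SessionCatalogLike): Boolean =
    custom.dropTable(ident)
}
```

In this model, a custom catalog never sees `DROP TABLE db.t` under `dropBefore`, which is the situation the PR description calls a blocker for external data sources.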

Why are the changes needed?

Moving the DROP TABLE cases to DataSource V2 fixes the bug and prepares for removing the V1 drop table path.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested by:

V2 table catalog tests: org.apache.spark.sql.execution.command.v2.DropTableSuite
V1 table catalog tests: org.apache.spark.sql.execution.command.v1.DropTableSuiteBase

Hisoka-X · 2023-05-28T12:15:09Z

cc @aokolnychyi @cloud-fan @viirya

sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/IdentifierImpl.java

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala

Hisoka-X · 2023-05-29T00:59:43Z

sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala

        isSameName(ident.qualifier :+ ident.name) &&
          isSameName(v1Ident.catalog.toSeq ++ v1Ident.database :+ v1Ident.table)

+      case SubqueryAlias(ident, HiveTableRelation(catalogTable, _, _, _, _)) =>


Add support for HiveTableRelation in isMatchedTableOrView so that the cache can be removed as usual when the Hive catalog is used.
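
A minimal sketch of the matching idea, assuming a simplified plan model: the cached plan matches the dropped table when both the alias and the underlying relation's catalog/database/table parts resolve to the same name. `Relation`, `V1Relation`, `HiveRelation`, `Aliased`, `isMatched` and the `matchHive` flag are hypothetical stand-ins; only `isSameName` mirrors a name from the diff excerpt.

```scala
// Hypothetical, simplified plan nodes; real Spark plans carry far more state.
object CacheMatchSketch {
  sealed trait Relation
  final case class V1Relation(catalog: Option[String], database: Option[String], table: String)
    extends Relation
  final case class HiveRelation(catalog: Option[String], database: Option[String], table: String)
    extends Relation
  final case class Aliased(aliasParts: Seq[String], child: Relation)

  // Case-insensitive comparison, standing in for the analyzer's resolver.
  val resolver: (String, String) => Boolean = _.equalsIgnoreCase(_)

  // Two multi-part names match when they have equal length and every part resolves.
  def isSameName(target: Seq[String], candidate: Seq[String]): Boolean =
    target.length == candidate.length &&
      target.zip(candidate).forall { case (a, b) => resolver(a, b) }

  // matchHive = false models the behavior without the extra case;
  // matchHive = true models it with the Hive relation case added.
  def isMatched(plan: Aliased, target: Seq[String], matchHive: Boolean): Boolean =
    plan match {
      case Aliased(alias, V1Relation(c, d, t)) =>
        isSameName(target, alias) && isSameName(target, c.toSeq ++ d :+ t)
      case Aliased(alias, HiveRelation(c, d, t)) if matchHive =>
        isSameName(target, alias) && isSameName(target, c.toSeq ++ d :+ t)
      case _ => false
    }
}
```

Without the Hive case, a cached plan over a Hive table is never matched in this model, so its cache entry would survive the drop.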

Hisoka-X · 2023-05-29T01:01:48Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala

+    try {
+      V1Table(catalog.getTableMetadata(ident.asTableIdentifier))
+    } catch {
+      case _: NoSuchDatabaseException =>


Without this, loadTable throws NoSuchDatabaseException, which differs from the V1 behavior.

Does it cause test failures?

Yes, the DerbyTableCatalogSuite test "SPARK-42978: RENAME cannot qualify a new-table-Name with a schema-Name." fails without this change.
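
The excerpt above does not show the body of the catch branch. As a sketch under the assumption that it rethrows the missing-database error as a table-not-found error, the idea looks like this in plain Scala. `NoSuchDatabase`, `NoSuchTable`, `databases` and the two-argument `getTableMetadata`/`loadTable` are hypothetical stand-ins, not the Spark classes or signatures.

```scala
object LoadTableSketch {
  // Hypothetical stand-ins for the analysis exceptions named in the thread.
  final class NoSuchDatabase(db: String) extends Exception(s"Database '$db' not found")
  final class NoSuchTable(name: String) extends Exception(s"Table '$name' not found")

  // Toy metadata store: database name -> table names.
  val databases: Map[String, Set[String]] = Map("default" -> Set("t1"))

  // V1-style lookup that reports a missing database separately from a missing table.
  def getTableMetadata(db: String, table: String): String = {
    val tables = databases.getOrElse(db, throw new NoSuchDatabase(db))
    if (tables.contains(table)) s"$db.$table" else throw new NoSuchTable(s"$db.$table")
  }

  // V2-style load: callers only expect a "no such table" failure,
  // so a missing database is translated into that error (assumed behavior).
  def loadTable(db: String, table: String): String =
    try getTableMetadata(db, table)
    catch { case _: NoSuchDatabase => throw new NoSuchTable(s"$db.$table") }
}
```

With this translation, code that only handles "table not found" after `loadTable` behaves the same whether the table or its database is missing.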

sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala

sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/IdentifierImpl.scala

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/DropTableSuite.scala

viirya · 2023-06-08T20:29:43Z

Is the test failure related?

DataFrameFunctionsSuite.DataFrame function and SQL functon parity
org.scalatest.exceptions.TestFailedException: Set("ceiling", "negative", "std", "sign") was not empty

sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/IdentifierImpl.java

Hisoka-X · 2023-06-09T00:14:03Z

Is the test failure related?

DataFrameFunctionsSuite.DataFrame function and SQL functon parity
org.scalatest.exceptions.TestFailedException: Set("ceiling", "negative", "std", "sign") was not empty

I don't think so. Many PRs currently fail the "sql - other tests" job, so something seems to be wrong with CI.

Hisoka-X · 2023-06-11T03:52:37Z

Could you re-trigger CI?

Done, please wait.

Hisoka-X · 2023-06-13T00:58:26Z

@cloud-fan @viirya Hi folks, the CI passed. Can we move forward to the next step?

cloud-fan · 2023-06-19T12:41:55Z

thanks, merging to master!

Hisoka-X · 2023-06-19T12:44:44Z

Thanks @cloud-fan @viirya

aokolnychyi · 2023-06-27T22:37:44Z

Unfortunately, I initially filed this as an improvement. It is actually a bug and a regression that breaks DROP in custom session catalogs. Can we include it in 3.4.2?

viirya · 2023-06-27T23:10:33Z

I'm okay with changing it to a bug and backporting it to 3.4.2. cc @cloud-fan

### What changes were proposed in this pull request? In order to fix DROP table behavior in session catalog cause by apache#37879. Because we always invoke V1 drop logic if the identifier looks like a V1 identifier. This is a big blocker for external data sources that provide custom session catalogs. So this PR move all Drop Table case to DataSource V2 (use drop table to drop view not include). More information please check https://github.com/apache/spark/pull/37879/files#r1170501180  ### Why are the changes needed? Move Drop Table case to DataSource V2 to fix bug and prepare for remove drop table v1.  ### Does this PR introduce _any_ user-facing change? No  ### How was this patch tested? Tested by: - V2 table catalog tests: `org.apache.spark.sql.execution.command.v2.DropTableSuite` - V1 table catalog tests: `org.apache.spark.sql.execution.command.v1.DropTableSuiteBase`  Closes apache#41348 from Hisoka-X/SPARK-43203_drop_table_to_v2. Authored-by: Jia Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 32a5db4)

… null table ### What changes were proposed in this pull request? This is a followup of #41348 . Previously `V2SessionCatalog.dropTable` treated null table as table not exists, but #41348 broke it. This PR fixes it. ### Why are the changes needed? to keep old behavior. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? existing tests Closes #42056 from cloud-fan/mm. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Kent Yao <[email protected]>
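
The null handling mentioned in this follow-up can be sketched as follows. It is a minimal model, assuming the loader may either return null or throw a table-not-found error; `Table`, `NoSuchTable`, `loader` and `doDrop` are hypothetical names, and the real `V2SessionCatalog.dropTable` signature differs.

```scala
object DropNullSketch {
  // Hypothetical stand-ins; not the Spark classes.
  final class NoSuchTable(name: String) extends Exception(s"Table '$name' not found")
  final case class Table(name: String)

  // Returns true only when a table was found and dropped.
  // A null result from the loader is treated the same as "table does not exist".
  def dropTable(name: String, loader: String => Table, doDrop: String => Unit): Boolean =
    try {
      Option(loader(name)) match {
        case Some(_) => doDrop(name); true
        case None    => false // loader returned null
      }
    } catch {
      case _: NoSuchTable => false
    }
}
```

Wrapping the result in `Option` keeps the null check in one place, so the drop action is never invoked for a table that was not loaded.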

… null table ### What changes were proposed in this pull request? This is a followup of #41348 . Previously `V2SessionCatalog.dropTable` treated null table as table not exists, but #41348 broke it. This PR fixes it. ### Why are the changes needed? to keep old behavior. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? existing tests Closes #42056 from cloud-fan/mm. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Kent Yao <[email protected]> (cherry picked from commit 704131b) Signed-off-by: Kent Yao <[email protected]>

… null table ### What changes were proposed in this pull request? This is a followup of apache#41348 . Previously `V2SessionCatalog.dropTable` treated null table as table not exists, but apache#41348 broke it. This PR fixes it. ### Why are the changes needed? to keep old behavior. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? existing tests Closes apache#42056 from cloud-fan/mm. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Kent Yao <[email protected]> (cherry picked from commit 704131b)


### What changes were proposed in this pull request? cherry pick #41348 and #42056 , this a bug fixed should be included in 3.4.2 ### Why are the changes needed? Fix DROP table behavior in session catalog ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested by: - V2 table catalog tests: `org.apache.spark.sql.execution.command.v2.DropTableSuite` - V1 table catalog tests: `org.apache.spark.sql.execution.command.v1.DropTableSuiteBase` Closes #41765 from Hisoka-X/move_drop_table_v2_to_3.4.2. Lead-authored-by: Jia Fan <[email protected]> Co-authored-by: Wenchen Fan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>

In order to fix DROP table behavior in session catalog cause by apache#37879. Because we always invoke V1 drop logic if the identifier looks like a V1 identifier. This is a big blocker for external data sources that provide custom session catalogs. So this PR move all Drop Table case to DataSource V2 (use drop table to drop view not include). More information please check https://github.com/apache/spark/pull/37879/files#r1170501180  Move Drop Table case to DataSource V2 to fix bug and prepare for remove drop table v1.  No  Tested by: - V2 table catalog tests: `org.apache.spark.sql.execution.command.v2.DropTableSuite` - V1 table catalog tests: `org.apache.spark.sql.execution.command.v1.DropTableSuiteBase`  Closes apache#41348 from Hisoka-X/SPARK-43203_drop_table_to_v2. Authored-by: Jia Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 32a5db4)

… null table ### What changes were proposed in this pull request? This is a followup of apache#41348 . Previously `V2SessionCatalog.dropTable` treated null table as table not exists, but apache#41348 broke it. This PR fixes it. ### Why are the changes needed? to keep old behavior. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? existing tests Closes apache#42056 from cloud-fan/mm. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Kent Yao <[email protected]>

gaecoli · 2024-08-20T03:07:33Z

I found that if you create a managed table in Spark 3.5.1, the table's path is not deleted when the table is dropped. I think this is a bug. The path is deleted correctly in Spark 3.3.

gaecoli · 2024-08-20T03:07:56Z

I found that if you create a managed table in Spark 3.5.1, the table's path is not deleted when the table is dropped. I think this is a bug. The path is deleted correctly in Spark 3.3.

@cloud-fan

cloud-fan · 2024-08-20T03:15:10Z

@gaecoli thanks for reporting the bug! Can you create a JIRA ticket and include the reproduction steps?

gaecoli · 2024-08-20T06:53:20Z

@gaecoli thanks for reporting the bug! Can you create a JIRA ticket and include the reproduction steps?

I don't have permission to access the Spark JIRA.

cloud-fan · 2024-08-25T04:41:39Z

Or you can post the problem description and repro here, and I can help create the ticket.

github-actions bot added the SQL label May 28, 2023

[SPARK-43203][SQL] Move all Drop Table case to V2

bfd7c2f

Hisoka-X force-pushed the SPARK-43203_drop_table_to_v2 branch from 39268a4 to bfd7c2f on May 28, 2023 16:44

Hisoka-X commented May 29, 2023


cloud-fan reviewed May 31, 2023

sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala (outdated)

cloud-fan reviewed May 31, 2023

sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala

Hisoka-X added 5 commits May 31, 2023 20:12

move IdentifierImpl to internal

5d0677d

fix test

6e1628c

fix test

e237366

fix header

719eb22

change hashcode

9927414

cloud-fan reviewed Jun 1, 2023

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (outdated)

Hisoka-X added 3 commits June 1, 2023 18:04

fix review

bfed49e

Merge branch 'master_' into SPARK-43203_drop_table_to_v2

85164da

Merge branch 'master' into SPARK-43203_drop_table_to_v2

c477802

Hisoka-X requested a review from cloud-fan June 6, 2023 12:45

cloud-fan approved these changes Jun 7, 2023


move invalidateCachedTable to V2SessionCatalog

2a77746

cloud-fan reviewed Jun 7, 2023

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala (outdated)

fix review

b9e5488

viirya reviewed Jun 8, 2023

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/IdentifierImpl.scala (outdated)

viirya reviewed Jun 8, 2023

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala (outdated)

viirya reviewed Jun 8, 2023

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/DropTableSuite.scala

Hisoka-X added 2 commits June 8, 2023 18:33

revert IdentifierImpl

9ed2051

remove useless code

93c632d

viirya reviewed Jun 8, 2023

sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/IdentifierImpl.java (outdated)

remove public

8282f87

cloud-fan approved these changes Jun 19, 2023


cloud-fan closed this in 32a5db4 Jun 19, 2023

Hisoka-X deleted the SPARK-43203_drop_table_to_v2 branch June 19, 2023 12:44

Hisoka-X mentioned this pull request Jun 28, 2023

[SPARK-43203][SQL][3.4] Move all Drop Table case to DataSource V2 #41765

Closed

cloud-fan mentioned this pull request Jul 18, 2023

[SPARK-43203][SQL][FOLLOWUP] V2SessionCatalog.dropTable should handle null table #42056

Closed
