[SPARK-40425][SQL] DROP TABLE does not need to do table lookup #37879
Conversation
Force-pushed from f696bdc to 42b82bf.
```scala
    blocking: Boolean): Unit = {
  val shouldRemove: LogicalPlan => Boolean =
    if (cascade) {
      _.exists(_.sameResult(plan))
```
sameResult doesn't work?
nvm
```scala
case SubqueryAlias(ident, DataSourceV2Relation(_, _, Some(catalog), Some(v2Ident), _)) =>
  isSameName(ident.qualifier :+ ident.name) &&
    isSameName(catalog.name() +: v2Ident.namespace() :+ v2Ident.name())
```
Does `SubqueryAlias` have the same name as the underlying relation?
yes, see ResolveRelations.createRelation
```scala
  }
}

case class DropTempViewCommand(ident: Identifier) extends LeafRunnableCommand {
```
v1 only, right?
A temp view is a Spark-internal thing and is unrelated to the data source, so it's neither v1 nor v2.
Oh, I see. I misread the comment "v1 DROP TABLE supports temp view." There is also another pattern statement for v2 that goes to `DropTempViewCommand`.
```scala
// A fake v2 catalog to hold temp views.
object FakeSystemCatalog extends CatalogPlugin {
  override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {}
  override def name(): String = "SYSTEM"
```
FAKE_SYSTEM?
The name doesn't matter. We won't show it or look it up for now. But later I think it's a good idea to officially add a system catalog to host temp views, temp functions and builtin functions.
```diff
   // no-op
 } else {
-  throw QueryCompilationErrors.tableOrViewNotFoundError(tableName.identifier)
+  throw QueryCompilationErrors.noSuchTableError(
```
`DropTableCommand` now won't be used to drop temp views, right? If so, there is some logic around `val isTempView = catalog.isTempView(tableName)`; do we need to update it?
good point, we can simplify v1 DROP TABLE command now
One PySpark error, although it looks like a real failure, seems unrelated?
```scala
assert(exception.getErrorClass === errorClass)
val mainErrorClass :: tail = errorClass.split("\\.").toList
assert(tail.isEmpty || tail.length == 1)
// TODO: remove the `errorSubClass` parameter.
```
Just a nit: if we use an ID'ed TODO with a JIRA id, a contributor can pick up the item more easily.
I didn't create a JIRA for this TODO because @MaxGekk will fix it shortly (we talked offline) :)
+1, LGTM.
cc @sunchao, too.
thanks for review, merging to master!
```scala
    ResolvedIdentifier(catalog, identifier)
  case UnresolvedIdentifier(nameParts, allowTemp) =>
    if (allowTemp && catalogManager.v1SessionCatalog.isTempView(nameParts)) {
      val ident = Identifier.of(nameParts.dropRight(1).toArray, nameParts.last)
```
nit: `nameParts.init` is the counterpart to `nameParts.last`:
https://www.scala-lang.org/api/2.12.5/scala/collection/Seq.html#inits:Iterator[Repr]
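For illustration, a tiny self-contained sketch of the suggestion (the name-parts value here is made up, not from the patch):

```scala
// `init` is the counterpart to `last`: it returns all elements except the last,
// so a multi-part name splits cleanly into (namespace, name).
val nameParts = Seq("spark_catalog", "db", "view") // hypothetical example value
val namespace = nameParts.init // Seq("spark_catalog", "db")
val name = nameParts.last      // "view"

// Equivalent to the `dropRight(1)` form in the patch, just more idiomatic:
assert(namespace == nameParts.dropRight(1))
```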
good idea!
```scala
val cmd = "DROP VIEW"
val hint = Some("Please use DROP TABLE instead.")
parseCompare(s"DROP VIEW testcat.db.view",
  DropView(UnresolvedView(Seq("testcat", "db", "view"), cmd, true, hint), ifExists = false))
```
why does UnresolvedView even continue to exist, if it's not useful for dropping? Do we still use it for add/select/etc?
it's still used by commands like SetViewProperties
```scala
  DropTempViewCommand(ident)
} else {
  throw QueryCompilationErrors.catalogOperationNotSupported(catalog, "views")
}
```
nit: `r` not used? And if you make `FakeSystemCatalog` a case object it can participate directly in matching here:

```scala
case DropView(ResolvedIdentifier(FakeSystemCatalog, ident), _) =>
  DropTempViewCommand(ident)
case DropView(ResolvedIdentifier(catalog, _), _) =>
  throw ...
```
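As a side note, the reason a case object can appear directly in a pattern is that it is a stable identifier, matched by identity. A minimal sketch with simplified stand-ins (these are not the real Spark classes, just made-up analogues):

```scala
// Simplified stand-ins for the real CatalogPlugin / ResolvedIdentifier types.
trait CatalogPlugin { def name(): String }
case object FakeSystemCatalog extends CatalogPlugin { def name(): String = "SYSTEM" }
final case class NamedCatalog(n: String) extends CatalogPlugin { def name(): String = n }
final case class ResolvedIdentifier(catalog: CatalogPlugin, ident: String)

def planDropView(resolved: ResolvedIdentifier): String = resolved match {
  // The case object is matched by identity; no binding or guard is needed.
  case ResolvedIdentifier(FakeSystemCatalog, ident) => s"DropTempViewCommand($ident)"
  case ResolvedIdentifier(catalog, _) => s"unsupported catalog: ${catalog.name()}"
}

assert(planDropView(ResolvedIdentifier(FakeSystemCatalog, "v")) == "DropTempViewCommand(v)")
assert(planDropView(ResolvedIdentifier(NamedCatalog("cat"), "v")) == "unsupported catalog: cat")
```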
```diff
   }
 }

-case DropTable(ResolvedV1TableIdentifier(ident), ifExists, purge) =>
+case DropTable(ResolvedV1Identifier(ident), ifExists, purge) =>
```
I am afraid this breaks the session catalog delegation. Previously, we checked the table was V1Table. Right now, we simply check the identifier looks like a V1 table identifier, which still may point to a valid V2 table. If I have a custom session catalog, it may be able to load both V1 and V2 tables. After this change, the V1 drop code is invoked for V2 tables in custom session catalogs. That means I can't drop tables correctly in custom session catalogs.
@cloud-fan @viirya @dongjoon-hyun, could you double check if I missed anything?
I checked the difference between `ResolvedV1TableIdentifier` and `ResolvedV1Identifier`. So do you mean `ResolvedV1Identifier` could wrongly apply to a V2 table? I.e.,

```scala
case ResolvedIdentifier(catalog, ident) if isSessionCatalog(catalog)
```

if `catalog` is a custom session catalog that is capable of loading both V1 and V2 tables?
I saw that many commands have an `isV2Provider` check, but `DropTable` doesn't. So it seems we need it?
I think ResolvedV1Identifier simply means it is an identifier in the session catalog that has only db and table name (in other words it is a valid V1 identifier). In custom session catalogs, it may point to a valid V2 table.
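To make the concern concrete, here is a toy sketch (made-up types, not Spark's) of why matching on identifier shape alone cannot distinguish V1 from V2 tables in a custom session catalog:

```scala
// Toy model: a custom session catalog can serve both V1 and V2 tables
// under ordinary two-part `db.table` identifiers.
sealed trait Table
case object V1Table extends Table
case object V2Table extends Table

val customSessionCatalog: Map[Seq[String], Table] = Map(
  Seq("db", "legacy") -> V1Table,
  Seq("db", "modern") -> V2Table
)

// "Looks like a V1 identifier": at most database + table name parts.
def looksLikeV1Identifier(nameParts: Seq[String]): Boolean = nameParts.length <= 2

// Both tables pass the shape check, so routing on shape alone would send the
// V2 table down the V1 drop path; the catalog must be asked what the table is.
val v2Name = Seq("db", "modern")
assert(looksLikeV1Identifier(v2Name) && customSessionCatalog(v2Name) == V2Table)
```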
Shall we switch to the V2 DROP path for all cases to fix SPARK-43203?
Yea, we should. Can you create a PR? Thanks!
Sounds good to me to switch to V2 DROP.
I'll have time, probably, on Monday. I'll do that then unless someone gets there earlier.
Hi @aokolnychyi, any update on this? If you don't mind, I can finish it this weekend. 😄
If you create a managed table in Spark 3.5.1, the path is not deleted when dropping the table. I think this is a bug. It is deleted correctly in Spark 3.3.
### What changes were proposed in this pull request?
This fixes the DROP TABLE behavior in the session catalog caused by #37879: we always invoke the V1 drop logic if the identifier looks like a V1 identifier. This is a big blocker for external data sources that provide custom session catalogs. So this PR moves all DROP TABLE cases to DataSource V2 (using DROP TABLE to drop a view is not included). For more information, see https://github.com/apache/spark/pull/37879/files#r1170501180

### Why are the changes needed?
Move the DROP TABLE case to DataSource V2 to fix the bug and prepare for removing the v1 drop-table path.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested by:
- V2 table catalog tests: `org.apache.spark.sql.execution.command.v2.DropTableSuite`
- V1 table catalog tests: `org.apache.spark.sql.execution.command.v1.DropTableSuiteBase`

Closes #41348 from Hisoka-X/SPARK-43203_drop_table_to_v2.

Authored-by: Jia Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This PR updates `DropTable`/`DropView` to use `UnresolvedIdentifier` instead of `UnresolvedTableOrView`/`UnresolvedView`. This has several benefits:
1. Simplify the `ifExists` handling. No need to handle `DropTable` in `ResolveCommandsWithIfExists` anymore.
2. Avoid one table lookup if we eventually fall back to a v1 command (the v1 `DropTableCommand` will look up the table again).
3. v2 catalogs can avoid the table lookup entirely if possible.

This PR also improves table uncaching to match by table name directly, so that we don't need to look up the table and resolve it to table relations.
Why are the changes needed?
Save table lookup.
Does this PR introduce any user-facing change?
No
How was this patch tested?
existing tests