[SPARK-39828][SQL] `Catalog.listTables` should respect currentCatalog #37241

amaliujia · 2022-07-20T18:54:05Z

What changes were proposed in this pull request?

Catalog.listTables() should now respect the current catalog, a concept introduced in 3.4.0.
During development I realized that the v2 ShowTables command does not list views.

Why are the changes needed?

To make Catalog.listTables() respect the current catalog when one is set.

Does this PR introduce any user-facing change?

No. For existing users who do not use a catalog name in qualified identifiers, the behavior remains the same. This is covered by the existing unit tests on listTables.

How was this patch tested?

Unit tests.

amaliujia · 2022-07-20T18:54:52Z

R: @cloud-fan @dongjoon-hyun

amaliujia · 2022-07-20T18:56:03Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala

  // This class is instantiated by Spark, so `initialize` method will not be called.
  override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {}

  override def listTables(namespace: Array[String]): Array[Identifier] = {


There is no listViews in the TableCatalog interface, so I think views should be returned by listTables. IIUC, the existing v1 session catalog already does this (it lists views when listTables is called).

dongjoon-hyun · 2022-07-20T18:56:07Z

Thank you for pinging me, @amaliujia .

dongjoon-hyun · 2022-07-20T18:58:24Z

cc @imback82 too

dongjoon-hyun · 2022-07-20T19:01:14Z

sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala

+            Seq(row.getString(1))
+          } else {
+            ident ++ Seq(row.getString(1))
+          })


For my understanding, is this a regression due to one of the recent commits like SPARK-39236?

SPARK-39236 updated listTables(dbName). This PR does not cause a regression relative to that JIRA.

This is more like a side effect of SPARK-39506. Because SPARK-39506 added support for setCurrentCatalog and currentCatalog, listTables now has a choice of which catalog to search for tables. In the past it always went to the only catalog, spark_catalog, but now the current catalog can be changed.

listDatabases() was already updated to respect the current catalog.

Maybe it is hard to define whether this is a regression (I would rather call it a side effect of introducing a way to control the current catalog). I think it at least maintains backward compatibility: for existing users who do not set the current catalog, it will still be the one they already target (spark_catalog). The existing unit tests cover that.

For new users, the current catalog they set will be respected.
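To make the compatibility argument concrete, here is a minimal, dependency-free sketch. It is not Spark's implementation: `Session`, its constructor argument, and the table names are hypothetical stand-ins. Only the method names `setCurrentCatalog`, `currentCatalog`, and `listTables`, and the default `spark_catalog`, come from the discussion.

```scala
object CurrentCatalogModel {
  // Hypothetical in-memory stand-in for the catalogs known to a session.
  final class Session(tablesByCatalog: Map[String, Seq[String]]) {
    // spark_catalog is the default, so callers that never switch catalogs
    // keep seeing the same tables as before.
    private var current: String = "spark_catalog"

    def currentCatalog: String = current

    def setCurrentCatalog(name: String): Unit = {
      require(tablesByCatalog.contains(name), s"unknown catalog: $name")
      current = name
    }

    // listTables consults whichever catalog is current.
    def listTables(): Seq[String] = tablesByCatalog(current)
  }
}
```

In this model, a caller that never invokes `setCurrentCatalog` always lists the tables of `spark_catalog`, which is the backward-compatible path. A caller that switches catalogs gets that catalog's tables instead.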

Thanks for the details. Yes, it's always hard to say when extending existing semantics. New features are always nice to have, but what I hope is to keep the original features as safe and independent as possible. As long as the old code works, we are good. Thank you again for all your efforts, @amaliujia.

Avoiding deprecations is also the best way until we are sure that the new features are mature enough.

Agreed: there should be a period for new features to mature and gain good adoption before we talk about deprecations.

sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala

cloud-fan · 2022-07-21T02:45:48Z

sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala

      val tables = ret
-        .map(row => ident ++ Seq(row.getString(1)))
+        .map(row =>
+          // for views, their namespace are empty


I think it's only true for temp views?

SHOW TABLES outputs an isTemporary column, and we can check that directly.
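A rough sketch of that suggestion follows. It assumes the row has already been decoded into a plain value: `ShowTablesRow` and `nameParts` are hypothetical names used only for illustration, and `ident` stands for the catalog and namespace prefix used in the diff above.

```scala
object TableNameParts {
  // Hypothetical stand-in for one decoded SHOW TABLES output row.
  final case class ShowTablesRow(
      namespace: String,
      tableName: String,
      isTemporary: Boolean)

  // Temp views are addressed by name alone; everything else is qualified
  // with the catalog and namespace prefix.
  def nameParts(ident: Seq[String], row: ShowTablesRow): Seq[String] =
    if (row.isTemporary) Seq(row.tableName)
    else ident ++ Seq(row.tableName)
}
```

Branching on the flag rather than on an empty namespace keeps permanent views, which do have a namespace, on the qualified path.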

I tried to parse isTemporary, which is the third field in the row. Neither row.getString(2).toInt nor row.getString(2).toBoolean works. I dug into the codebase, and it seems the RowSerializer essentially uses Unsafe.putBoolean.

Do you know the right way to parse a serialized boolean? Should I use a RowDeserializer somehow?

For example, for a temp view named my_temp_table, this is the internal row in-memory value: [0,2000000000,200000000d,1,5f706d65745f796d,656c626174]

toCatalystRow(table.namespace().quoted, table.name(), isTempView(table)). The isTempView is the third value, which is 200000000d.

Also I actually don't know why there are five values in the row....
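One possible reading of the dump above, offered as a sketch rather than a definitive answer: it assumes the usual UnsafeRow layout, where the first 8-byte word is the null bitset, each field then gets one 8-byte slot, and a string slot packs `(offset << 32) | size` with the bytes stored after the slots. Under that assumption `200000000d` is the tableName slot (offset 32, size 13) and the isTemporary flag is the `1` that follows it. The trailing two words are the string bytes, which would explain the extra values. The object and helper names below are hypothetical.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets

object UnsafeRowDump {
  // The six 8-byte words from the dump, written as hex literals.
  val words: Array[Long] = Array(
    0x0L, 0x2000000000L, 0x200000000dL, 0x1L,
    0x5f706d65745f796dL, 0x656c626174L)

  // Assumption: fewer than 64 fields, so the null bitset is one word.
  private val bitsetBytes = 8

  // Little-endian byte image of the whole row.
  val bytes: Array[Byte] = {
    val buf = ByteBuffer.allocate(words.length * 8).order(ByteOrder.LITTLE_ENDIAN)
    words.foreach(w => buf.putLong(w))
    buf.array()
  }

  // Fixed-width slot of field i (skipping the bitset word).
  private def slot(i: Int): Long = words(1 + i)

  // String field: upper 32 bits are the offset from the row start,
  // lower 32 bits are the length in bytes.
  def string(i: Int): String = {
    val w = slot(i)
    val offset = (w >>> 32).toInt
    val size = w.toInt
    new String(bytes, offset, size, StandardCharsets.UTF_8)
  }

  // Boolean field: a single byte at the start of the slot.
  def boolean(i: Int): Boolean = bytes(bitsetBytes + 8 * i) != 0
}
```

If this reading is right, the third field is a real boolean rather than a string, so a typed accessor such as `row.getBoolean(2)` would be the natural way to read it instead of `getString(2)`.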

AmplabJenkins · 2022-07-21T19:13:38Z

Can one of the admins verify this patch?

amaliujia · 2022-08-02T20:53:05Z

This PR is already covered by #37287. I will close my PR then.

[SPARK-39828] Catalog.listTables() should respect currentCatalog.

de816f0

github-actions bot added the SQL label Jul 20, 2022


dongjoon-hyun changed the title ~~[SPARK-39828] Catalog.listTables() should respect currentCatalog~~ [SPARK-39828][SQL] Catalog.listTables should respect currentCatalog Jul 20, 2022

dongjoon-hyun reviewed Jul 20, 2022


update

77237f9

cloud-fan reviewed Jul 21, 2022

amaliujia closed this Aug 2, 2022
