Skip to content

Conversation

@amaliujia
Copy link
Contributor

@amaliujia amaliujia commented Jul 20, 2022

What changes were proposed in this pull request?

  1. Catalog.listTables() should respect current catalog now as we have introduced that concept in 3.4.0.
  2. During the development I realized that ShowTables v2 command does not list views.

Why are the changes needed?

To make Catalog.listTables() should respect current catalog if that is used.

Does this PR introduce any user-facing change?

No. Existing users without caring about catalog name in qualified identifiers will remain the same behavior. This is tested by existing unit tests on listTables.

How was this patch tested?

UT

@github-actions github-actions bot added the SQL label Jul 20, 2022
@amaliujia
Copy link
Contributor Author

R: @cloud-fan @dongjoon-hyun

// This class is instantiated by Spark, so `initialize` method will not be called.
override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {}

override def listTables(namespace: Array[String]): Array[Identifier] = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no listViews in TableCatalog interface so I think it should be in listTables. IIUC existing v1 session catalog does this already (list views when calling listTables)

@dongjoon-hyun
Copy link
Member

Thank you for pinging me, @amaliujia .

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-39828] Catalog.listTables() should respect currentCatalog [SPARK-39828][SQL] Catalog.listTables should respect currentCatalog Jul 20, 2022
@dongjoon-hyun
Copy link
Member

cc @imback82 too

Seq(row.getString(1))
} else {
ident ++ Seq(row.getString(1))
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding, is this a regression due to one of the recent commits like SPARK-39236?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SPARK-39236 updated listTables(dbName). This PR does not cause regression on that JIRA.

This is more like a side effect of SPARK-39506. Because in SPARK-39506 we support setCurrentCatalog and get currentCatalog, now for listTables it has a choice of which catalog to search for tables. In the past it always go to the only catalog which is spark_catalog, but now that catalog can be changed.

listDatabases() was already updated to respect the current catalog.

Copy link
Contributor Author

@amaliujia amaliujia Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is hard to define whether this is a regression (I would rather say it is a side effect that given we introduced a way to control current catalog). I think at least it still maintains backwards compatibility. For old users who do not need set current catalog, it will still be the one that they would target to (spark_catalog). The existing UT has tested that.

And then for new users, their set current catalog will be respected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detail. Yes, it's hard to say always during extending the existing semantics. New features are always nice to have, but what I hope is to keep the original features safe and independent as much as possible . As long as the old code works, we are good. Thank you again for all your efforts, @amaliujia .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoiding deprecations is also the best way until we are sure that the new features are manure enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed on that there should be a period of time to have new features mature enough with good adoptions before talking about deprecations.

val tables = ret
.map(row => ident ++ Seq(row.getString(1)))
.map(row =>
// for views, their namespace are empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's only true for temp views?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHOW TABLES outputs a isTemporary column and we can check that directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to parse that isTemporary which is the third string in the row. Neither row.getString(2).toInt or row.getString(2).toBoolean does not work. I dig into the codebase and seems essentially the RowSerializer uses Unsafe.putBoolean.

Do you know what is the right way to parse a serialized boolean? Should I use a RowDeserializer somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, for a temp view named my_temp_table, this is the internal row in-memory value: [0,2000000000,200000000d,1,5f706d65745f796d,656c626174]

toCatalystRow(table.namespace().quoted, table.name(), isTempView(table)). The isTempView is the third value, which is 200000000d.

Also I actually don't know why there are five values in the row....

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@amaliujia
Copy link
Contributor Author

This PR is already covered by #37287. I will close my PR then.

@amaliujia amaliujia closed this Aug 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants