-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-39828][SQL] Catalog.listTables should respect currentCatalog
#37241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| // This class is instantiated by Spark, so `initialize` method will not be called. | ||
| override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {} | ||
|
|
||
| override def listTables(namespace: Array[String]): Array[Identifier] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no listViews in TableCatalog interface so I think it should be in listTables. IIUC existing v1 session catalog does this already (list views when calling listTables)
|
Thank you for pinging me, @amaliujia . |
Catalog.listTables should respect currentCatalog
|
cc @imback82 too |
| Seq(row.getString(1)) | ||
| } else { | ||
| ident ++ Seq(row.getString(1)) | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding, is this a regression due to one of the recent commits like SPARK-39236?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SPARK-39236 updated listTables(dbName). This PR does not cause regression on that JIRA.
This is more like a side effect of SPARK-39506. Because in SPARK-39506 we support setCurrentCatalog and get currentCatalog, now for listTables it has a choice of which catalog to search for tables. In the past it always go to the only catalog which is spark_catalog, but now that catalog can be changed.
listDatabases() was already updated to respect the current catalog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is hard to define whether this is a regression (I would rather say it is a side effect that given we introduced a way to control current catalog). I think at least it still maintains backwards compatibility. For old users who do not need set current catalog, it will still be the one that they would target to (spark_catalog). The existing UT has tested that.
And then for new users, their set current catalog will be respected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detail. Yes, it's hard to say always during extending the existing semantics. New features are always nice to have, but what I hope is to keep the original features safe and independent as much as possible . As long as the old code works, we are good. Thank you again for all your efforts, @amaliujia .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoiding deprecations is also the best way until we are sure that the new features are manure enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed on that there should be a period of time to have new features mature enough with good adoptions before talking about deprecations.
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala
Outdated
Show resolved
Hide resolved
| val tables = ret | ||
| .map(row => ident ++ Seq(row.getString(1))) | ||
| .map(row => | ||
| // for views, their namespace are empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's only true for temp views?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SHOW TABLES outputs a isTemporary column and we can check that directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to parse that isTemporary which is the third string in the row. Neither row.getString(2).toInt or row.getString(2).toBoolean does not work. I dig into the codebase and seems essentially the RowSerializer uses Unsafe.putBoolean.
Do you know what is the right way to parse a serialized boolean? Should I use a RowDeserializer somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, for a temp view named my_temp_table, this is the internal row in-memory value: [0,2000000000,200000000d,1,5f706d65745f796d,656c626174]
toCatalystRow(table.namespace().quoted, table.name(), isTempView(table)). The isTempView is the third value, which is 200000000d.
Also I actually don't know why there are five values in the row....
|
Can one of the admins verify this patch? |
|
This PR is already covered by #37287. I will close my PR then. |
What changes were proposed in this pull request?
Catalog.listTables()should respect current catalog now as we have introduced that concept in 3.4.0.ShowTablesv2 command does not list views.Why are the changes needed?
To make
Catalog.listTables()should respect current catalog if that is used.Does this PR introduce any user-facing change?
No. Existing users without caring about catalog name in qualified identifiers will remain the same behavior. This is tested by existing unit tests on
listTables.How was this patch tested?
UT