@@ -53,10 +53,15 @@ class V2SessionCatalog(catalog: SessionCatalog)
override def listTables(namespace: Array[String]): Array[Identifier] = {

Contributor Author:
There is no listViews in the TableCatalog interface, so I think this belongs in listTables. IIUC, the existing v1 session catalog already does this (it lists views when calling listTables).

    namespace match {
      case Array(db) =>
-       catalog
+       val tables = catalog
          .listTables(db)
          .map(ident => Identifier.of(ident.database.map(Array(_)).getOrElse(Array()), ident.table))
          .toArray
+       val views = catalog
+         .listViews(db, "*")
+         .map(ident => Identifier.of(ident.database.map(Array(_)).getOrElse(Array()), ident.table))
+         .toArray
+       tables ++ views
      case _ =>
        throw QueryCompilationErrors.noSuchNamespaceError(namespace)
    }
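The `ident.database.map(Array(_)).getOrElse(Array())` expression used in both branches above turns an optional database into a namespace array. A minimal pure-Scala sketch of that mapping, using a hypothetical TableIdent as a stand-in for Spark's TableIdentifier:

```scala
// Hypothetical stand-in for Spark's TableIdentifier: the database part is
// optional (some identifiers, e.g. temp views, may carry no database).
case class TableIdent(database: Option[String], table: String)

// Mirrors the ident.database.map(Array(_)).getOrElse(Array()) pattern:
// Some("db") becomes Array("db"), None becomes an empty namespace.
def namespaceOf(ident: TableIdent): Array[String] =
  ident.database.map(Array(_)).getOrElse(Array.empty[String])

val withDb = namespaceOf(TableIdent(Some("my_db"), "t1")) // Array("my_db")
val noDb = namespaceOf(TableIdent(None, "v1"))            // empty namespace
```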
@@ -95,11 +95,11 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
  }

  /**
-  * Returns a list of tables in the current database.
+  * Returns a list of tables in the current catalog and current database.
   * This includes all temporary tables.
   */
  override def listTables(): Dataset[Table] = {
-   listTables(currentDatabase)
+   listTables(currentCatalog() ++ "." ++ currentDatabase)
  }

  /**
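The hunk above prepends the current catalog so that the no-argument listTables() resolves against it rather than always against spark_catalog. A small sketch of the qualification (hypothetical helper name; Spark's real identifier handling also deals with quoting):

```scala
// Hypothetical sketch of qualifying a database name with the current
// catalog, as the new listTables(currentCatalog() ++ "." ++ currentDatabase)
// call does. Note that ++ on Strings is ordinary concatenation in Scala.
def qualify(currentCatalog: String, currentDatabase: String): String =
  currentCatalog ++ "." ++ currentDatabase

val default = qualify("spark_catalog", "default") // "spark_catalog.default"
val custom = qualify("testcat", "my_db")          // "testcat.my_db"
```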
@@ -120,7 +120,13 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    val plan = ShowTables(UnresolvedNamespace(ident), None)
    val ret = sparkSession.sessionState.executePlan(plan).toRdd.collect()
    val tables = ret
-     .map(row => ident ++ Seq(row.getString(1)))
+     .map(row =>
+       // for views, their namespace is empty

Contributor:
I think it's only true for temp views?

Contributor:
SHOW TABLES outputs an isTemporary column, and we can check that directly.

Contributor Author:
I tried to parse that isTemporary, which is the third value in the row. Neither row.getString(2).toInt nor row.getString(2).toBoolean works. I dug into the codebase, and it seems the RowSerializer essentially uses Unsafe.putBoolean.

Do you know the right way to parse a serialized boolean? Should I use a RowDeserializer somehow?

Contributor Author:
For example, for a temp view named my_temp_table, this is the internal row's in-memory value: [0,2000000000,200000000d,1,5f706d65745f796d,656c626174]

The row is built as toCatalystRow(table.namespace().quoted, table.name(), isTempView(table)); isTempView is the third value, which is 200000000d.

Also, I actually don't know why there are five values in the row....
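
One observation on the question above (an assumption, not verified against this PR): rows collected via toRdd.collect() are InternalRows, where a boolean column is stored natively rather than as a string, so a typed getter such as row.getBoolean(2) would be the natural accessor instead of parsing row.getString(2). A pure-Scala sketch of the distinction, with a hypothetical FakeRow standing in for Spark's InternalRow:

```scala
// Hypothetical stand-in for a row whose columns keep their native types,
// the way Spark's InternalRow stores a boolean via Unsafe.putBoolean.
final case class FakeRow(values: IndexedSeq[Any]) {
  def getString(i: Int): String = values(i).asInstanceOf[String]
  def getBoolean(i: Int): Boolean = values(i).asInstanceOf[Boolean]
}

// Columns: namespace, table name, isTemporary (stored as a real Boolean).
val row = FakeRow(Vector("", "my_temp_table", true))

// row.getString(2) would throw a ClassCastException here, because the
// third column was never a String; the typed getter reads it directly.
val isTemp = row.getBoolean(2)
```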

+       if (row.getString(0).isEmpty) {
+         Seq(row.getString(1))
+       } else {
+         ident ++ Seq(row.getString(1))
+       })

Member:
For my understanding, is this a regression due to one of the recent commits, like SPARK-39236?

Contributor Author:
SPARK-39236 updated listTables(dbName); this PR does not cause a regression on that JIRA.

This is more a side effect of SPARK-39506: because SPARK-39506 added setCurrentCatalog and currentCatalog support, listTables now has a choice of which catalog to search for tables. In the past it always went to the only catalog, spark_catalog, but now the current catalog can be changed.

listDatabases() was already updated to respect the current catalog.

Contributor Author (@amaliujia, Jul 20, 2022):
Maybe it is hard to define whether this is a regression (I would rather call it a side effect of introducing a way to control the current catalog). I think it at least maintains backwards compatibility: for old users who never set the current catalog, it will still be the one they have always targeted (spark_catalog); the existing UT has tested that.

And for new users, their chosen current catalog will be respected.

Member:
Thanks for the detail. Yes, it's hard to say when extending existing semantics. New features are always nice to have, but what I hope is to keep the original features safe and independent as much as possible. As long as the old code works, we are good. Thank you again for all your efforts, @amaliujia.

Member:
Avoiding deprecations is also the best approach until we are sure that the new features are mature enough.

Contributor Author:
Agreed that there should be a period of time for new features to mature, with good adoption, before talking about deprecation.

.map(makeTable)
CatalogImpl.makeDataset(tables, sparkSession)
}
@@ -182,6 +182,31 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
assert(spark.catalog.listTables().collect().map(_.name).toSet == Set("my_table2"))
}

+  test("SPARK-39828: Catalog.listTables() should respect currentCatalog") {
+    assert(spark.catalog.currentCatalog() == "spark_catalog")
+    assert(spark.catalog.listTables().collect().isEmpty)
+    createTable("my_table1")
+    assert(spark.catalog.listTables().collect().map(_.name).toSet == Set("my_table1"))
+
+    val catalogName = "testcat"
+    val dbName = "my_db"
+    val tableName = "my_table2"
+    val tableSchema = new StructType().add("i", "int")
+    val description = "this is a test managed table"
+    sql(s"CREATE NAMESPACE ${catalogName}.${dbName}")
+
+    spark.catalog.setCurrentCatalog("testcat")
+    spark.catalog.setCurrentDatabase("my_db")
+    assert(spark.catalog.listTables().collect().isEmpty)
+
+    createTable(tableName, dbName, catalogName, classOf[FakeV2Provider].getName, tableSchema,
+      Map.empty[String, String], description)
+    assert(spark.catalog.listTables()
+      .collect()
+      .map(t => Array(t.catalog, t.namespace.mkString("."), t.name).mkString(".")).toSet ==
+      Set("testcat.my_db.my_table2"))
+  }
+
test("list tables with database") {
assert(spark.catalog.listTables("default").collect().isEmpty)
createDatabase("my_db1")
Expand Down