-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-29558][SQL] ResolveTables and ResolveRelations should be order-insensitive #26214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -198,7 +198,6 @@ class Analyzer( | |
| ResolveTableValuedFunctions :: | ||
| new ResolveCatalogs(catalogManager) :: | ||
| ResolveInsertInto :: | ||
| ResolveTables :: | ||
| ResolveRelations :: | ||
| ResolveReferences :: | ||
| ResolveCreateNamedStruct :: | ||
|
|
@@ -666,12 +665,26 @@ class Analyzer( | |
| } | ||
|
|
||
| /** | ||
| * Resolve table relations with concrete relations from v2 catalog. | ||
| * Resolve relations to temp views. This is not an actual rule, and is only called by | ||
| * [[ResolveTables]]. | ||
| */ | ||
| object ResolveTempViews extends Rule[LogicalPlan] { | ||
| def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { | ||
| case u @ UnresolvedRelation(Seq(part1)) => | ||
| v1SessionCatalog.lookupTempView(part1).getOrElse(u) | ||
| case u @ UnresolvedRelation(Seq(part1, part2)) => | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This needs to check whether Not needing to remember that is the purpose of the extractors. I think it would be better to continue using the extractor: case u @ UnresolvedRelation(AsTemporaryTableIdentifier(ident)) =>
ident.database match {
case Some(db) =>
v1SessionCatalog.lookupGlobalTempView(db, ident.table).getOrElse(u)
case None =>
v1SessionCatalog.lookupTempView(ident.table).getOrElse(u)
}
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. as discussed in the sync, we decide to treat the global temp view name prefix
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If this is intended to only match
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. unfortunately,
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, that's why I went ahead with the merge. I think the code is currently correct. Still, it would be nice to use the runtime setting here in the matcher instead. |
||
| v1SessionCatalog.lookupGlobalTempView(part1, part2).getOrElse(u) | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Resolve table relations with concrete relations from v2 catalog. This is not an actual rule, | ||
| * and is only called by [[ResolveRelations]]. | ||
| * | ||
| * [[ResolveRelations]] still resolves v1 tables. | ||
| */ | ||
| object ResolveTables extends Rule[LogicalPlan] { | ||
| def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { | ||
| def apply(plan: LogicalPlan): LogicalPlan = ResolveTempViews(plan).resolveOperatorsUp { | ||
| case u: UnresolvedRelation => | ||
| lookupV2Relation(u.multipartIdentifier) | ||
| .getOrElse(u) | ||
|
|
@@ -733,10 +746,6 @@ class Analyzer( | |
| // Note this is compatible with the views defined by older versions of Spark(before 2.2), which | ||
| // have empty defaultDatabase and all the relations in viewText have database part defined. | ||
| def resolveRelation(plan: LogicalPlan): LogicalPlan = plan match { | ||
| case u @ UnresolvedRelation(AsTemporaryViewIdentifier(ident)) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wouldn't it be simpler to just call |
||
| if v1SessionCatalog.isTemporaryTable(ident) => | ||
| resolveRelation(lookupTableFromCatalog(ident, u, AnalysisContext.get.defaultDatabase)) | ||
|
|
||
| case u @ UnresolvedRelation(AsTableIdentifier(ident)) if !isRunningDirectlyOnFiles(ident) => | ||
| val defaultDatabase = AnalysisContext.get.defaultDatabase | ||
| val foundRelation = lookupTableFromCatalog(ident, u, defaultDatabase) | ||
|
|
@@ -767,7 +776,7 @@ class Analyzer( | |
| case _ => plan | ||
| } | ||
|
|
||
| def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { | ||
| def apply(plan: LogicalPlan): LogicalPlan = ResolveTables(plan).resolveOperatorsUp { | ||
| case i @ InsertIntoStatement(u @ UnresolvedRelation(AsTableIdentifier(ident)), _, child, _, _) | ||
| if child.resolved => | ||
| EliminateSubqueryAliases(lookupTableFromCatalog(ident, u)) match { | ||
|
|
@@ -2839,7 +2848,6 @@ class Analyzer( | |
| private def lookupV2RelationAndCatalog( | ||
| identifier: Seq[String]): Option[(DataSourceV2Relation, CatalogPlugin, Identifier)] = | ||
| identifier match { | ||
| case AsTemporaryViewIdentifier(ti) if v1SessionCatalog.isTemporaryTable(ti) => None | ||
| case CatalogObjectIdentifier(catalog, ident) if !CatalogV2Util.isSessionCatalog(catalog) => | ||
| CatalogV2Util.loadTable(catalog, ident) match { | ||
| case Some(table) => Some((DataSourceV2Relation.create(table), catalog, ident)) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -767,6 +767,25 @@ class SessionCatalog( | |
| } | ||
| } | ||
|
|
||
| def lookupTempView(table: String): Option[SubqueryAlias] = { | ||
| val formattedTable = formatTableName(table) | ||
| getTempView(formattedTable).map { view => | ||
| SubqueryAlias(formattedTable, view) | ||
| } | ||
| } | ||
|
|
||
| def lookupGlobalTempView(db: String, table: String): Option[SubqueryAlias] = { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this is safe and I do prefer these methods to a combined
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The idea here is to clearly separate the resolution of temp view and v1/v2 table, so I'd like to avoid using These 2 methods mostly copy-paste code from |
||
| val formattedDB = formatDatabaseName(db) | ||
| if (formattedDB == globalTempViewManager.database) { | ||
| val formattedTable = formatTableName(table) | ||
| getGlobalTempView(formattedTable).map { view => | ||
| SubqueryAlias(formattedTable, formattedDB, view) | ||
| } | ||
| } else { | ||
| None | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Return whether a table with the specified name is a temporary view. | ||
| * | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.analysis.{CannotReplaceMissingTableExceptio | |
| import org.apache.spark.sql.connector.catalog._ | ||
| import org.apache.spark.sql.connector.catalog.CatalogManager.SESSION_CATALOG_NAME | ||
| import org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog | ||
| import org.apache.spark.sql.internal.SQLConf | ||
| import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf} | ||
| import org.apache.spark.sql.internal.SQLConf.V2_SESSION_CATALOG_IMPLEMENTATION | ||
| import org.apache.spark.sql.sources.SimpleScanSource | ||
| import org.apache.spark.sql.types.{BooleanType, LongType, StringType, StructType} | ||
|
|
@@ -1786,6 +1786,20 @@ class DataSourceV2SQLSuite | |
| } | ||
| } | ||
|
|
||
| test("global temp view should not be masked by v2 catalog") { | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is okay for now, but I think it is a little confusing that the catalog is completely ignored. I think this should result in an error instead, but we can do that in a follow-up. |
||
| val globalTempDB = spark.sessionState.conf.getConf(StaticSQLConf.GLOBAL_TEMP_DATABASE) | ||
| spark.conf.set(s"spark.sql.catalog.$globalTempDB", classOf[InMemoryTableCatalog].getName) | ||
|
|
||
| try { | ||
| sql("create global temp view v as select 1") | ||
| sql(s"alter view $globalTempDB.v rename to v2") | ||
| checkAnswer(spark.table(s"$globalTempDB.v2"), Row(1)) | ||
| sql(s"drop view $globalTempDB.v2") | ||
| } finally { | ||
| spark.sharedState.globalTempViewManager.clear() | ||
| } | ||
| } | ||
|
|
||
| private def testV1Command(sqlCommand: String, sqlParams: String): Unit = { | ||
| val e = intercept[AnalysisException] { | ||
| sql(s"$sqlCommand $sqlParams") | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that this is refactored into a separate rule. Can we move it to an earlier batch? If metastore views can't contain temp views, then there's no reason to do this in the same batch as table and view resolution from catalogs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed in the sync, we decide to keep it in the current batch for safety, in case some user-supplied analyzers rules need Spark to resolve unresolved temp views.