[SPARK-30782][SQL] Column resolution doesn't respect current catalog/namespace for v2 tables. #27532
Conversation
Test build #118191 has finished for PR 27532 at commit

Test build #118183 has finished for PR 27532 at commit
@imback82 can you try some other databases like Hive, Presto, SQL Server?

I tried Postgres and MySQL, and they both allow this syntax.

In contrast, Hive doesn't like this syntax. I think we need to at least make the behavior consistent between v1 and v2 tables.
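To make the inconsistency concrete, here is a minimal spark-shell sketch based on the examples in this PR's description (the `testcat` catalog and the table names are illustrative):

```scala
// v1 tables: database-qualified column references already resolve.
spark.sql("CREATE TABLE default.t(id INT) USING json")
spark.sql("SELECT default.t.id FROM t")  // works

// v2 tables: before this fix, the equivalent query failed once a current
// catalog/namespace was set.
spark.sql("USE testcat.ns1.ns2")
spark.sql("SELECT testcat.ns1.ns2.t.id FROM t")
// org.apache.spark.sql.AnalysisException: cannot resolve
// '`testcat.ns1.ns2.t.id`' given input columns: [t.id, t.point]
```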
```scala
// not belong to any namespaces. For v1 tables, namespace is resolved in
// `SessionCatalog.getRelation`.
val ns = if (ident.namespace.isEmpty) {
  catalogManager.currentNamespace
```
What if the namespace is really []?
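In other words, a sketch of the concern with hypothetical identifiers: under this check, an identifier that genuinely lives in the root (empty) namespace is indistinguishable from an unqualified one that should inherit the current namespace:

```scala
import org.apache.spark.sql.connector.catalog.Identifier

// A table registered at the root of the catalog, e.g. `testcat.t` ...
val rootLevel = Identifier.of(Array.empty[String], "t")
// ... and a bare `t` that should resolve against the current namespace.
val unqualified = Identifier.of(Array.empty[String], "t")

// Both satisfy ident.namespace.isEmpty, so the branch above would rewrite
// the root-level identifier to the current namespace as well.
assert(rootLevel.namespace.isEmpty && unqualified.namespace.isEmpty)
```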
I'm fixing it in #27550
Good catch! I will take a look at the PR. Thanks!
This is still problematic even after #27550 because we allow `spark_catalog.t`. It will be resolved to an identifier with an empty namespace in `CatalogAndIdentifier`, whereas `v1SessionCatalog` always uses the current database if the given namespace (database) is empty. Should I just go ahead and disallow `spark_catalog.t`? What do you think? We briefly discussed this issue here: #27550 (comment)
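A sketch of the ambiguity under discussion (`db1` and `t` are illustrative names):

```scala
spark.sql("USE db1")

// `spark_catalog.t` parses to catalog = spark_catalog, namespace = [], name = t.
// CatalogAndIdentifier hands the v2 path the identifier ([], "t") as-is,
// while the v1 session catalog would substitute the current database and
// look up ("db1", "t") -- potentially a different table for the same query.
spark.sql("SELECT * FROM spark_catalog.t")
```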
yea let's do it!
will do!
@cloud-fan this is now fixed without this hack.
Test build #118927 has finished for PR 27532 at commit
```diff
  case table =>
    SubqueryAlias(
-     identifier,
+     ident.asMultipartIdentifier,
```
shall we add the catalog name too, to support cases like `select spark_catalog.default.t.i from t`?
We could, but it would not be consistent with v1 table behavior. I was thinking about adding this support when I update the resolution rule for session catalogs: #27391 (comment). What do you think, should I do it now?
ah ok. Let's leave it for now.
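For context, a sketch of what the `SubqueryAlias` change above does, assuming the multipart-identifier overload of `SubqueryAlias.apply` that the diff passes `ident.asMultipartIdentifier` to (the child plan and names here are stand-ins):

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, SubqueryAlias}
import org.apache.spark.sql.types.IntegerType

// A stand-in for the resolved v2 relation.
val child = LocalRelation(AttributeReference("id", IntegerType)())

// Aliased as Seq("ns1", "ns2", "t") instead of just "t", so a qualified
// reference like `ns1.ns2.t.id` can match during column resolution. Adding
// the catalog name to this Seq is what the thread above decides to defer.
val aliased = SubqueryAlias(Seq("ns1", "ns2", "t"), child)
```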
```scala
Seq(true, false).foreach { useV1Table =>
  val format = if (useV1Table) "json" else v2Format
  if (useV1Table) {
    spark.conf.unset(V2_SESSION_CATALOG_IMPLEMENTATION.key)
```
shall we keep this comment: `// unset this config to use the default v2 session catalog.`
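For reference, a sketch of the loop with the suggested comment restored; the `else` branch and the trailing step are assumptions about the surrounding test, not part of the diff shown here:

```scala
Seq(true, false).foreach { useV1Table =>
  val format = if (useV1Table) "json" else v2Format
  if (useV1Table) {
    // unset this config to use the default v2 session catalog.
    spark.conf.unset(V2_SESSION_CATALOG_IMPLEMENTATION.key)
  } else {
    // assumption: point the session catalog at the suite's test implementation.
    spark.conf.set(V2_SESSION_CATALOG_IMPLEMENTATION.key,
      classOf[InMemoryTableSessionCatalog].getName)
  }
  // per-format table creation and checkAnswer assertions follow here.
}
```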
```scala
checkAnswer(sql("select t.i from spark_catalog.default.t"), Row(1))
checkAnswer(sql("select default.t.i from spark_catalog.default.t"), Row(1))

// catalog name cannot be used for v1 tables.
```
v1 tables -> tables in the session catalog, as we are testing both v1 and v2 tables here.
Thanks, updated.
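A hedged guess at the negative assertion the updated comment introduces (the exact error message is illustrative):

```scala
import org.apache.spark.sql.AnalysisException

// Qualifying a column with the catalog name is not supported for tables in
// the session catalog, so this reference should fail to resolve.
val e = intercept[AnalysisException] {
  sql("select spark_catalog.default.t.i from spark_catalog.default.t")
}
assert(e.getMessage.contains("cannot resolve"))
```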
Test build #118947 has finished for PR 27532 at commit

retest this please

Test build #118970 has finished for PR 27532 at commit

thanks, merging to master/3.0!

BTW, don't forget to remove the hack in
[SPARK-30782][SQL] Column resolution doesn't respect current catalog/namespace for v2 tables

Closes #27532 from imback82/qualifed_col_respect_current.

Authored-by: Terry Kim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 7330547)
Signed-off-by: Wenchen Fan <[email protected]>
Yes, I am working on it next!
### What changes were proposed in this pull request?

This PR proposes to fix an issue where qualified columns are not matched for v2 tables if the current catalog/namespace is used.

For v1 tables, you can currently perform the following:

```SQL
SELECT default.t.id FROM t;
```

For v2 tables, the following fails:

```SQL
USE testcat.ns1.ns2;
SELECT testcat.ns1.ns2.t.id FROM t;

org.apache.spark.sql.AnalysisException: cannot resolve '`testcat.ns1.ns2.t.id`' given input columns: [t.id, t.point]; line 1 pos 7;
```

### Why are the changes needed?

It is a bug since qualified column names cannot match if the current catalog/namespace is used.

### Does this PR introduce any user-facing change?

Yes, now the following works:

```SQL
USE testcat.ns1.ns2;
SELECT testcat.ns1.ns2.t.id FROM t;
```

### How was this patch tested?

Added new tests.