
Conversation

@imback82
Contributor

What changes were proposed in this pull request?

This PR proposes to fix an issue where qualified columns are not matched for v2 tables when the current catalog/namespace is used.

For v1 tables, you can currently perform the following:

```SQL
SELECT default.t.id FROM t;
```

For v2 tables, the following fails:

```SQL
USE testcat.ns1.ns2;
SELECT testcat.ns1.ns2.t.id FROM t;

org.apache.spark.sql.AnalysisException: cannot resolve '`testcat.ns1.ns2.t.id`' given input columns: [t.id, t.point]; line 1 pos 7;
```

Why are the changes needed?

It is a bug: qualified column names cannot be matched when the current catalog/namespace is used.

Does this PR introduce any user-facing change?

Yes, now the following works:

```SQL
USE testcat.ns1.ns2;
SELECT testcat.ns1.ns2.t.id FROM t;
```

How was this patch tested?

Added new tests
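
(For a concrete picture, here is a minimal sketch of the kind of test this adds, using the `testcat` catalog from the repro above; the table schema and the `USING foo` provider are assumptions, not the PR's actual test code.)

```scala
// Hedged sketch: assumes a session where "testcat" is registered as an
// in-memory v2 catalog, as in the repro above.
spark.sql("USE testcat.ns1.ns2")
spark.sql("CREATE TABLE t (id BIGINT, point STRUCT<x: BIGINT, y: BIGINT>) USING foo")

// Before this fix: AnalysisException, "cannot resolve 'testcat.ns1.ns2.t.id'".
// After this fix: the fully qualified column resolves against t.
spark.sql("SELECT testcat.ns1.ns2.t.id FROM t").show()
```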

@imback82
Contributor Author

cc @cloud-fan @brkyvz

@SparkQA

SparkQA commented Feb 11, 2020

Test build #118191 has finished for PR 27532 at commit 10b19d9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 11, 2020

Test build #118183 has finished for PR 27532 at commit 5706a3e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

@imback82 can you try some other databases like Hive, Presto, SQL Server? `SELECT default.t.id FROM t` looks a little counterintuitive to me.

@imback82
Contributor Author

I tried Postgres and MySQL, and both allow this syntax.

postgres:

```
postgres=# create schema s1;
CREATE SCHEMA

postgres=# create table s1.t (i int);
CREATE TABLE

postgres=# SET search_path TO s1;
SET

postgres=# select t.i from t;
 i
---
(0 rows)

postgres=# select s1.t.i from t;
 i
---
(0 rows)
```

mysql:

```
mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> use test;
Database changed

mysql> create table t (i int);
Query OK, 0 rows affected (0.00 sec)

mysql> select t.i from t;
Empty set (0.00 sec)

mysql> select test.t.i from t;
Empty set (0.00 sec)
```

@imback82
Contributor Author

imback82 commented Feb 12, 2020

In contrast, Hive doesn't accept this syntax:

```
%jdbc(hive)
select tbl.t8 from default.tbl;
# this works

%jdbc(hive)
select default.tbl.t8 from default.tbl;
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or column reference 'default': (possible column names are: t8, t9)
```

I think we need to at least make the behavior consistent between v1 and v2 tables.

```scala
// not belong to any namespaces. For v1 tables, namespace is resolved in
// `SessionCatalog.getRelation`.
val ns = if (ident.namespace.isEmpty) {
  catalogManager.currentNamespace
```
Contributor

What if the namespace is really []?
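
(To make the concern concrete, a hedged sketch: a v2 identifier can legitimately have an empty namespace, and this branch would then substitute the current namespace where none is wanted.)

```scala
import org.apache.spark.sql.connector.catalog.Identifier

// A v2 table registered directly under the catalog root: its namespace is
// genuinely empty, so `ident.namespace.isEmpty` holds and the code above
// would re-qualify it under catalogManager.currentNamespace.
val rootIdent = Identifier.of(Array.empty[String], "t")
assert(rootIdent.namespace.isEmpty)
```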

Contributor

I'm fixing it in #27550

Contributor Author

Good catch! I will take a look at the PR. Thanks!

Contributor Author

This is still problematic even after #27550 because we allow `spark_catalog.t`. It will be resolved to an identifier with an empty namespace in `CatalogAndIdentifier`, whereas `v1SessionCatalog` always uses the current database if the given namespace (database) is empty. Should I just go ahead and disallow `spark_catalog.t`? What do you think? We briefly discussed this issue here: #27550 (comment)
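
(A hedged illustration of the mismatch being described, with simplified types; this is not the PR's code.)

```scala
import org.apache.spark.sql.catalyst.TableIdentifier

// "spark_catalog.t" splits into a known catalog name plus a one-part name:
val nameParts = Seq("spark_catalog", "t")
val (catalogName, rest) = (nameParts.head, nameParts.tail) // rest == Seq("t")

// v2-style resolution keeps the empty namespace, i.e. ([], "t"),
// while the v1 session catalog treats a missing database as "current":
val v1Ident = TableIdentifier("t", None) // None => filled in with currentDb
// The two paths can point at different tables, which is the argument for
// disallowing "spark_catalog.t" without an explicit database.
```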

Contributor

yea let's do it!

Contributor Author

will do!

Contributor Author

@cloud-fan this is now fixed without this hack.

@SparkQA

SparkQA commented Feb 25, 2020

Test build #118927 has finished for PR 27532 at commit 6660d5c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
case table =>
  SubqueryAlias(
    identifier,
    ident.asMultipartIdentifier,
```
Contributor

Shall we add the catalog name too, to support cases like `select spark_catalog.default.t.i from t`?

Contributor Author

imback82 commented Feb 26, 2020

We could, but it would not be consistent with v1 table behavior. I was thinking about adding this support when I update the resolution rule for session catalogs: #27391 (comment). What do you think, should I do it now?

Contributor

ah ok. Let's leave it for now.

```scala
Seq(true, false).foreach { useV1Table =>
  val format = if (useV1Table) "json" else v2Format
  if (useV1Table) {
    spark.conf.unset(V2_SESSION_CATALOG_IMPLEMENTATION.key)
```
Contributor

Shall we keep this comment: `// unset this config to use the default v2 session catalog.`

```scala
checkAnswer(sql("select t.i from spark_catalog.default.t"), Row(1))
checkAnswer(sql("select default.t.i from spark_catalog.default.t"), Row(1))

// catalog name cannot be used for v1 tables.
```
Contributor

v1 tables -> tables in the session catalog, as we are testing both v1 and v2 tables here.

Contributor Author

Thanks, updated.
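
(For context, a hedged sketch of the negative case from this thread, written against ScalaTest's `intercept` as used in Spark's SQL suites; the exact error text is an assumption.)

```scala
import org.apache.spark.sql.AnalysisException

// Tables in the session catalog are not aliased with the catalog name, so a
// catalog-qualified column reference should fail to resolve:
val e = intercept[AnalysisException] {
  sql("SELECT spark_catalog.default.t.i FROM spark_catalog.default.t")
}
assert(e.getMessage.contains("cannot resolve"))
```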

@SparkQA

SparkQA commented Feb 26, 2020

Test build #118947 has finished for PR 27532 at commit f23ded6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Feb 26, 2020

Test build #118970 has finished for PR 27532 at commit f23ded6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan closed this in 7330547 on Feb 26, 2020
@cloud-fan
Contributor

thanks, merging to master/3.0!

@cloud-fan
Contributor

BTW, don't forget to remove the hack in `TempViewOrV1Table` :)

cloud-fan pushed a commit that referenced this pull request Feb 26, 2020
…namespace for v2 tables

Closes #27532 from imback82/qualifed_col_respect_current.

Authored-by: Terry Kim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 7330547)
Signed-off-by: Wenchen Fan <[email protected]>
@imback82
Contributor Author

Yes, I am working on it next!

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…namespace for v2 tables

Closes apache#27532 from imback82/qualifed_col_respect_current.

Authored-by: Terry Kim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>