Skip to content

Fix failure when query table function contains column alias in JDBC connectors#16235

Merged
ebyhr merged 2 commits intotrinodb:masterfrom
ebyhr:ebi/mysql-query-column-alias
Mar 17, 2023
Merged

Fix failure when query table function contains column alias in JDBC connectors#16235
ebyhr merged 2 commits intotrinodb:masterfrom
ebyhr:ebi/mysql-query-column-alias

Conversation

@ebyhr
Copy link
Copy Markdown
Member

@ebyhr ebyhr commented Feb 23, 2023

Description

Fixes #16225

Release notes

(x) Release notes are required, with the following suggested text:

# MySQL, Druid, MariaDB, SingleStore
* Fix failure when `query` table function contains column alias. ({issue}`16225`)

@cla-bot cla-bot bot added the cla-signed label Feb 23, 2023
@ebyhr ebyhr force-pushed the ebi/mysql-query-column-alias branch from 55c3b4e to 881fb81 Compare February 23, 2023 22:54
@ebyhr ebyhr marked this pull request as ready for review February 24, 2023 12:44
@ebyhr ebyhr requested review from hashhar and wendigo February 24, 2023 22:59
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no guarantee that the outputs of the query will match the column names in the table function. If you want to be absolutely sure, you should do something like:

SELECT region_name FROM TABLE(...)

which should fail if the column is not exposed by the table function.

Copy link
Copy Markdown
Member Author

@ebyhr ebyhr Feb 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to the suggested query doesn't guarantee either and it hides the actual output column names. For instance, the below test pass in H2 because the generated query will be SELECT "REGION_NAME" FROM (SELECT name AS region_name FROM tpch.region WHERE regionkey = 0) o.

        assertThat(query(format("SELECT region_name FROM TABLE(system.query(query => 'SELECT name AS region_name FROM %s.region WHERE regionkey = 0'))", getSession().getSchema().orElseThrow())))
                .skippingTypesCheck()
                .hasOutputNames(ImmutableList.of("region_name"))
                .matches("VALUES 'AFRICA'");

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean. If the table function doesn’t expose the name, the outer query should fail with a column not found error.

Also, what do you mean by “passes on H2”? That query should not even run against H2, since it H2 doesn’t support table functions, or any of the table functions in Trino.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, given:

SELECT region_name FROM TABLE(system.query(query => 'SELECT name FROM tpch.region WHERE regionkey = 0'));

=>

Query 20230225_223332_00030_tgtxv failed: line 1:8: Column 'region_name' cannot be resolved

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have TestJdbcConnectorTest running on H2.

The given example ensures column names without case sensitivity. The below query (added a column alias to the table function) should pass in my understanding because SELECT name FROM (SELECT name as "NAME" FROM region) is a valid query in Trino. This is what happening on the above H2 example, the table function exposes the column names with uppercase, but adding the projection lowercase the final output column names. That's why I didn't add the projection.

SELECT name FROM TABLE(system.query(query => 'SELECT name AS "NAME" FROM tpch.region WHERE regionkey = 0'));
  name
--------
 AFRICA

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have TestJdbcConnectorTest running on H2.

Some tests do, but QueryAssert does not use H2. It runs the given query and compares the results to the output of the .matches() clause (which is another query that runs in Trino).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The given example ensures column names without case sensitivity.

Well, it tests that the name is resolvable using Trino's identifier resolution rules (which, unfortunately, currently don't follow proper SQL semantics).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests do, but QueryAssert does not use H2. It runs the given query and compares the results to the output of the .matches() clause (which is another query that runs in Trino).

TestJdbcConnectorTest uses createH2QueryRunner. The behind database is H2 regardless of assertion methods.

Copy link
Copy Markdown
Member Author

@ebyhr ebyhr Feb 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I replied in Slack, SELECT name FROM ... ensures the internal flow, but it doesn't ensure the final output. It would be nice to verify the output column names as well. Otherwise, we may miss the future change about case sensitivity.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint Gentle reminder.

@ebyhr ebyhr requested a review from martint February 25, 2023 01:05
@ebyhr
Copy link
Copy Markdown
Member Author

ebyhr commented Feb 25, 2023

@martint Please take a look at the above comments.

@ebyhr ebyhr force-pushed the ebi/mysql-query-column-alias branch 2 times, most recently from 2611a0c to 72bad4b Compare February 26, 2023 07:08
@ebyhr
Copy link
Copy Markdown
Member Author

ebyhr commented Feb 27, 2023

@martint Please take another look.

Copy link
Copy Markdown
Contributor

@wendigo wendigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @ebyhr

@ebyhr ebyhr force-pushed the ebi/mysql-query-column-alias branch from 72bad4b to 57b030e Compare February 28, 2023 10:00
@github-actions github-actions bot added the bigquery BigQuery connector label Feb 28, 2023
@ebyhr ebyhr force-pushed the ebi/mysql-query-column-alias branch from 57b030e to 4b601be Compare February 28, 2023 11:50
@ebyhr
Copy link
Copy Markdown
Member Author

ebyhr commented Mar 1, 2023

@hashhar @Praveen2112 Gentle reminder.

Copy link
Copy Markdown
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. @Praveen2112 do you want to take a look?

I don't understand Martin's concerns since none of this happens within the engine, it's how the connector reports information back to the engine and this is actually a bugfix showcased by the tests.

@martint
Copy link
Copy Markdown
Member

martint commented Mar 13, 2023

What I said above is that we shouldn’t be verifying that the table function exposes the expected names by checking how they are propagated outside of the outer query. We can validate that by having the outer query try to resolve the names directly.

@ebyhr
Copy link
Copy Markdown
Member Author

ebyhr commented Mar 13, 2023

We can validate that by having the outer query try to resolve the names directly.

testNativeQueryColumnAlias covers this scenario.

@ebyhr ebyhr merged commit f8ee13d into trinodb:master Mar 17, 2023
@ebyhr ebyhr deleted the ebi/mysql-query-column-alias branch March 17, 2023 05:51
@github-actions github-actions bot added this to the 411 milestone Mar 17, 2023
@ebyhr ebyhr self-assigned this Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bigquery BigQuery connector cla-signed

Development

Successfully merging this pull request may close these issues.

ptf fails when alias exists for result column

5 participants