Skip to content

Add table function exclude_columns#16584

Merged
kasiafi merged 3 commits intotrinodb:masterfrom
kasiafi:469SelectExcept
Apr 4, 2023
Merged

Add table function exclude_columns#16584
kasiafi merged 3 commits intotrinodb:masterfrom
kasiafi:469SelectExcept

Conversation

@kasiafi
Copy link
Copy Markdown
Member

@kasiafi kasiafi commented Mar 16, 2023

Description

Adds a new table function exclude_columns as a built-in table function.

SELECT *
FROM TABLE(exclude_columns(
                        input => TABLE(orders),
                        columns => DESCRIPTOR(clerk, comment)))

This function is a replacement for syntax SELECT * EXCEPT(...). Such syntax, in a few variants, is supported by different databases (as discussed here https://stackoverflow.com/questions/413819/select-except). However, it is not part of the SQL standard, and so Trino does not support it.

Additional context and related issues

There is a related optimization PR submitted for review: #16012. Technically, the optimization is not a prerequisite, but it could affect the performance a lot by enabling column pruning.

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(X) Release notes are required, with the following suggested text:

# General
* Add a built-in table function `exclude_columns()`.
# General
* Allow table functions to return anonymous columns.

Docs included.

@findepi
Copy link
Copy Markdown
Member

findepi commented Mar 20, 2023

CI Pull Request Labeler / Test Report (pull_request_target) https://github.com/trinodb/trino/pull/16584/checks?check_run_id=12085703933 red:

TestKafkaPushdownSmokeTest > testCreateTimePushdown [groups: profile_specific_tests, kafka]
java.sql.SQLException: Query failed (#20230317_115011_00000_p5jy5): No worker nodes available

but pt (default, suite-kafka, ) is green and reports success for TestKafkaPushdownSmokeTest.testCreateTimePushdown method (3 times)

is Pull Request Labeler right or not?

cc @nineinchnick

@ebyhr ebyhr self-requested a review March 20, 2023 09:48
Comment thread testing/trino-tests/src/test/java/io/trino/tests/TestExcludeColumnsFunction.java Outdated
Comment thread testing/trino-tests/src/test/java/io/trino/tests/TestExcludeColumnsFunction.java Outdated
Comment thread testing/trino-tests/src/test/java/io/trino/tests/TestExcludeColumnsFunction.java Outdated
Comment thread core/trino-main/src/main/java/io/trino/operator/table/ExcludeColumns.java Outdated
@kasiafi kasiafi force-pushed the 469SelectExcept branch 2 times, most recently from 942545e to 51b8293 Compare March 24, 2023 14:48
@kasiafi
Copy link
Copy Markdown
Member Author

kasiafi commented Mar 24, 2023

@ebyhr I added the ability to return anonymous columns from a table function (as a prep commit).
Now the function exclude_columns returns columns with the same name, or no name, as the original column.
For the example mentioned above, the result is:

trino:tiny> SELECT 0, 1, *
         -> FROM TABLE(exclude_columns(
         ->                    TABLE(SELECT * FROM (SELECT 3 a), (SELECT 4)),
         ->                    DESCRIPTOR(a)));
 _col0 | _col1 | _col2
-------+-------+-------
     0 |     1 |     4
(1 row)

@kasiafi
Copy link
Copy Markdown
Member Author

kasiafi commented Mar 24, 2023

@martint I enabled returning anonymous columns from a table function. I believe we discussed it offline some time ago. Please take a look.

Comment thread testing/trino-tests/src/test/java/io/trino/tests/TestExcludeColumnsFunction.java Outdated
@kasiafi kasiafi force-pushed the 469SelectExcept branch 2 times, most recently from 12f9fc0 to a69e48e Compare March 28, 2023 17:05
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better to be explicit and say "COLUMNS descriptor is null".

BTW, under what conditions would this happen? The argument is marked as required, so the query should fail during analysis if it's missing.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the argument is required, but the user can pass null descriptor: columns => cast(null AS descriptor). There is a test case that covers it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something we'll have to fix when we address identifier semantics. Let's add a TODO.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, since descriptor columns names are canonical, maybe we keep them as is here and canonicalize the TableArgument columns instead when matching (for now, canonicalize to lower-case to preserve current semantics)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should do this.
If we uppercase one side of comparison (which is common in SQL-standard canonicalization), and lowercase the other side, we have little chance for a match. Effectively, the user would have to write all names in COLUMNS in lowercase and quoted.

Comment thread core/trino-spi/src/main/java/io/trino/spi/ptf/Descriptor.java Outdated
@kasiafi
Copy link
Copy Markdown
Member Author

kasiafi commented Mar 29, 2023

@martint Updated. Please re-review.

kasiafi added 3 commits March 30, 2023 09:34
Before this change, if a table function did not pass
a ConnectorTableFunctionHandle in the TableFunctionAnalysis,
the default handle was used, which was an anonymous
implementation of ConnectorTableFunctionHandle.

It did not work with table functions executed by operator,
due to lack of serialization.

This change introduces EmptyTableFunctionHandle, and sets
it as default.
Per SQL standard, all columns must be named. In Trino,
we support unnamed columns.
This change adjusts table functions so that they can return
anonymous columns.
It is achieved by changing the Descriptor structure so that
field name is optional. This optionality can only be used
for the returned type of table functions. Descriptor arguments
pased by the user as well as default values for descriptor
argumens have mandatory field names.
@kasiafi kasiafi requested review from ebyhr and martint March 31, 2023 07:00
@kasiafi kasiafi merged commit bffa6c6 into trinodb:master Apr 4, 2023
@github-actions github-actions Bot added this to the 412 milestone Apr 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants