Skip to content

Conversation

@kasiafi
Copy link
Member

@kasiafi kasiafi commented Oct 11, 2022

Description

This PR adds an iterative rule ImplementTableFunctionSource which transforms (potentially) multiple sources of a TableFunctionNode into a single source. The rule joins the sources according to their partitioning, ordering, co-partitioning and prune when empty properties.
This PR also introduces a new PlanNode to represent the table function invocation with pre-processed source: TableFunctionProcessingNode.

note: the rule is quite complicated, so the implementation was split into three steps, each covering some area. I meant to squash them after the review, but it would be also correct to merge them separately.

Release notes

(x ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Oct 11, 2022
@kasiafi kasiafi force-pushed the 394ImplementTFSource branch 7 times, most recently from b1a3258 to f66cf93 Compare October 14, 2022 09:39
@kasiafi kasiafi force-pushed the 394ImplementTFSource branch 2 times, most recently from da3abf4 to 2bcc444 Compare October 20, 2022 13:29
@kasiafi kasiafi force-pushed the 394ImplementTFSource branch from 2bcc444 to e7532c4 Compare October 27, 2022 09:11
@kasiafi kasiafi changed the title WIP Implement table function source Implement table function source Oct 27, 2022
@kasiafi kasiafi force-pushed the 394ImplementTFSource branch 2 times, most recently from 7c8b0f2 to 8163379 Compare November 29, 2022 18:51
@kasiafi kasiafi force-pushed the 394ImplementTFSource branch from 8163379 to 3a1c7f0 Compare December 19, 2022 12:30
@kasiafi kasiafi force-pushed the 394ImplementTFSource branch 5 times, most recently from 3c567ec to 5ab991d Compare December 25, 2022 15:35
@kasiafi kasiafi force-pushed the 394ImplementTFSource branch from 5ab991d to a47c5c9 Compare January 2, 2023 14:53
@kasiafi kasiafi force-pushed the 394ImplementTFSource branch from a47c5c9 to f3792b4 Compare January 3, 2023 08:32
@kasiafi kasiafi requested a review from martint January 9, 2023 15:15
@kasiafi kasiafi force-pushed the 394ImplementTFSource branch 4 times, most recently from 2f9d775 to 0a1cd98 Compare January 17, 2023 13:58
@kasiafi kasiafi force-pushed the 394ImplementTFSource branch from 0a1cd98 to c0fead5 Compare January 20, 2023 10:07
@kasiafi kasiafi force-pushed the 394ImplementTFSource branch 3 times, most recently from c2b35d8 to c248039 Compare January 25, 2023 13:18
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why special case for when there are two or more sources? Can we just treat everything uniformly even if there's a single source?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, we could let a single source through the whole algorithm. However, the marker symbols, and all the other "helper" symbols only make sense when you need to make a cross-product of partitions from 2+ sources. The algorithm appends many stages of processing, including window operations (which wouldn't get pruned afterwards for a single source, if we really treated it uniformly).

Another argument for special handling the case of a single source is for code readability in what is probably going to be the most common case.

@kasiafi kasiafi force-pushed the 394ImplementTFSource branch 3 times, most recently from f204add to 79b04f7 Compare January 27, 2023 21:02
@kasiafi kasiafi requested a review from martint January 27, 2023 21:16
@kasiafi
Copy link
Member Author

kasiafi commented Jan 27, 2023

@martint I applied your comments, rebased, and squashed the rule into one commit. PTAL

Change the builder for TableArgumentSpecification so that
there is no default for the empty behavior.
…tput

If a table function has multiple partitioned inputs, they might be
co-partitioned. Co-partitioning might require coercing of the
corresponding partitioning columns to common supertype.

Additionally, each partitioning source column should be produced
on table function's output in its original (uncoerced) form.

Before this change, the coerced columns were incorrectly passed to
output. After this change, the uncoerced columns are preserved for
output independently from the Specification, which is planned with
regard to the required co-partitioning coercions.
Adds the SchemaFunctionName, which can act as a unique
identifier of the function for a given catalog.
the necessary pass-through information for a table function's source includes:
- whether the source was declared as pass-through
- an ordered list of pass-through columns
- for each column, information whether it is a partitioning column
When the ordering scheme is mapped, some symbols might
be removed due to de-duplication. If de-duplication happens
within the pre-sorted prefix, the length of the prefix
should be updated.

This causes no issues currently, since UnaliasSymbolReferences,
and so the SymbolMapper, is never called after AddLocalExchanges,
where the pre-sorted symbols are determined.
In case when there are no input tables for a table
function invocation, the resulting
TableFunctionProcessingNode has no soures.
Otherwise, it has one source being a combination of
all inputs.
Support arbitrary number of sources (including no sources),
involving row and set semantics, prune/keep when empty properties,
and co-partitioning.
@kasiafi kasiafi force-pushed the 394ImplementTFSource branch from 79b04f7 to ed5cc54 Compare January 30, 2023 08:37
@kasiafi kasiafi merged commit 5f0f031 into trinodb:master Jan 30, 2023
@github-actions github-actions bot added this to the 407 milestone Jan 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants